
Interior noise control is a core consideration when evaluating vehicle noise, vibration and harshness (NVH) performance. In the early stages of vehicle development, accurate prediction of road-induced interior noise is important for optimizing NVH performance and shortening the development cycle. Data-driven machine learning methods are widely used in automotive noise research because they require no accurate physical model and can learn and generalize from data, but in practical NVH engineering they still capture key local features, such as peaks, with insufficient accuracy. To address this challenge, this paper introduces a prediction approach based on an empirical-informed neural network that integrates a physical mechanism with a data-driven method. Based on an in-depth analysis of the transmission path of interior noise, the method embeds acoustic mechanism features, such as local peaks and noise correlation, into the deep neural network as physical constraints, which significantly enhances the model’s predictive performance. Experimental findings indicate that, compared with conventional deep learning techniques, the method generalizes better from limited samples while maintaining prediction accuracy. In verification on specific models, the method shows clear advantages in prediction accuracy and computational efficiency, which supports its application value in practical engineering. The main contributions of this study are an empirical-informed neural network that embeds vibro-acoustic mechanisms into the loss function and an adaptive weight strategy that enhances model robustness.

(This article belongs to the Section Vehicle Engineering)

22 pages, 26488 KB

Open Access Article

Lightweight Deep Learning Approaches on Edge Devices for Fetal Movement Monitoring

by Atcharawan Rattanasak, Talit Jumphoo, Kasidit Kokkhunthod, Wongsathon Pathonsuwan, Rattikan Nualsri, Sittinon Thanonklang, Pattama Tongdee, Porntip Nimkuntod, Monthippa Uthansakul and Peerapong Uthansakul

Biosensors 2025, 15(10), 662; https://doi.org/10.3390/bios15100662 - 2 Oct 2025

Abstract

Fetal movement monitoring (FMM) is crucial for assessing fetal well-being, traditionally relying on clinical assessments or maternal perception, each with inherent limitations. This study presents a novel lightweight deep learning framework for real-time FMM on edge devices. Data were collected from 120 participants using a wearable device equipped with an inertial measurement unit, which captured both accelerometer and gyroscope data, coupled with a rigorous two-stage labeling protocol integrating maternal perception and ultrasound validation. We addressed class imbalance using virtual-rotation-based augmentation and adaptive clustering-based undersampling. The data were transformed into spectrograms using the Short-Time Fourier Transform, serving as input for deep learning models. To ensure model efficiency suitable for resource-constrained microcontrollers, we employed knowledge distillation, transferring knowledge from larger, high-performing teacher models to compact student architectures. Post-training integer quantization further optimized the models, reducing the memory footprint by 74.8%. The final optimized model achieved a sensitivity (SEN) of 90.05%, a precision (PRE) of 87.29%, and an F1-score (F1) of 88.64%. Practical energy assessments showed continuous operation capability for approximately 25 h on a single battery charge. Our approach offers a practical framework adaptable to other medical monitoring tasks on edge devices, paving the way for improved prenatal care, especially in resource-limited settings. Full article
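The post-training integer quantization step described above can be illustrated with a small sketch. It assumes a simple per-tensor affine int8 scheme; the function names `quantize` and `dequantize` are hypothetical, and the abstract does not state which scheme or toolchain the authors actually used.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers with an affine (scale + zero-point) scheme.

    Illustrative only: a per-tensor scheme assumed for this sketch,
    not the authors' implementation.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # Include 0.0 in the range so that zero is exactly representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, 0.0, 0.25, 2.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
```

If the original weights were stored as 32-bit floats, keeping one byte per weight instead of four would cut weight storage by up to 75%, which is in the same range as the 74.8% memory reduction reported in the abstract.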

(This article belongs to the Section Wearable Biosensors)

22 pages, 2526 KB

Open Access Article

An Explainable Deep Learning Framework with Adaptive Feature Selection for Smart Lemon Disease Classification in Agriculture

by Naeem Ullah, Michelina Ruocco, Antonio Della Cioppa, Ivanoe De Falco and Giovanna Sannino

Electronics 2025, 14(19), 3928; https://doi.org/10.3390/electronics14193928 - 2 Oct 2025

Abstract

Early and accurate detection of lemon disease is necessary for effective citrus crop management. Traditional approaches often lack refined diagnosis, necessitating more powerful solutions. The article introduces adaptive PSO-LemonNetX, a framework integrating a novel deep learning model, adaptive Particle Swarm Optimization (PSO)-based feature selection, and explainable AI (XAI) using LIME. The approach improves classification accuracy while also enhancing the explainability of the model. Our end-to-end model obtained 97.01% testing and 98.55% validation accuracy. Performance was enhanced further with adaptive PSO and conventional classifiers: 100% validation accuracy using Naive Bayes and 98.8% testing accuracy using Naive Bayes and an SVM. The suggested PSO-based feature selection performed better than ReliefF, Kruskal–Wallis, and Chi-squared approaches. Due to its lightweight design and good performance, this approach can be adapted for edge devices in IoT-enabled smart farms, contributing to sustainable and automated disease detection systems. These results show the potential of integrating deep learning, PSO, grid search, and XAI into smart agriculture workflows for enhancing agricultural disease detection and decision-making.

(This article belongs to the Special Issue Image Processing and Pattern Recognition)

26 pages, 6668 KB

Open Access Article

Using Entity-Aware LSTM to Enhance Streamflow Predictions in Transboundary and Large Lake Basins

by Yunsu Park, Xiaofeng Liu, Yuyue Zhu and Yi Hong

Hydrology 2025, 12(10), 261; https://doi.org/10.3390/hydrology12100261 - 2 Oct 2025

Abstract

Hydrological simulation of large, transboundary water systems like the Laurentian Great Lakes remains challenging. Although deep learning has advanced hydrologic forecasting, prior efforts are fragmented, lacking a unified basin-wide model for daily streamflow. We address this gap by developing a single Entity-Aware Long Short-Term Memory (EA-LSTM) model, an architecture that distinctly processes static catchment attributes and dynamic meteorological forcings, trained without basin-specific calibration. We compile a cross-border dataset integrating daily meteorological forcings, static catchment attributes, and observed streamflow for 975 sub-basins across the United States and Canada (1980–2023). With a temporal training/testing split, the unified EA-LSTM attains a median Nash–Sutcliffe Efficiency (NSE) of 0.685 and a median Kling–Gupta Efficiency (KGE) of 0.678 in validation, substantially exceeding a standard LSTM (median NSE 0.567, KGE 0.555) and the operational NOAA National Water Model (median NSE 0.209, KGE 0.440). Although skill is reduced in the smallest basins (median NSE 0.554) and during high-flow events (median PBIAS −29.6%), the performance is robust across diverse hydroclimatic settings. These results demonstrate that a single, calibration-free deep learning model can provide accurate, scalable streamflow prediction across an international basin, offering a practical path toward unified forecasting for the Great Lakes and a transferable framework for other large, data-sparse watersheds. Full article
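The two skill scores quoted above, Nash–Sutcliffe Efficiency (NSE) and Kling–Gupta Efficiency (KGE), can be computed as in the following sketch. It uses the standard NSE definition and the original 2009 form of KGE; whether the authors used this exact KGE variant is an assumption, and the function names and sample flows are illustrative.

```python
import math
from statistics import fmean


def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = fmean(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def kge(observed, simulated):
    """Kling-Gupta Efficiency (2009 form): combines correlation r,
    variability ratio alpha, and bias ratio beta."""
    mean_obs, mean_sim = fmean(observed), fmean(simulated)
    std_obs = math.sqrt(fmean([(o - mean_obs) ** 2 for o in observed]))
    std_sim = math.sqrt(fmean([(s - mean_sim) ** 2 for s in simulated]))
    cov = fmean(
        [(o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated)]
    )
    r = cov / (std_obs * std_sim)
    alpha = std_sim / std_obs
    beta = mean_sim / mean_obs
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical daily flows, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.1, 1.9, 3.2, 3.7]
scores = (nse(obs, sim), kge(obs, sim))
```

Both scores equal 1 for a perfect simulation. A simulation that always returns the observed mean scores an NSE of 0, which is why the median NSE of 0.685 reported here indicates substantial skill beyond a mean-flow baseline.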

(This article belongs to the Special Issue Advancing Hydrological Science Through Artificial Intelligence: Innovations and Applications)

16 pages, 1227 KB

Open Access Article

Multimodal Behavioral Sensors for Lie Detection: Integrating Visual, Auditory, and Generative Reasoning Cues

by Daniel Grabowski, Kamila Łuczaj and Khalid Saeed

Sensors 2025, 25(19), 6086; https://doi.org/10.3390/s25196086 - 2 Oct 2025

Abstract

Advances in multimodal artificial intelligence enable new sensor-inspired approaches to lie detection by combining behavioral perception with generative reasoning. This study presents a deception detection framework that integrates deep video and audio processing with large language models guided by chain-of-thought (CoT) prompting. We interpret neural architectures such as ViViT (for video) and HuBERT (for speech) as digital behavioral sensors that extract implicit emotional and cognitive cues, including micro-expressions, vocal stress, and timing irregularities. We further incorporate a GPT-5-based prompt-level fusion approach for video–language–emotion alignment and zero-shot inference. This method jointly processes visual frames, textual transcripts, and emotion recognition outputs, enabling the system to generate interpretable deception hypotheses without any task-specific fine-tuning. Facial expressions are treated as high-resolution affective signals captured via visual sensors, while audio encodes prosodic markers of stress. Our experimental setup is based on the DOLOS dataset, which provides high-quality multimodal recordings of deceptive and truthful behavior. We also evaluate a continual learning setup that transfers emotional understanding to deception classification. Results indicate that multimodal fusion and CoT-based reasoning increase classification accuracy and interpretability. The proposed system bridges the gap between raw behavioral data and semantic inference, laying a foundation for AI-driven lie detection with interpretable sensor analogues. Full article

(This article belongs to the Special Issue Sensor-Based Behavioral Biometrics)

30 pages, 2037 KB

Open Access Article

From Market Volatility to Predictive Insight: An Adaptive Transformer–RL Framework for Sentiment-Driven Financial Time-Series Forecasting

by Zhicong Song, Harris Sik-Ho Tsang, Richard Tai-Chiu Hsung, Yulin Zhu and Wai-Lun Lo

Forecasting 2025, 7(4), 55; https://doi.org/10.3390/forecast7040055 - 2 Oct 2025

Abstract

Financial time-series prediction remains a significant challenge, driven by market volatility, nonlinear dynamic characteristics, and the complex interplay between quantitative indicators and investor sentiment. Traditional time-series models (e.g., ARIMA and GARCH) struggle to capture the nuanced sentiment in textual data, while static deep learning integration methods fail to adapt to market regime transitions (bull markets, bear markets, and consolidation). This study proposes a hybrid framework that integrates investor forum sentiment analysis with adaptive deep reinforcement learning (DRL) for dynamic model integration. By constructing a domain-specific financial sentiment dictionary (containing 16,673 entries) based on the sentiment analysis approach and word-embedding technique, we achieved up to 97.35% accuracy in forum title classification tasks. Historical price data and investor forum sentiment information were then fed into a Support Vector Regressor (SVR) and three Transformer variants (single-layer, multi-layer, and bidirectional variants) for predictions, with a Deep Q-Network (DQN) agent dynamically fusing the prediction results. Comprehensive experiments were conducted on diverse financial datasets, including China Unicom, the CSI 100 index, corn, and Amazon (AMZN). The experimental results demonstrate that our proposed approach, combining textual sentiment with adaptive DRL integration, significantly enhances prediction robustness in volatile markets, achieving the lowest RMSEs across diverse assets. It overcomes the limitations of static methods and multi-market generalization, outperforming both benchmark and state-of-the-art models. Full article

44 pages, 7867 KB

Open Access Article

Bridging AI and Maintenance: Fault Diagnosis in Industrial Air-Cooling Systems Using Deep Learning and Sensor Data

by Ioannis Polymeropoulos, Stavros Bezyrgiannidis, Eleni Vrochidou and George A. Papakostas

Machines 2025, 13(10), 909; https://doi.org/10.3390/machines13100909 - 2 Oct 2025

Abstract

This work aims to automatically detect faults in industrial air-cooling equipment used in a production line for staple fibers and, ultimately, to provide maintenance scheduling recommendations that ensure seamless operation. To this end, various deep learning models are tested to identify the most effective one for the intended scope. In the examined system, four vibration and temperature sensors are used, each positioned radially on the motor body near the rolling bearing of the motor shaft, a typical setup in many industrial environments. Using data collected from these sensors, this work exhaustively investigates the feasibility of accurately diagnosing faults in staple fiber cooling fans. The dataset is acquired and constructed under real production conditions, including variations in rotational speed, motor load, and three fault priorities, depending on the model detection accuracy, product specification, and maintenance requirements. Fault identification for training purposes involves analyzing and evaluating daily maintenance logs for this equipment. Experimental evaluation on real production data demonstrated that the proposed ResNet50-1D model achieved the highest overall classification accuracy of 97.77%, while effectively resolving the persistent misclassification of the faulty impeller observed in all the other models. Complementary evaluation confirmed its robustness, cross-machine generalization, and suitability for practical deployment, while the integration of predictions with maintenance logs enables a severity-based prioritization strategy that supports actionable maintenance planning.

Keywords: deep learning; fault classification; industrial air-cooling; industrial automation; maintenance scheduling; vibration analysis

(This article belongs to the Special Issue Advancements in Condition Monitoring of Electric Motors: Integrating Digital Twins, AI, and IoT for Enhanced Operational Efficiency, Fault Diagnosis, and Cybersecurity)

16 pages, 13271 KB

Open Access Article

Smartphone-Based Estimation of Cotton Leaf Nitrogen: A Learning Approach with Multi-Color Space Fusion

by Shun Chen, Shizhe Qin, Yu Wang, Lulu Ma and Xin Lv

Agronomy 2025, 15(10), 2330; https://doi.org/10.3390/agronomy15102330 - 2 Oct 2025

Abstract

To address the limitations of traditional cotton leaf nitrogen content estimation methods, which include low efficiency, high cost, poor portability, and challenges in vegetation index acquisition owing to environmental interference, this study focused on emerging non-destructive nutrient estimation technologies. This study proposed an innovative method that integrates multi-color space fusion with deep and machine learning to estimate cotton leaf nitrogen content using smartphone-captured digital images. A dataset comprising smartphone-acquired cotton leaf images was processed through threshold segmentation and preprocessing, then converted into RGB, HSV, and Lab color spaces. The models were developed using deep-learning architectures including AlexNet, VGGNet-11, and ResNet-50. The conclusions of this study are as follows: (1) The optimal single-color-space nitrogen estimation model achieved a validation set R² of 0.776. (2) Feature-level fusion by concatenation of multidimensional feature vectors extracted from three color spaces using the optimal model, combined with an attention learning mechanism, improved the validation R² to 0.827. (3) Decision-level fusion by concatenating nitrogen estimation values from optimal models of different color spaces into a multi-source decision dataset, followed by machine learning regression modeling, increased the final validation R² to 0.830. The dual fusion method effectively enabled rapid and accurate nitrogen estimation in cotton crops using smartphone images, achieving an accuracy 5–7% higher than that of single-color-space models. The proposed method provides scientific support for efficient cotton production and promotes sustainable development in the cotton industry. Full article

(This article belongs to the Special Issue Crop Nutrition Diagnosis and Efficient Production)

22 pages, 782 KB

Open Access Article

Hybrid CNN-Swin Transformer Model to Advance the Diagnosis of Maxillary Sinus Abnormalities on CT Images Using Explainable AI

by Mohammad Alhumaid and Ayman G. Fayoumi

Computers 2025, 14(10), 419; https://doi.org/10.3390/computers14100419 - 2 Oct 2025

Abstract

Accurate diagnosis of sinusitis is essential due to its widespread prevalence and its considerable impact on patient quality of life. While multiple imaging techniques are available for detecting maxillary sinus abnormalities, computed tomography (CT) remains the preferred modality because of its high sensitivity and spatial resolution. Although recent advances in deep learning have led to the development of automated methods for sinusitis classification, many existing models perform poorly in the presence of complex pathological features and offer limited interpretability, which hinders their integration into clinical workflows. In this study, we propose a hybrid deep learning framework that combines EfficientNetB0, a convolutional neural network, with the Swin Transformer, a vision transformer, to improve feature representation. An attention-based fusion module is used to integrate both local and global information, thereby enhancing diagnostic accuracy. To improve transparency and support clinical adoption, the model incorporates explainable artificial intelligence (XAI) techniques using Gradient-weighted Class Activation Mapping (Grad-CAM). This allows for visualization of the regions influencing the model’s predictions, helping radiologists assess the clinical relevance of the results. We evaluate the proposed method on a curated maxillary sinus CT dataset covering four diagnostic categories: Normal, Opacified, Polyposis, and Retention Cysts. The model achieves a classification accuracy of 95.83%, with precision, recall, and F1 score all at 95%. Grad-CAM visualizations indicate that the model consistently focuses on clinically significant regions of the sinus anatomy, supporting its potential utility as a reliable diagnostic aid in medical practice.

(This article belongs to the Special Issue Application of Artificial Intelligence and Modeling Frameworks in Health Informatics and Related Fields)

23 pages, 1004 KB

Open Access Review

Toward Transparent Modeling: A Scoping Review of Explainability for Arabic Sentiment Analysis

by Afnan Alsehaimi, Amal Babour and Dimah Alahmadi

Appl. Sci. 2025, 15(19), 10659; https://doi.org/10.3390/app151910659 - 2 Oct 2025

Abstract

The increasing prevalence of Arabic text in digital media offers significant potential for sentiment analysis. However, challenges such as linguistic complexity and limited resources make Arabic sentiment analysis (ASA) particularly difficult. In addition, explainable artificial intelligence (XAI) has become crucial for improving the transparency and trustworthiness of artificial intelligence (AI) models. This paper addresses the integration of XAI techniques in ASA through a scoping review of developments. This study critically identifies trends in model usage, examines explainability methods, and explores how these techniques enhance the explainability of model decisions. This review is crucial for consolidating fragmented efforts, identifying key methodological trends, and guiding future research in this emerging area. Online databases (IEEE Xplore, ACM Digital Library, Scopus, Web of Science, ScienceDirect, and Google Scholar) were searched to identify papers published between 1 January 2016 and 31 March 2025. The last search across all databases was conducted on 1 April 2025. From these, 19 peer-reviewed journal articles and conference papers focusing on ASA with explicit use of XAI techniques were selected for inclusion. This time frame was chosen to capture the most recent decade of research, reflecting advances in deep learning, transformer-based models, and explainable AI methods. The findings indicate that transformer-based models and deep learning approaches dominate in ASA, achieving high accuracy, and that local interpretable model-agnostic explanations (LIME) is the most widely used explainability tool. However, challenges such as dialectal variation, small or imbalanced datasets, and the black box nature of advanced models persist. To address these challenges, future research directions should include the creation of richer Arabic sentiment datasets, the development of hybrid explainability models, and the enhancement of adversarial robustness.

14 pages, 1081 KB

Open Access Article

Hybrid Deep Learning Approach for Secure Electric Vehicle Communications in Smart Urban Mobility

by Abdullah Alsaleh

Vehicles 2025, 7(4), 112; https://doi.org/10.3390/vehicles7040112 - 2 Oct 2025

Abstract

The increasing adoption of electric vehicles (EVs) within intelligent transportation systems (ITSs) has elevated the importance of cybersecurity, especially with the rise in Vehicle-to-Everything (V2X) communications. Traditional intrusion detection systems (IDSs) struggle to address the evolving and complex nature of cyberattacks in such dynamic environments. To address these challenges, this study introduces a novel deep learning-based IDS designed specifically for EV communication networks. We present a hybrid model that integrates convolutional neural networks (CNNs), long short-term memory (LSTM) layers, and adaptive learning strategies. The model was trained and validated using the VeReMi dataset, which simulates a wide range of attack scenarios in V2X networks. Additionally, an ablation study was conducted to isolate the contribution of each of its modules. The model demonstrated strong performance with 98.73% accuracy, 97.88% precision, 98.91% sensitivity, and 98.55% specificity, as well as an F1-score of 98.39%, an MCC of 0.964, a false-positive rate of 1.45%, and a false-negative rate of 1.09%, with a detection latency of 28 ms and an AUC-ROC of 0.994. Specifically, this work fills a clear gap in the existing V2X intrusion detection literature—namely, the lack of scalable, adaptive, and low-latency IDS solutions for hardware-constrained EV platforms—by proposing a hybrid CNN–LSTM architecture coupled with an elastic weight consolidation (EWC)-based adaptive learning module that enables online updates without full retraining. The proposed model provides a real-time, adaptive, and high-precision IDS for EV networks, supporting safer and more resilient ITS infrastructures. Full article
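The detection metrics listed above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function names and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall, true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1_score(precision, sensitivity),
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
        "fpr": fp / (fp + tn),  # equals 1 - specificity
        "fnr": fn / (fn + tp),  # equals 1 - sensitivity
    }


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)

# Consistency check using the precision and sensitivity quoted in the abstract.
reported_f1 = f1_score(0.9788, 0.9891)
```

The reported figures are internally consistent under these definitions: the false-positive rate of 1.45% is the complement of the 98.55% specificity, the false-negative rate of 1.09% is the complement of the 98.91% sensitivity, and the harmonic mean of the quoted precision and sensitivity is about 98.39%, matching the stated F1-score.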

23 pages, 698 KB

Open Access Review

Machine Learning in Land Use Prediction: A Comprehensive Review of Performance, Challenges, and Planning Applications

by Cui Li, Cuiping Wang, Tianlei Sun, Tongxi Lin, Jiangrong Liu, Wenbo Yu, Haowei Wang and Lei Nie

Buildings 2025, 15(19), 3551; https://doi.org/10.3390/buildings15193551 - 2 Oct 2025

Abstract

The accelerated global urbanization process has positioned land use/land cover change modeling as a critical component of contemporary geographic science and urban planning research. Traditional approaches face substantial challenges when addressing urban system complexity, multiscale spatial interactions, and high-dimensional data associations, creating urgent demand for sophisticated analytical frameworks. This review comprehensively evaluates machine learning applications in land use prediction through systematic analysis of 74 publications spanning 2020–2024, establishing a taxonomic framework distinguishing traditional machine learning, deep learning, and hybrid methodologies. The review contributes a comprehensive methodological assessment identifying algorithmic evolution patterns and performance benchmarks across diverse geographic contexts. Traditional methods demonstrate sustained reliability, while deep learning architectures excel in complex pattern recognition. Most significantly, hybrid methodologies have emerged as the dominant paradigm through algorithmic complementarity, consistently outperforming single-algorithm implementations. However, contemporary applications face critical constraints including computational complexity, scalability limitations, and interpretability issues impeding practical adoption. This review advances the field by synthesizing fragmented knowledge into a coherent framework and identifying research trajectories toward integrated intelligent systems with explainable artificial intelligence. Full article

(This article belongs to the Special Issue Advances in Urban Planning and Design for Urban Safety and Operations)

21 pages, 3036 KB

Open Access Article

Infrared Thermography and Deep Learning Prototype for Early Arthritis and Arthrosis Diagnosis: Design, Clinical Validation, and Comparative Analysis

by Francisco-Jacob Avila-Camacho, Leonardo-Miguel Moreno-Villalba, José-Luis Cortes-Altamirano, Alfonso Alfaro-Rodríguez, Hugo-Nathanael Lara-Figueroa, María-Elizabeth Herrera-López and Pablo Romero-Morelos

Technologies 2025, 13(10), 447; https://doi.org/10.3390/technologies13100447 - 2 Oct 2025

Abstract

Arthritis and arthrosis are prevalent joint diseases that cause pain and disability, and their early diagnosis is crucial for preventing irreversible damage. Conventional diagnostic methods such as X-ray, ultrasound, and MRI have limitations in early detection, prompting interest in alternative techniques. This work presents the design and clinical evaluation of a prototype device for non-invasive early diagnosis of arthritis (inflammatory joint disease) and arthrosis (osteoarthritis) using infrared thermography and deep neural networks. The portable prototype integrates a Raspberry Pi 4 microcomputer, an infrared thermal camera, and a touchscreen interface, all housed in a 3D-printed PLA enclosure. A custom Flask-based application enables two operational modes: (1) thermal image acquisition for training data collection, and (2) automated diagnosis using a pre-trained ResNet50 deep learning model. A clinical study was conducted at a university clinic in a temperature-controlled environment with 100 subjects (70% with arthritic conditions and 30% healthy). Thermal images of both hands (four images per hand) were captured for each participant, and all patients provided informed consent. The ResNet50 model was trained to classify three classes (healthy, arthritis, and arthrosis) from these images. Results show that the system can effectively distinguish healthy individuals from those with joint pathologies, achieving an overall test accuracy of approximately 64%. The model identified healthy hands with high confidence (100% sensitivity for the healthy class), but it struggled to differentiate between arthritis and arthrosis, often misclassifying one as the other. The prototype’s multiclass ROC (Receiver Operating Characteristic) analysis further showed excellent discrimination between healthy vs. diseased groups (AUC, Area Under the Curve ~1.00), but lower performance between arthrosis and arthritis classes (AUC ~0.60–0.68). 
Despite these challenges, the device demonstrates the feasibility of AI-assisted thermographic screening: it is completely non-invasive, radiation-free, and low-cost, providing results in real-time. In the discussion, we compare this thermography-based approach with conventional diagnostic modalities and highlight its advantages, such as early detection of physiological changes, portability, and patient comfort. While not intended to replace established methods, this technology can serve as an early warning and triage tool in clinical settings. In conclusion, the proposed prototype represents an innovative application of infrared thermography and deep learning for joint disease screening. With further improvements in classification accuracy and broader validation, such systems could significantly augment current clinical practice by enabling rapid and non-invasive early diagnosis of arthritis and arthrosis. Full article

(This article belongs to the Section Assistive Technologies)

17 pages, 627 KB

Open Access Article

Advancing Urban Planning with Deep Learning: Intelligent Traffic Flow Prediction and Optimization for Smart Cities

by Fatema A. Albalooshi

Future Transp. 2025, 5(4), 133; https://doi.org/10.3390/futuretransp5040133 - 2 Oct 2025

Abstract

The accelerating pace of urbanization has significantly complicated traffic management systems, leading to mounting challenges, such as persistent congestion, increased travel delays, and heightened environmental impacts. In response to these challenges, this study presents a novel deep learning framework designed to enhance short-term traffic flow prediction and support intelligent transportation systems within the context of smart cities. The proposed model integrates Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) networks, augmented by an attention mechanism that dynamically emphasizes relevant temporal patterns. The model was rigorously evaluated using publicly available datasets and demonstrated substantial improvements over current state-of-the-art methods. Specifically, the proposed framework achieves a 3.75% reduction in the Mean Absolute Error (MAE), a 2.00% reduction in the Root Mean Squared Error (RMSE), and a 4.17% reduction in the Mean Absolute Percentage Error (MAPE) compared to the baseline models. The enhanced predictive accuracy and computational efficiency offer significant benefits for intelligent traffic control, dynamic route planning, and proactive congestion management, thereby contributing to the development of more sustainable and efficient urban mobility systems.
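The three error measures named above, and the way a percentage reduction against a baseline is typically computed, can be sketched as follows. The function names and sample values are illustrative assumptions; the abstract does not give the underlying error values or state how the reductions were calculated.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical traffic counts (vehicles per interval), for illustration only.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
errors = (mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

Under this convention, a baseline MAE of 8.00 and a model MAE of 7.70 would correspond to a 3.75% reduction; these two values are made up purely to show the arithmetic.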

23 pages, 1548 KB

Open Access Article

Customizable Length Constrained Image-Text Summarization via Knapsack Optimization

by Xuan Liu, Xiangyu Qu, Yu Weng, Yutong Gao, Zheng Liu and Xianggan Liu

Symmetry 2025, 17(10), 1629; https://doi.org/10.3390/sym17101629 - 2 Oct 2025

Abstract

With the proliferation of multimedia data, controllable summary generation has become a key focus in Artificial Intelligence Content Generation. However, many traditional methods lack precise control over output length, often resulting in summaries that are either too verbose or too brief, thus failing to meet diverse user needs. In this paper, we propose a length-customizable approach for multimodal image-text summarization. Our method integrates combinatorial optimization with deep learning to address the length-control challenge. Specifically, we formulate the summarization task as a knapsack optimization problem, enhanced by a greedy algorithm to strictly adhere to user-defined length constraints. Additionally, we introduce a multimodal attention mechanism to ensure balanced and coherent integration of textual and visual information. To further enhance semantic alignment, we employ a cross-modal matching strategy for image selection based on pre-trained vision-language models. Experimental evaluations on the MSMO dataset, validated against baselines such as LEAD-3, Seq2Seq, Attention, and Transformer, show that our method achieves a ROUGE-1 score of 40.52, ROUGE-2 of 16.07, and ROUGE-L of 35.15, outperforming existing length-controllable baselines. Moreover, our approach attains the lowest length variance, confirming its precise adherence to target summary lengths. These results validate the effectiveness of our method in generating high-quality, length-constrained multimodal summaries.
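Casting length-constrained summarization as a knapsack problem can be illustrated with a textbook 0/1 knapsack over candidate sentences, where each sentence has a length (its weight) and a relevance score (its value). This is a generic dynamic-programming sketch, not the authors' algorithm, which the abstract describes as knapsack optimization enhanced by a greedy step; the function name, scores, and lengths are hypothetical.

```python
def select_sentences(lengths, scores, budget):
    """Pick the sentence subset with the highest total score whose
    total length does not exceed the budget (0/1 knapsack, exact DP)."""
    n = len(lengths)
    # best[i][b]: best score using the first i sentences within length b.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        weight, value = lengths[i - 1], scores[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip sentence i-1
            if weight <= b and best[i - 1][b - weight] + value > best[i][b]:
                best[i][b] = best[i - 1][b - weight] + value  # option 2: take it
    # Walk back through the table to recover which sentences were taken.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    chosen.reverse()  # restore original document order
    return chosen


# Hypothetical sentence lengths (in tokens) and relevance scores.
lengths = [12, 7, 9, 5]
scores = [0.9, 0.6, 0.7, 0.3]
summary_ids = select_sentences(lengths, scores, budget=20)
```

With these made-up inputs the selection is sentences 0 and 1, which use 19 of the 20 available tokens. Because the budget is enforced as a hard constraint inside the optimization, the output can never exceed the requested length, which is the property the abstract emphasizes.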

(This article belongs to the Section Computer)
