MDPI - Publisher of Open Access Journals

20 pages, 7466 KB

Open Access Article

Feasibility Study of CLIP-Based Key Slice Selection in CT Images and Performance Enhancement via Lesion- and Organ-Aware Fine-Tuning

by Kohei Yamamoto and Tomohiro Kikuchi

Bioengineering 2025, 12(10), 1093; https://doi.org/10.3390/bioengineering12101093 - 10 Oct 2025

Abstract

Large-scale medical visual question answering (MedVQA) datasets are critical for training and deploying vision–language models (VLMs) in radiology. Ideally, such datasets should be automatically constructed from routine radiology reports and their corresponding images. However, no existing method directly links free-text findings to the most relevant 2D slices in volumetric computed tomography (CT) scans. To address this gap, a contrastive language–image pre-training (CLIP)-based key slice selection framework is proposed, which matches each sentence to its most informative CT slice via text–image similarity. This experiment demonstrates that models pre-trained in the medical domain already achieve competitive slice retrieval accuracy and that fine-tuning them on a small dual-supervised dataset that imparts both lesion- and organ-level awareness yields further gains. In particular, the best-performing model (fine-tuned BiomedCLIP) achieved a Top-1 accuracy of 51.7% for lesion-aware slice retrieval, representing a 20-point improvement over baseline CLIP, and was accepted by radiologists in 56.3% of cases. By automating the report-to-slice alignment, the proposed method facilitates scalable, clinically realistic construction of MedVQA resources. Full article
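
The sentence-to-slice matching described above can be sketched as a nearest-neighbour search over embeddings. This is a minimal illustration, not the authors' implementation: it assumes sentence and slice embeddings have already been produced by a CLIP-style encoder, and the names `cosine`, `select_key_slice`, and `top1_accuracy` are hypothetical. The abstract does not say how a retrieved slice is judged correct, so exact index match is assumed here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_key_slice(text_emb, slice_embs):
    """Return (index of the most similar slice, all similarity scores)."""
    scores = [cosine(text_emb, s) for s in slice_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def top1_accuracy(predicted, reference):
    """Fraction of sentences whose top-ranked slice equals the reference slice.

    Assumption: a prediction counts only on an exact index match.
    """
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


# Toy embeddings (made up for illustration, not real CLIP outputs).
sentence = [1.0, 0.0]
slices = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
best_index, similarities = select_key_slice(sentence, slices)
```

With real encoders the only change would be where the vectors come from; the ranking step stays the same.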

(This article belongs to the Special Issue Machine Learning-Driven Innovations in Biomedical Signal and Image Processing)

16 pages, 7184 KB

Open Access Article

Towards Robust Scene Text Recognition: A Dual Correction Mechanism with Deformable Alignment

by Yajiao Feng and Changlu Li

Electronics 2025, 14(19), 3968; https://doi.org/10.3390/electronics14193968 - 9 Oct 2025

Abstract

Scene Text Recognition (STR) faces significant challenges under complex degradation conditions, such as distortion, occlusion, and semantic ambiguity. Most existing methods rely heavily on language priors for correction, but effectively constructing language rules remains a complex problem. This paper addresses two key challenges: (1) the over-correction behavior of language models, which, particularly on semantically deficient input, can result in both recognition errors and loss of critical information; and (2) character misalignment in visual features, which degrades recognition accuracy. To address these problems, we propose a Deformable-Alignment-based Dual Correction Mechanism (DADCM) for STR. Our method includes the following key components: (1) a visually guided and language-assisted correction strategy, in which a dynamic confidence threshold controls the degree of language model intervention; (2) a visual backbone network called SCRTNet, which enhances key text regions through a channel attention module (SENet) and applies deformable convolution (DCNv4) in deep layers to better model distorted or curved text; and (3) a deformable alignment module (DAM), which combines Gumbel-Softmax-based anchor sampling and geometry-aware self-attention to improve character alignment. Experiments on multiple benchmark datasets demonstrate the superiority of our approach. On the Union14M-Benchmark in particular, recognition accuracy surpasses previous methods by 1.1%, 1.6%, 3.0%, and 1.3% on the Curved, Multi-Oriented, Contextless, and General subsets, respectively.
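
One way to picture a confidence threshold that limits language model intervention is a per-character gate. The sketch below is purely illustrative: the abstract does not specify how the threshold is computed, so the rule used here (a fraction of the mean visual confidence) and the names `dynamic_threshold`, `gated_correction`, and `scale` are assumptions, not the DADCM method.

```python
def dynamic_threshold(visual_conf, scale=0.9):
    """Hypothetical rule: the threshold follows the mean visual confidence."""
    return scale * sum(visual_conf) / len(visual_conf)


def gated_correction(visual_chars, visual_conf, lm_chars, scale=0.9):
    """Keep the visual prediction where it is confident; otherwise
    accept the language model's suggestion for that position.

    Assumes all three inputs are aligned and have equal length.
    """
    tau = dynamic_threshold(visual_conf, scale)
    return "".join(
        v if c >= tau else l
        for v, c, l in zip(visual_chars, visual_conf, lm_chars)
    )


# One low-confidence character is handed to the language model.
fixed = gated_correction("he1lo", [0.9, 0.9, 0.3, 0.9, 0.9], "hello")

# A contextless string read with uniformly high confidence is left alone,
# which is the over-correction case the gate is meant to avoid.
kept = gated_correction("x7q9", [0.95, 0.95, 0.95, 0.95], "xray")
```

The design point is that the language model only overrides positions where the visual branch is comparatively unsure, so semantically meaningless but clearly legible text is not rewritten.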

20 pages, 568 KB

Open Access Article

“I Know How to Speak Spanish My Way”: Incorporating Critically Oriented Sociolinguistic Topics in Heritage Language Classrooms

by Sara I. Roca-Ramirez

Languages 2025, 10(10), 258; https://doi.org/10.3390/languages10100258 - 7 Oct 2025

Viewed by 230

Abstract

This study advances Spanish Heritage Language (SHL) pedagogy by investigating the integration of Critically Oriented Sociolinguistic Topics (COST) into the heritage language curriculum. Thirteen self-identified SHL students from three courses (Intermediate, Advanced I, and Advanced II) at two universities in the Washington, D.C. metro area participated in semi-structured Zoom interviews exploring their motivations for enrolling in an SHL class, their perceptions of Spanish, and the impact of COST. Analysis identified recurring themes about underlying language ideologies and enrollment motivations, such as improving academic Spanish and grammar, career preparation, and connecting with course topics. Dominant ideologies, including essentialist, standard language, deficit, and commodification, were evident in students’ perceptions of Spanish and Latinx communities in the U.S. and abroad. Findings showed that students developed critical awareness of language variation that supported validation of their HL practices and the emergence of student agency. Some students moved from reproducing to contesting deficit and standard ideologies, asserting legitimacy for their own bilingual repertoires. These findings underscore the need for integrating COST in SHL courses to promote student agency, foster positive attitudes, and strengthen students’ linguistic confidence. Full article

16 pages, 938 KB

Open Access Article

Contextual Approaches in Biblical Exegesis—An Exploration and Exemplification

by Jörg Frey, Kyung Min Kim and Tsion Seyoum Meren

Religions 2025, 16(10), 1245; https://doi.org/10.3390/rel16101245 - 29 Sep 2025

Viewed by 412

Abstract

The article focuses on the recent exegetical trend of “contextual” readings of the Bible, or context-sensitive exegesis, in global Biblical scholarship. It is written by three authors from different ethnic and cultural contexts (German, Korean, Ethiopian) in order to emphasize the diversity to be considered. The first part describes the aims, history, and relevant factors of contextual reading. The second part makes clear that traditional historical-critical exegesis is also strongly contextual, drawing on Enlightenment thought and Western views of life. Therefore, any claims of “objectivity” or universality are problematic. The third and fourth sections introduce two different contexts from global Christianity or the Majority World: first the African, especially Ethiopian, context under the label of “vulnerability”, and then an Asian, specifically South Korean, context with regard to the understanding of spirits and demons. The Ethiopian author describes how vulnerability has generally shaped the African cultural experience and, specifically, common language in Ethiopia, including religious attitudes characterized by a general openness to the divine. She also shows that in such a culture, with its danger of naivete and acceptance of many problematic interpretations, critical discernment is needed, as was already stated by an Ethiopian philosopher of the 17th century. The part on Korean interpretation discusses the various views on spirits and demons in Korean Bible translations and the influence of Confucian thought and Shamanism on readings of the Bible. Using the example of the Gerasene demoniac, the author shows how awareness of shamanic rituals that involve pigs and are intended to pacify restless souls can shape the reading of this particular Biblical text, even among modern Koreans. A brief final section draws some conclusions.
Both examples demonstrate the diversity of contexts and their resonances with the Biblical texts read within them. It is also clear that there is no single clear-cut dualism between Western and “postcolonial” readings. Neither the historical nor the contextual readings are “right” as such. Rather, there should be an open dialogue, on equal footing, that considers the context and also allows for critical interaction in order to prevent abuse of biblical texts, not only in colonial relations but also within a given context by traditionalists, political powers, and spiritual authorities, so that the liberating power of the gospel can come into effect for the benefit of its readers.

(This article belongs to the Special Issue New Testament Studies—Current Trends and Criticisms—2nd Edition)

11 pages, 209 KB

Open Access Article

Scaffolding of Success: Support, Educational Equity and the Lifelong Reality of Care Experience

by Claire Wilson, Shannon Valentine and Chelbi Hillan

Youth 2025, 5(4), 101; https://doi.org/10.3390/youth5040101 - 24 Sep 2025

Viewed by 336

Abstract

Transitions from care into adulthood are often a shift from dependence to independence. Yet for care-experienced individuals, this process is neither linear nor complete at a predetermined age. Despite progressive Scottish policies—such as The Promise—many still face unequal access to support. This article explores how structural and relational scaffolding can transform outcomes. Drawing on the lived and professional knowledge of three care-experienced authors, it examines how language, relationship-based practice, and support influence definitions of success. Reframing care experience as lifelong challenges systems to provide enduring, person-centered support. While research affirms the importance of responsive scaffolding, few studies center the voices of care-experienced adults in defining what effective support looks like. This article addresses that gap by placing care-experienced authors not as subjects, but as analysts and advocates. The article is based on a collaborative, care-informed reflective process. The authors adapted the Gibbs Reflective Cycle to suit a trauma-aware and relational approach. Their reflections are not anecdotal—they are critically analyzed, thematically structured, and used as evidence to interrogate systems and propose alternatives. Key findings highlight the importance of sustained relational practice, responsive educational support, and recognizing care experience as lifelong. Full article

(This article belongs to the Special Issue Youth Transitions from Care: Towards Improved Care-Leaving Outcomes)

21 pages, 3747 KB

Open Access Article

Open-Vocabulary Crack Object Detection Through Attribute-Guided Similarity Probing

by Hyemin Yoon and Sangjin Kim

Appl. Sci. 2025, 15(19), 10350; https://doi.org/10.3390/app151910350 - 24 Sep 2025

Viewed by 495

Abstract

Timely detection of road surface defects such as cracks and potholes is critical for ensuring traffic safety and reducing infrastructure maintenance costs. While recent advances in image-based deep learning techniques have shown promise for automated road defect detection, existing models remain limited to closed-set detection settings, making it difficult to recognize newly emerging or fine-grained defect types. To address this limitation, we propose an attribute-aware open-vocabulary crack detection (AOVCD) framework, which leverages the alignment capability of pretrained vision–language models to generalize beyond fixed class labels. In this framework, crack types are represented as combinations of visual attributes, enabling semantic grounding between image regions and natural language descriptions. To support this, we extend the existing PPDD dataset with attribute-level annotations and incorporate a multi-label attribute recognition task as an auxiliary objective. Experimental results demonstrate that the proposed AOVCD model outperforms existing baselines. In particular, compared to CLIP-based zero-shot inference, the proposed model achieves approximately a 10-fold improvement in average precision (AP) for novel crack categories. Attribute classification performance—covering geometric, spatial, and textural features—also increases by 40% in balanced accuracy (BACC) and 23% in AP. These results indicate that integrating structured attribute information enhances generalization to previously unseen defect types, especially those involving subtle visual cues. Our study suggests that incorporating attribute-level alignment within a vision–language framework can lead to more adaptive and semantically grounded defect recognition systems. Full article

(This article belongs to the Section Computing and Artificial Intelligence)

26 pages, 1823 KB

Open Access Article

Scalable Gender Profiling from Turkish Texts Using Deep Embeddings and Meta-Heuristic Feature Selection

by Hakan Gunduz

J. Theor. Appl. Electron. Commer. Res. 2025, 20(4), 253; https://doi.org/10.3390/jtaer20040253 - 24 Sep 2025

Viewed by 387

Abstract

Accurate gender identification from written text is critical for author profiling, recommendation systems, and demographic analytics in digital ecosystems. This study introduces a scalable framework for gender classification in Turkish, combining contextualized BERTurk and subword-aware FastText embeddings with three meta-heuristic feature selection algorithms: Genetic Algorithm (GA), Jaya and Artificial Rabbit Optimization (ARO). Evaluated on the IAG-TNKU corpus of 43,292 balanced Turkish news articles, the best-performing model—BERTurk+GA+LSTM—achieves 89.7% accuracy, while ARO reduces feature dimensionality by 90% with minimal performance loss. Beyond in-domain results, exploratory zero-shot and few-shot adaptation experiments on Turkish e-commerce product reviews demonstrate the framework’s transferability: while zero-shot performance dropped to 59.8%, few-shot adaptation with only 200–400 labeled samples raised accuracy to 69.6–72.3%. These findings highlight both the limitations of training exclusively on news articles and the practical feasibility of adapting the framework to consumer-generated content with minimal supervision. In addition to technical outcomes, we critically examine ethical considerations in gender inference, including fairness, representation, and the binary nature of current datasets. This work contributes a reproducible and linguistically informed baseline for gender profiling in morphologically rich, low-resource languages, with demonstrated potential for adaptation across domains such as social media and e-commerce personalization. Full article

(This article belongs to the Special Issue Human–Technology Synergies in AI-Driven E-Commerce Environments)

33 pages, 1483 KB

Open Access Feature Paper Article

From Model to Mechanism: Enforcing Delegated Authority in SSI with Language-Based Security

by Muhamed Turkanović, Vid Keršič, Alen Horvat, Dominik Beron and Špela Čučko

Mathematics 2025, 13(18), 2971; https://doi.org/10.3390/math13182971 - 14 Sep 2025

Viewed by 800

Abstract

Delegation of authority remains a critical yet insufficiently addressed capability in Self-Sovereign Identity (SSI) systems. Building on an existing delegation model that introduced the concept of a Verifiable Mandate (VM) for expressing authority and access rights, this paper extends the approach with a rigorous formalization of delegation semantics, enabling unambiguous reasoning over roles, grants, and constraints. The formal model is aligned with standards from the World Wide Web Consortium (W3C), and its constructs are embedded into an extended credential schema that preserves compatibility with the Verifiable Credentials (VC) data model while introducing delegation-specific attributes. A generalized VM schema is defined, supporting both generic and business-specific instantiations, and ensuring structural and semantic interoperability. Policy compliance is operationalized through a policy-based enforcement architecture, where rules are authored in the Rego language and evaluated at runtime by the Open Policy Agent (OPA). The architecture incorporates trusted registries for schema and policy distribution, allowing verifiers to define and enforce context-specific delegation rules in a modular and interoperable manner. Validation through realistic scenarios, such as postal service and academic use cases, demonstrates how formal semantics, schema validation, and language-based policy enforcement can be combined to enable secure, verifiable, and context-aware delegation in SSI ecosystems. Full article

(This article belongs to the Special Issue Applied Cryptography and Blockchain Security)

26 pages, 2173 KB

Open Access Article

RAMHA: A Hybrid Social Text-Based Transformer with Adapter for Mental Health Emotion Classification

by Mahander Kumar, Lal Khan and Ahyoung Choi

Mathematics 2025, 13(18), 2918; https://doi.org/10.3390/math13182918 - 9 Sep 2025

Cited by 1 | Viewed by 607

Abstract

Depression, stress, and anxiety are mental health disorders that are increasingly becoming a huge challenge in the digital age; at the same time, it is critical that they are detected early. Social media is a rich and complex source of emotional expressions that requires intelligent systems that can decode subtle psychological states from natural language. This paper presents RAMHA (RoBERTa with Adapter-based Mental Health Analyzer), a hybrid deep learning model that combines RoBERTa, parameter-efficient adapter layers, BiLSTM, and attention mechanisms and is further optimized with focal loss to address the class imbalance problem. When tested on three filtered versions of the GoEmotions dataset, RAMHA shows outstanding results, with a maximum accuracy of 92% in binary classification and 88% in multiclass tasks. A large number of experiments are performed to compare RAMHA with eight standard baseline models, including SVM, LSTM, and BERT. In these experiments, RAMHA is able to consistently outperform the other models in terms of accuracy, precision, recall, and F1-score. Ablation studies further confirm the contributions of the individual components of the architecture, and comparative analysis demonstrates that RAMHA outperforms the best previously reported F1-scores by a substantial margin. The results of our study not only indicate the potential of the adapter-enhanced transformer in emotion-aware mental health screening but also establish a solid basis for its use in clinical and social settings. Full article
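
Focal loss, which the abstract names as the remedy for class imbalance, down-weights examples the model already classifies confidently. A minimal single-example sketch of the standard formulation, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), follows. The default values of `gamma` and `alpha` and the clamping constant `eps` are assumptions for illustration; the abstract does not report the settings used in RAMHA.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0, eps=1e-12):
    """Focal loss for one example.

    probs  -- predicted class probabilities (should sum to 1)
    target -- index of the true class
    gamma  -- focusing parameter; gamma = 0 recovers cross-entropy
    alpha  -- class weighting factor
    """
    p_t = max(probs[target], eps)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss([0.9, 0.1], target=0)   # confident and correct
hard = focal_loss([0.1, 0.9], target=0)   # confident and wrong
plain_ce = focal_loss([0.5, 0.5], target=0, gamma=0.0)
```

Because the `(1 - p_t) ** gamma` factor shrinks toward zero for well-classified examples, rare or difficult classes contribute relatively more to the total loss.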

(This article belongs to the Section E1: Mathematics and Computer Science)

18 pages, 1099 KB

Open Access Article

Human–AI Teaming in Structural Analysis: A Model Context Protocol Approach for Explainable and Accurate Generative AI

by Carlos Avila, Daniel Ilbay and David Rivera

Buildings 2025, 15(17), 3190; https://doi.org/10.3390/buildings15173190 - 4 Sep 2025

Viewed by 1222

Abstract

The integration of large language models (LLMs) into structural engineering workflows presents both a transformative opportunity and a critical challenge. While LLMs enable intuitive, natural language interactions with complex data, their limited arithmetic reasoning, contextual fragility, and lack of verifiability constrain their application in safety-critical domains. This study introduces a novel automation pipeline that couples generative AI with finite element modelling through the Model Context Protocol (MCP), a modular, context-aware architecture that complements language interpretation with structural computation. By interfacing GPT-4 with OpenSeesPy via MCP (JSON schemas, API interfaces, communication standards), the system allows engineers to specify and evaluate 3D frame structures using conversational prompts, while ensuring computational fidelity and code compliance. Across four case studies, the GPT+MCP framework demonstrated predictive accuracy for key structural parameters, with deviations under 1.5% compared to reference solutions produced using conventional finite element analysis workflows. In contrast, unconstrained LLM use produced deviations exceeding 400%. The architecture supports reproducibility, traceability, and rapid analysis cycles (6–12 s), enabling real-time feedback for both design and education. This work establishes a reproducible framework for trustworthy AI-assisted analysis in engineering, offering a scalable foundation for future developments in optimisation and regulatory automation.
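
The deviation figures quoted above are relative errors against a reference solution. A small sketch of that check is given below; the function names, the parameter names in the dictionaries, and all numeric values are hypothetical and chosen only to illustrate the arithmetic. The 1.5% tolerance mirrors the figure in the abstract.

```python
def percent_deviation(predicted, reference):
    """Absolute relative deviation from the reference, in percent."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return abs(predicted - reference) / abs(reference) * 100.0


def within_tolerance(results, references, tol_pct=1.5):
    """Flag, per parameter, whether the result is within tol_pct of the reference."""
    return {
        name: percent_deviation(results[name], references[name]) <= tol_pct
        for name in references
    }


# Made-up values for illustration only.
reference = {"max_disp_mm": 10.0, "base_shear_kN": 200.0}
candidate = {"max_disp_mm": 10.1, "base_shear_kN": 250.0}
flags = within_tolerance(candidate, reference)
```

Here the displacement deviates by 1% and passes, while the base shear deviates by 25% and is flagged.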

(This article belongs to the Special Issue Automation and Intelligence in the Construction Industry)

22 pages, 47099 KB

Open Access Article

Deciphering Emotions in Children’s Storybooks: A Comparative Analysis of Multimodal LLMs in Educational Applications

by Bushra Asseri, Estabrag Abaker, Maha Al Mogren, Tayef Alhefdhi and Areej Al-Wabil

AI 2025, 6(9), 211; https://doi.org/10.3390/ai6090211 - 2 Sep 2025

Viewed by 871

Abstract

Emotion recognition capabilities in multimodal AI systems are crucial for developing culturally responsive educational technologies yet remain underexplored for Arabic language contexts, where culturally appropriate learning tools are critically needed. This study evaluated the emotion recognition performance of two advanced multimodal large language models, GPT-4o and Gemini 1.5 Pro, when processing Arabic children’s storybook illustrations. We assessed both models across three prompting strategies (zero-shot, few-shot, and chain-of-thought) using 75 images from seven Arabic storybooks, comparing model predictions with human annotations based on Plutchik’s emotional framework. GPT-4o consistently outperformed Gemini across all conditions, achieving the highest macro F1-score of 59% with chain-of-thought prompting compared to Gemini’s best performance of 43%. Error analysis revealed systematic misclassification patterns, with valence inversions accounting for 60.7% of errors, while both models struggled with culturally nuanced emotions and ambiguous narrative contexts. These findings highlight fundamental limitations in current models’ cultural understanding and emphasize the need for culturally sensitive training approaches to develop effective emotion-aware educational technologies for Arabic-speaking learners. Full article
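
The macro F1-score reported above averages per-class F1 with equal weight, so rare emotions count as much as frequent ones. A minimal sketch of the computation follows. It assumes single-label annotations and takes the label set as the union of true and predicted labels; the emotion labels in the example are illustrative and not drawn from the study's data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


human = ["joy", "joy", "fear", "sadness"]
model = ["joy", "fear", "fear", "joy"]
score = macro_f1(human, model)
```

In this toy case the per-class F1 values are 0.5 for joy, 2/3 for fear, and 0 for sadness, which shows how a single missed class pulls the macro average down.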

(This article belongs to the Special Issue Exploring the Use of Artificial Intelligence in Education)

17 pages, 3167 KB

Open Access Article

USV-Seg: A Vision-Language Framework for Guided Segmentation of USV with Physical Constraint Optimization

by Wenqiang Zhan, Qianqian Chen, Rongkun Zhou, Shenghua Chen, Xinlong Zhang, Lei Ma, Yan Wang and Guiyin Liu

Electronics 2025, 14(17), 3491; https://doi.org/10.3390/electronics14173491 - 31 Aug 2025

Viewed by 672

Abstract

Unmanned Surface Vehicles (USVs) play a critical role in maritime monitoring, environmental protection, and emergency response, necessitating accurate scene understanding in complex aquatic environments. Conventional semantic segmentation methods often fail to capture global context and lack physical boundary consistency, limiting real-world performance. This paper proposes USV-Seg, a unified segmentation framework integrating a vision-language model, the Segment Anything Model (SAM), DINOv2-based visual features, and a physically constrained refinement module. We design a task-specific <Describe> Token to enable fine-grained semantic reasoning of navigation scenes, considering USV-to-shore distance, landform complexity, and water surface texture. A mask selection algorithm based on multi-layer Intersection-over-Prediction (IoP) heads improves segmentation precision across sky, water, and obstacle regions. A boundary-aware correction module refines outputs using estimated sky-water and land-water boundaries, enhancing robustness and realism. Unlike prior works that simply apply vision-language or geometric post-processing in isolation, USV-Seg integrates structured scene reasoning and scene-aware boundary constraints into a unified and physically consistent framework. Experiments on a real-world USV dataset demonstrate that USV-Seg outperforms state-of-the-art methods, achieving 96.30% mIoU in challenging near-shore scenarios. Full article
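
Intersection-over-Prediction (IoP) is read here as the share of a predicted mask that falls inside the target region, that is, |prediction ∩ reference| / |prediction|. The sketch below computes that quantity on flattened binary masks and picks the best candidate. In USV-Seg the scores come from learned IoP heads, so scoring against a known reference mask, as done here, is a simplification for illustration; `iop` and `select_mask` are hypothetical names.

```python
def iop(pred, ref):
    """Intersection-over-Prediction for two flattened binary masks."""
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    predicted_area = sum(1 for p in pred if p)
    return intersection / predicted_area if predicted_area else 0.0


def select_mask(candidates, ref):
    """Return (index of the candidate with the highest IoP, all IoP scores)."""
    scores = [iop(c, ref) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


water = [1, 1, 1, 0, 0, 0]          # toy reference region
candidates = [
    [1, 1, 0, 0, 0, 0],             # fully inside the region
    [1, 1, 1, 1, 1, 1],             # covers everything
    [0, 0, 0, 1, 1, 0],             # entirely outside
]
best_index, iop_scores = select_mask(candidates, water)
```

Unlike IoU, IoP does not penalise a mask for being smaller than the reference, so it rewards precise masks and is typically combined with other criteria to avoid favouring tiny regions.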

24 pages, 1294 KB

Open Access Article

Student Perceptions of Digital Tools in Language and Translation Programs: A Survey-Based Case Study at the University of Maribor, Slovenia

by Bernarda Leva, Tomaž Onič, Tadej Todorović, Jurij Urh and David Hazemali

Educ. Sci. 2025, 15(9), 1119; https://doi.org/10.3390/educsci15091119 - 28 Aug 2025

Viewed by 929

Abstract

This study investigates how students of English Language and Literature Studies and those of Translation at the University of Maribor, Slovenia, perceive and engage with digital tools in academic and language learning contexts. Although students report high levels of confidence in their digital skills and express positive attitudes towards educational technologies, the survey results reveal a significant gap between perceived competence and actual usage. The study identifies the underutilization of institutional tools, limited awareness of resources available, and a reliance on general-purpose search engines rather than academic platforms. These findings highlight the need for improved digital literacy training, structured onboarding, and integration of digital tools into discipline-specific curricula. By focusing on a student population specializing in linguistics and translation in a Central and Eastern European context, this research contributes a localized perspective to broader discussions on digital transformation in higher education. The study offers applicable recommendations for enhancing institutional strategies and supporting students in becoming competent and critical users of educational technology. Full article

(This article belongs to the Special Issue Reading and Writing in the Digital Age: Supporting Language and Literacy Development for Students)

21 pages, 655 KB

Open Access Article

A Novel Framework Leveraging Large Language Models to Enhance Cold-Start Advertising Systems

by Albin Uruqi, Iosif Viktoratos and Athanasios Tsadiras

Future Internet 2025, 17(8), 360; https://doi.org/10.3390/fi17080360 - 8 Aug 2025

Viewed by 1125

Abstract

The cold-start problem remains a critical challenge in personalized advertising, where users with limited or no interaction history often receive suboptimal recommendations. This study introduces a novel, three-stage framework that systematically integrates transformer architectures and large language models (LLMs) to improve recommendation accuracy, transparency, and user experience throughout the entire advertising pipeline. The proposed approach begins with transformer-enhanced feature extraction, leveraging self-attention and learned positional encodings to capture deep semantic relationships among users, ads, and context. It then employs an ensemble integration strategy combining enhanced state-of-the-art models with optimized aggregation for robust prediction. Finally, an LLM-driven enhancement module performs semantic reranking, personalized message refinement, and natural language explanation generation while also addressing cold-start scenarios through pre-trained knowledge. The LLM component further supports diversification, fairness-aware ranking, and sentiment sensitivity in order to ensure more relevant, diverse, and ethically grounded recommendations. Extensive experiments on DigiX and Avazu datasets demonstrate notable gains in click-through rate prediction (CTR), while an in-depth real user evaluation showcases improvements in perceived ad relevance, message quality, transparency, and trust. This work advances the state-of-the-art by combining CTR models with interpretability and contextual reasoning. The strengths of the proposed method, such as its innovative integration of components, empirical validation, multifaceted LLM application, and ethical alignment highlight its potential as a robust, future-ready solution for personalized advertising. Full article

(This article belongs to the Special Issue Information Networks with Human-Centric LLMs)

20 pages, 983 KB

Open AccessArticle

A Library-Oriented Large Language Model Approach to Cross-Lingual and Cross-Modal Document Retrieval

by Wang Yi, Xiahuan Cai, Hongtao Ma, Zhengjie Fu and Yan Zhan

Electronics 2025, 14(15), 3145; https://doi.org/10.3390/electronics14153145 - 7 Aug 2025

Viewed by 894

Abstract

Under the growing demand for processing multimodal and cross-lingual information, traditional retrieval systems have encountered substantial limitations when handling heterogeneous inputs such as images, textual layouts, and multilingual language expressions. To address these challenges, a unified retrieval framework has been proposed, which integrates visual features from images, layout-aware optical character recognition (OCR) text, and bilingual semantic representations in Chinese and English. This framework aims to construct a shared semantic embedding space that mitigates semantic discrepancies across modalities and resolves inconsistencies in cross-lingual mappings. The architecture incorporates three main components: a visual encoder, a structure-aware OCR module, and a multilingual Transformer. Furthermore, a joint contrastive learning loss has been introduced to enhance alignment across both modalities and languages. The proposed method has been evaluated on three core tasks: a single-modality retrieval task from image → OCR, a cross-lingual retrieval task between Chinese and English, and a joint multimodal retrieval task involving image, OCR, and language inputs. Experimental results demonstrate that, in the joint multimodal setting, the proposed model achieved a Precision@10 of 0.693, Recall@10 of 0.684, nDCG@10 of 0.672, and F1@10 of 0.685, substantially outperforming established baselines such as CLIP, LayoutLMv3, and UNITER. Ablation studies revealed that removing either the structure-aware OCR module or the cross-lingual alignment mechanism resulted in a decrease in mean reciprocal rank (MRR) to 0.561, thereby confirming the critical role of these components in reinforcing semantic consistency across modalities. This study highlights the powerful potential of large language models in multimodal semantic fusion and retrieval tasks, providing robust solutions for large-scale semantic understanding and application scenarios in multilingual and multimodal contexts.

(This article belongs to the Section Artificial Intelligence)
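
For readers unfamiliar with the retrieval metrics quoted in the abstract above (Precision@10, Recall@10, F1@10, nDCG@10, and MRR), the following is a minimal sketch of how such metrics are commonly computed for a single query. It assumes binary relevance labels and the standard textbook definitions; the abstract does not state which exact variants the authors used, and all function names and the toy document IDs here are hypothetical.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def f1_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Harmonic mean of Precision@k and Recall@k for one query."""
    p = precision_at_k(ranked, relevant, k)
    r = recall_at_k(ranked, relevant, k)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, doc in enumerate(ranked[:k])
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


# Toy query (hypothetical IDs): two relevant documents, one found at rank 2.
ranked_docs = ["d3", "d1", "d7", "d2"]
relevant_docs = {"d1", "d2"}
print(precision_at_k(ranked_docs, relevant_docs, 3))
print(recall_at_k(ranked_docs, relevant_docs, 3))
print(reciprocal_rank(ranked_docs, relevant_docs))
```

Reported figures such as MRR are normally the mean of the per-query values over the whole test set. Averaging a per-query F1@k over queries also generally differs from taking the harmonic mean of the averaged precision and recall, so the two need not match exactly.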

Search Results (129)
