Article

Extracting Sentence Embeddings from Pretrained Transformer Models

by Lukas Stankevičius * and Mantas Lukoševičius
Faculty of Informatics, Kaunas University of Technology, LT-51368 Kaunas, Lithuania
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(19), 8887; https://doi.org/10.3390/app14198887
Submission received: 14 August 2024 / Revised: 18 September 2024 / Accepted: 23 September 2024 / Published: 2 October 2024

Abstract

Pretrained transformer models excel at many natural language processing tasks and are therefore expected to encode the meaning of the input sentence or text in their representations. Such sentence-level embeddings are also important in retrieval-augmented generation. But do the commonly used plain averaging or prompt templates sufficiently capture and represent the underlying meaning? After providing a comprehensive review of existing sentence embedding extraction and refinement methods, we thoroughly test different combinations and our original extensions of the most promising ones on pretrained models. Namely, given the 110 M-parameter BERT's hidden representations from multiple layers and many tokens, we try diverse ways to extract optimal sentence embeddings. We test various token aggregation and representation post-processing techniques. We also test multiple ways of using the general Wikitext dataset to complement BERT's sentence embeddings. All methods are tested on eight Semantic Textual Similarity (STS), six short text clustering, and twelve classification tasks. We also evaluate our representation-shaping techniques on other static models, including random token representations. The proposed representation extraction methods improve performance on the STS and clustering tasks for all models considered. The improvements are very high for static token-based models; in particular, random embeddings on the STS tasks almost reach the performance of BERT-derived representations. Our work shows that representation-shaping techniques significantly improve sentence embeddings extracted from BERT-based and simple baseline models.
Keywords: BERT; embeddings; large language models; natural language processing; text embeddings; sentence vector representation; semantic similarity; transformer models; prompt engineering; unsupervised learning
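
To make the kinds of techniques discussed in the abstract concrete, the following is a minimal sketch (not the authors' exact pipeline) of extracting a sentence embedding from a pretrained 110 M-parameter BERT model by mean-pooling token representations from a chosen hidden layer, followed by one commonly used post-processing step (centering and removing dominant principal components). It assumes the Hugging Face transformers library, the bert-base-uncased checkpoint, and PyTorch; the function names and parameter choices are illustrative only.

```python
# Illustrative sketch: mean-pooled BERT sentence embeddings with a simple
# post-processing step. Assumes `transformers`, `torch`, and bert-base-uncased.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def mean_pooled_embeddings(sentences, layer=-1):
    """Average token vectors of the chosen hidden layer, ignoring padding."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    hidden = outputs.hidden_states[layer]                  # (batch, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (batch, tokens, 1)
    summed = (hidden * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts                                 # (batch, dim)

def remove_top_components(embeddings, k=1):
    """Center the embeddings and project out the k dominant principal directions."""
    centered = embeddings - embeddings.mean(dim=0, keepdim=True)
    # Right singular vectors of the centered matrix give the principal directions.
    _, _, v = torch.pca_lowrank(centered, q=max(k, 2))
    top = v[:, :k]                                         # (dim, k)
    return centered - centered @ top @ top.T

sentences = ["A man is playing a guitar.", "Someone plays an instrument."]
emb = remove_top_components(mean_pooled_embeddings(sentences), k=1)
print(emb.shape)  # (2, 768)
```

Mean pooling with the attention mask prevents padding tokens from diluting the average, and the principal-component removal illustrates the general idea behind post-processing methods that reduce anisotropy in the embedding space; it stands in for the broader family of representation-shaping techniques studied in the paper rather than reproducing them.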

