LLM-Based Query Expansion with Gaussian Kernel Semantic Enhancement for Dense Retrieval

Pan, Min; Xiong, Wenrui; Zhou, Shuting; Gao, Mengfei; Chen, Jinguang

doi:10.3390/electronics14091744

Open Access Article


Min Pan ¹, Wenrui Xiong ¹, Shuting Zhou ¹, Mengfei Gao ¹ and Jinguang Chen ¹,²,*

¹ School of Computer and Information Engineering, Hubei Normal University, Huangshi 435002, China

² School of Electronic Information, Huzhou College, Huzhou 313000, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(9), 1744; https://doi.org/10.3390/electronics14091744

Submission received: 23 March 2025 / Revised: 23 April 2025 / Accepted: 23 April 2025 / Published: 24 April 2025

(This article belongs to the Special Issue Innovative Applications of Large Language Models in Natural Language Processing (NLP))


Abstract

In the field of Information Retrieval (IR), user-submitted keyword queries often fail to accurately represent users’ true search intent. With the rapid advancement of artificial intelligence, particularly in natural language processing (NLP), query expansion (QE) based on large language models (LLMs) has emerged as a key strategy for improving retrieval effectiveness. However, such methods often introduce query topic drift, which negatively impacts retrieval accuracy and efficiency. To address this issue, this study proposes an LLM-based QE framework that incorporates a Gaussian kernel-enhanced semantic space for dense retrieval. Specifically, the model first employs LLMs to expand the semantic dimensions of the initial query, generating multiple query representations. Then, by introducing a Gaussian kernel semantic space, it captures deep semantic relationships among these query vectors, refining their semantic distribution to better represent the original query’s intent. Finally, the ColBERTv2 model is utilized to retrieve documents based on the enhanced query representations, enabling precise relevance assessment and improving retrieval performance. To validate the effectiveness of the proposed approach, extensive empirical evaluations were conducted on the MS MARCO passage ranking dataset. The model was systematically assessed using key metrics, including MAP, NDCG@10, MRR@10, and Recall@1000. Experimental results demonstrate that the proposed method outperforms existing approaches across multiple metrics, significantly improving retrieval precision while effectively mitigating query drift, offering a novel approach for building efficient QE mechanisms.

Keywords:

information retrieval; query expansion; large language models; Gaussian kernel; dense retrieval

1. Introduction

In IR, the core task is to retrieve information that matches user needs from vast document collections efficiently and accurately. Achieving this goal in practice, however, presents several challenges. A key one is that users tend to issue short, terse queries, which makes it difficult for traditional retrieval models to fully capture their true intent.

Pseudo-Relevance Feedback (PRF) [1,2] methods perform automatic QE by extracting keywords from feedback documents and using them as expansion terms, which effectively alleviates vocabulary mismatch. These methods have been shown to play a significant role in sparse retrieval models [3,4,5,6,7]. However, traditional PRF approaches select expansion terms mainly by term frequency or occurrence counts within documents and ignore semantic information, which limits their performance in complex retrieval tasks.

In recent years, the rise of LLMs and dense retrieval models has provided new solutions to these challenges. Leveraging their broad knowledge and context awareness, LLMs can generate multiple optimized queries that model term importance and improve retrieval performance. At the same time, dense retrieval models (such as ColBERT [8], ANCE [9], ColBERTv2 [10], and ColBERT-PRF [11]) use pre-trained language models (e.g., BERT [12]) to capture deep semantic relationships between queries and documents. By mapping queries and documents into the same high-dimensional vector space, these models can better match complex query requirements. Their use of contextual information also makes them well suited to tasks such as long-document and multi-modal retrieval.

One current research focus is the use of LLMs for QE. These methods analyze the original query and generate synonym expansions, hypernym/hyponym expansions, and semantically related query variants to improve retrieval performance. However, Breuer [13] found that generating multiple query variants with LLMs may cause query drift and the loss of some semantic information, which in turn reduces the comprehensiveness and accuracy of the retrieval results.

To address this issue, and inspired by Pan et al. [5], who used kernel functions to model term co-occurrence and to process long documents, we propose a Large Language Model-based QE and Gaussian Kernel Semantic-Enhanced Dense Retrieval Model (LSDR_Gs). The model combines optimized queries generated by LLMs with a Gaussian kernel semantic space that captures deep semantic relationships between queries, and it weights each query's contribution to the relevance score according to its semantic proximity to the original query. This improves the semantic consistency between the optimized and original queries and makes retrieval both more comprehensive and more precise.

Our work is driven by the following key objectives:

(1) Leveraging LLMs for Query Expansion: We systematically investigate how LLMs utilize their extensive knowledge bases and contextual understanding to generate multiple optimized queries. This process enriches the original query and enhances retrieval performance, particularly in complex or ambiguous query scenarios.

(2) Developing a Dense Retrieval Model Enhanced with Gaussian Kernel Functions: We design and implement a dense retrieval model that uses Gaussian kernel functions to capture deep semantic relationships, mitigating query drift and improving QE effectiveness.

(3) Evaluating the Effectiveness of LSDR_Gs: We conduct comprehensive experiments to assess the performance of the LSDR_Gs model across different datasets, validating the advantages of LSDR_Gs in improving retrieval precision and relevance.

The main contributions of this paper are as follows:

(1) Integrating LLMs with Dense Retrieval Technology: This approach combines the knowledge and context awareness of LLMs with dense retrieval models. By generating multiple optimized queries, it enriches the initial query and improves retrieval performance. Although query generation requires additional computation, the resulting gains in retrieval quality make the method a valuable extension to traditional retrieval systems.

(2) Introducing a Novel Query Enhancement Method, the LSDR_Gs Model: LSDR_Gs uses Gaussian kernel functions to construct a semantic space that captures deep semantic relationships between the original and optimized queries, addressing the semantic drift and information loss that may arise during query generation.

(3) Practical Deployment of the Model: In deployment, the method expands only the query and does not require task-specific fine-tuning of the LLM, which avoids redundant document-side computation. This significantly reduces inference time and improves system efficiency, making the approach practical for large-scale information retrieval tasks.

2. Related Work

2.1. Dense Retrieval

Unlike traditional BERT-based re-rankers, which use “cross-encoders” [14,15,16], dense retrieval models typically adopt a BERT-based “dual encoder” architecture, which offers significant advantages in retrieval efficiency and scalability. In a dual encoder, queries and documents are encoded separately into dense vector representations, so efficient vector search algorithms can be used at retrieval time. Dense retrieval models are generally classified into single-representation and multi-representation models [17]. Single-representation models such as DPR [18] and ANCE [9] encode each query or document as a single dense vector. Because document representations can be pre-computed, these models can quickly locate relevant documents via efficient nearest-neighbor search (e.g., with retrieval frameworks based on vector indexing). This gives them a clear speed advantage, but relying on a single vector may limit their ability to capture complex semantic relationships.
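To make the single-representation paradigm concrete, the sketch below scores a matrix of pre-computed document vectors against one query vector by inner product and returns the top-k. Exhaustive search in NumPy stands in for an approximate nearest-neighbor index, and the random vectors are hypothetical placeholders for encoder outputs; no actual DPR or ANCE encoder is called.

```python
import numpy as np


def top_k_inner_product(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int) -> list[tuple[int, float]]:
    """Exhaustive single-vector retrieval: score every document vector against one query vector."""
    scores = doc_matrix @ query_vec  # one inner-product score per document, shape (num_docs,)
    k = min(k, scores.shape[0])
    # argpartition selects the k highest scores in linear time; only those k are then sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return [(int(i), float(scores[i])) for i in top]


rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 768)).astype(np.float32)  # stand-in for pre-computed document vectors
query = rng.normal(size=768).astype(np.float32)         # stand-in for the encoded query
for doc_id, score in top_k_inner_product(query, docs, k=3):
    print(doc_id, round(score, 3))
```

The document side is computed once offline; only the query is encoded and scored at search time, which is where the speed advantage comes from.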

In contrast, multi-representation dense retrieval models encode each token of the query and the document as a separate dense vector, capturing finer-grained semantic information. For example, ColBERT [8] performs an approximate nearest-neighbor search for each query token embedding over the document token embeddings to gather candidates, and then scores the candidates exactly, achieving efficient and high-precision retrieval. This “late interaction” mechanism balances computational efficiency and semantic expressiveness. As an improved version of ColBERT, ColBERTv2 [10] adopts more advanced training methods, including optimized contrastive learning strategies and model fine-tuning techniques, further enhancing retrieval performance. ColBERTv2 also introduces residual compression, which significantly reduces storage costs and makes it more practical for large-scale retrieval. Given its strong semantic expressiveness and system efficiency, we chose ColBERTv2 as the base retrieval model for validating the proposed method.

2.2. Query Expansion

Optimal QE [19] is a popular paradigm for improving IR effectiveness; methods such as PRF are widely used to mitigate vocabulary mismatch through QE. QE is a widely adopted technique in IR applications [20], in which the original query is expanded with additional context so that it better matches target documents. Early studies used the initially retrieved documents as pseudo-relevance feedback [2,3,4,5,6,7,21], extracting relevant content from them as supplementary information. However, the effectiveness of these methods is limited by the quality of the initial retrieval. Recent advances in generative language models, which can produce relevant responses to a given prompt, offer an alternative source of expansion content.

2.3. Large Language Models

Recently, LLMs such as LLAMA [24] and advances in prompt engineering [22,23] have made significant progress. LLM-enhanced information retrieval [25,26,27,28,29] has become a prominent research area, in which LLMs generate query expansions from their inherent knowledge. For example, HyDE [30] uses LLMs to generate hypothetical documents that answer the query, whose embeddings are then used to retrieve similar real documents. Query2Doc [31] improves QE quality by giving LLMs a few examples (few-shot prompting). Jagerman et al. [32] explored chain-of-thought prompting for QE. To address the potential lack of domain-specific knowledge in LLMs, Shen et al. [33] proposed a retrieval-enhanced method that generates QE using LLMs and fine-tunes them using pre-trained domain-specific models. Breuer [13] showed that excessive QE can reduce retrieval effectiveness for certain topics and pointed out the potential topic drift caused by synthetic queries. This paper therefore focuses on the potential topic drift in QE.

2.4. Kernel Function

In early studies, De Kretser and Moffat [34] proposed a locality-based similarity measure that utilized four contribution functions (i.e., triangle, cosine, circular, and arc functions) to evaluate the similarity between each query term and other positions. Subsequently, some kernel functions were employed to estimate the influence of query term occurrences [35]. Specifically, kernels that satisfy certain properties, such as Gaussian, triangular, cosine, circular, quartic, Epanechnikov, and Triweight functions, were introduced. These studies proposed that when two query terms are closer, they have higher co-occurrence values. Based on this theory, Pan et al. [5] introduced a kernel co-occurrence framework that uses kernel functions to capture the relationships between query terms and expanded terms.
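For concreteness, the kernels listed above can be written as functions of a normalized distance u. The sketch below uses common textbook forms scaled so that k(0) = 1 and, except for the Gaussian, set to zero outside |u| ≤ 1; these parameterizations are assumptions for illustration and may differ from those used in [34,35] or [5]. The arc function of [34] is omitted.

```python
import math
from typing import Callable


def _bounded(f: Callable[[float], float]) -> Callable[[float], float]:
    """Wrap a kernel so it is symmetric in u and zero outside the support |u| <= 1."""
    return lambda u: f(abs(u)) if abs(u) <= 1.0 else 0.0


# All kernels equal 1 at u = 0 and decrease as |u| grows.
KERNELS: dict[str, Callable[[float], float]] = {
    "triangle": _bounded(lambda u: 1.0 - u),
    "cosine": _bounded(lambda u: math.cos(math.pi * u / 2.0)),
    "circle": _bounded(lambda u: math.sqrt(1.0 - u * u)),
    "epanechnikov": _bounded(lambda u: 1.0 - u * u),
    "quartic": _bounded(lambda u: (1.0 - u * u) ** 2),
    "triweight": _bounded(lambda u: (1.0 - u * u) ** 3),
}


def gaussian(u: float, sigma: float = 1.0) -> float:
    """Gaussian kernel; unlike the others, its support is unbounded, so it is never exactly zero."""
    return math.exp(-(u * u) / (2.0 * sigma * sigma))


for name, k in KERNELS.items():
    print(f"{name:>12}: k(0)={k(0.0):.3f}  k(0.5)={k(0.5):.3f}  k(1.5)={k(1.5):.3f}")
print(f"{'gaussian':>12}: k(0)={gaussian(0.0):.3f}  k(0.5)={gaussian(0.5):.3f}  k(1.5)={gaussian(1.5):.3f}")
```

The unbounded support of the Gaussian is one practical difference: a distant item is strongly down-weighted but never discarded outright.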

Inspired by this line of work, we propose a dense retrieval method that combines LLM-based QE with Gaussian kernel-based semantic enhancement. In the semantic vector space, the Gaussian kernel assigns higher weights to optimized queries that remain semantically close to the original query and lower weights to queries with potential topic drift, thereby mitigating latent topic shift.

3. Proposed Method

In this section, we present the proposed IR method, which integrates LLMs (such as LLAMA 3 8B [36]), Gaussian kernel functions, and dense retrieval techniques. The core advantage of this approach is that it leverages the semantic understanding of queries provided by LLMs while using the Gaussian kernel function to address the topic drift that commonly arises in multi-query retrieval. The method constructs optimized queries with rich, multi-dimensional semantics, further enhancing retrieval performance. Experiments on two TREC datasets show significant improvements across various evaluation metrics.

Figure 1 illustrates the overall workflow of integrating LLMs and kernel-based semantic strategies into the ColBERTv2 retrieval framework. The process consists of the following key stages:

(1) Query Expansion: First, LLAMA 3 8B is used to expand the original query by generating a series of optimized queries with supplemental semantics. These expanded queries aim to capture multiple facets of the original query’s meaning.

(2) Query Encoding: Using the Query Encoder module built into the ColBERTv2 framework, all the generated optimized queries are transformed into their corresponding vector representations, resulting in query vectors.

(3) Similarity Computation: The Euclidean distance between the original query and its optimized queries’ vector representations is computed. This step quantifies the similarity between the queries in high-dimensional space, helping assess the level of alignment between the expanded queries and the original intent. Unlike cosine similarity, which focuses on vector directionality, Euclidean distance captures the absolute differences between the queries, which is more effective in this context for evaluating the semantic shifts during query expansion.

(4) Kernel-based Weighting: Each computed distance is passed through a kernel function to obtain a weight for the corresponding query. This captures more complex inter-query relationships and enhances the semantic consistency between the generated and original queries, thereby improving the quality and accuracy of the retrieval results.

(5) Retrieval and Ranking: Each query is matched against the corpus with ColBERTv2’s late interaction mechanism, and the per-query relevance scores are combined using the kernel weights to rank documents. The kernel-based weighting improves both semantic fidelity and retrieval accuracy.

3.1. LLM-Based Query Expansion

The process begins by providing the initial query $Q_0$ to LLAMA 3 8B. Guided by a task-specific instruction prompt, the LLM generates a series of $n$ independent rewritten versions that aim to enhance the semantic depth and breadth of the original query through diverse formulations; together they form the set of optimized queries. The optimized queries are generated as shown in Equation (1):

(Q_1, Q_2, \dots, Q_n) = \mathrm{LLM}_{\mathrm{Prompt}}(Q_0, \gamma),

(1)

where $\mathrm{LLM}_{\mathrm{Prompt}}(Q_0, \gamma)$ denotes inputting the original query $Q_0$ into LLAMA 3 8B to generate the optimized queries $Q_1, Q_2, \dots, Q_n$. Here, $\gamma$ is the prompt designed for the task, which also determines the number $n$ of optimized queries to be generated, thereby enriching the semantic coverage of the original query. Each optimized query $Q_i$ is a complete natural-language sentence. We adopt this formulation because the downstream retrieval model, ColBERTv2, expects natural-language queries in order to produce effective contextualized embeddings.
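In practice, Equation (1) amounts to sending one prompt and splitting the reply into $n$ queries. The sketch below shows that glue code under stated assumptions: `PROMPT_TEMPLATE` is a hypothetical stand-in (the actual prompt $\gamma$ appears in Figure 3 and is not reproduced here), no LLM is called, and the parser assumes the model answers with numbered or bulleted lines.

```python
import re

# Hypothetical template; the real prompt gamma is shown in Figure 3.
PROMPT_TEMPLATE = (
    "Rewrite the following search query into {n} different natural-language "
    "queries that preserve its intent. Return one query per line, numbered 1 to {n}.\n"
    "Query: {query}"
)

# Matches "1. text", "2) text", "- text" or "* text" and captures the text.
_LIST_ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*])\s*(.+?)\s*$")


def build_prompt(query: str, n: int = 10) -> str:
    return PROMPT_TEMPLATE.format(n=n, query=query)


def parse_rewrites(llm_output: str, n: int) -> list[str]:
    """Extract up to n distinct rewritten queries Q_1..Q_n from numbered or bulleted lines."""
    rewrites: list[str] = []
    seen: set[str] = set()
    for line in llm_output.splitlines():
        match = _LIST_ITEM.match(line)
        if not match:
            continue  # skip preambles such as "Here are the rewrites:"
        text = match.group(1).strip().strip('"')
        if text and text.lower() not in seen:  # drop empty items and case-insensitive duplicates
            seen.add(text.lower())
            rewrites.append(text)
        if len(rewrites) == n:
            break
    return rewrites


sample = ('Here are the rewrites:\n'
          '1. what causes high blood pressure\n'
          '2) "risk factors for hypertension"\n'
          '3. what causes high blood pressure\n')
print(parse_rewrites(sample, n=10))
# ['what causes high blood pressure', 'risk factors for hypertension']
```

Deduplication is a design choice of this sketch, not a step described in the paper; it simply avoids counting an identical rewrite twice in the later weighted sum.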

3.2. ColBERTv2 Encoding

After obtaining the optimized query set, we use the pre-trained ColBERTv2 query and document encoders to encode the queries and documents separately. The two encoders share weights but are distinguished by the special prefix tokens [Q] and [D]. As shown in Equation (2), the query encoder maps an input query $Q$ to a list of $m$-dimensional embeddings; a query shorter than 32 tokens is padded to a length of 32 with [MASK] tokens. As shown in Equation (3), the document encoder maps a document $D$ to a list of $m$-dimensional embeddings, where $|D|$ is the length of the document.

\phi_Q = \mathrm{ColBERTv2}([\mathrm{CLS}], [\mathrm{Q}], q_1, q_2, \dots, q_{|Q|}) \in \mathbb{R}^{32 \times m}

(2)

\phi_D = \mathrm{ColBERTv2}([\mathrm{CLS}], [\mathrm{D}], d_1, d_2, \dots, d_{|D|}) \in \mathbb{R}^{|D| \times m}

(3)

where $\phi_Q \in \mathbb{R}^{32 \times m}$ is the query embedding matrix and $\phi_D \in \mathbb{R}^{|D| \times m}$ is the document embedding matrix. [CLS], [Q], and [D] are special prefix tokens; $q_1, q_2, \dots, q_{|Q|}$ are the tokens of query $Q$, and, similarly, $d_1, d_2, \dots, d_{|D|}$ are the tokens of document $D$. $|Q|$ and $|D|$ are the numbers of tokens in the query and the document, respectively.
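The fixed query length in Equation (2) comes from padding, which the ColBERT papers call query augmentation. The sketch below builds the token sequences fed to the encoders. Two simplifications are assumptions of this sketch, not details of the official implementation: whitespace splitting stands in for WordPiece tokenization, and the [CLS] and [Q] prefix tokens are counted within the 32 positions.

```python
QUERY_MAXLEN = 32  # fixed query length in Equation (2)


def build_query_input(query: str, maxlen: int = QUERY_MAXLEN) -> list[str]:
    """Prefix with [CLS] [Q], truncate long queries, then pad with [MASK] to exactly maxlen tokens."""
    prefix = ["[CLS]", "[Q]"]
    tokens = query.lower().split()                       # stand-in for WordPiece tokenization
    body = tokens[: maxlen - len(prefix)]                # truncate over-long queries
    padding = ["[MASK]"] * (maxlen - len(prefix) - len(body))
    return prefix + body + padding


def build_doc_input(doc: str) -> list[str]:
    """Prefix with [CLS] [D]; in this sketch documents are not padded, so length grows with |D|."""
    return ["[CLS]", "[D]"] + doc.lower().split()


seq = build_query_input("what is dense retrieval")
print(len(seq), seq[:7])
# 32 ['[CLS]', '[Q]', 'what', 'is', 'dense', 'retrieval', '[MASK]']
```

Because every query, original or rewritten, has the same 32 positions, the query embedding matrices all share the shape 32 × m, which is what allows two of them to be compared position by position later.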

3.3. Gaussian Kernel Semantic Space

We acknowledge that, although the original query $Q_0$ best reflects the user’s initial intent, some of the optimized queries generated through QE may exhibit a degree of semantic shift. To address this issue, this study introduces a Gaussian kernel semantic space.

Using the encoding process described in Section 3.2, we convert the expanded query set into a set of query embeddings. We then reduce each pair of embeddings (an optimized query and the original query) to a single Euclidean distance, as in Equation (4), and pass that distance through the Gaussian kernel in Equation (5), which quantifies how similar the queries are in the high-dimensional embedding space. Figure 2 gives a graphical illustration of Equations (4) and (5). To evaluate whether the Gaussian kernel semantic enhancement actually mitigates query drift, we also build a comparison variant, denoted LSDR_D, that replaces the Gaussian kernel with a simple descent function [37], given in Equation (6).

L_2(Q_i) = \lVert \phi_{Q_i} - \phi_{Q_0} \rVert_2 = \sqrt{\sum_{j=1}^{L} \lVert \phi_{Q_i,j} - \phi_{Q_0,j} \rVert_2^2},

(4)

K_{Gs}(u) = \exp\left(-\frac{u^2}{2\sigma^2}\right),

(5)

K_D(u) = \begin{cases} 1, & u < 1 \\ \dfrac{1}{u}, & u \geq 1 \end{cases}

(6)

where $L_2(Q_i)$ is the Euclidean distance between the embeddings of $Q_i$ and the original query $Q_0$, and $K(u)$ denotes the kernel function ($K_{Gs}$ or $K_D$) applied to a distance $u$. The index $j$ runs over the token positions of the query embeddings, and $L$ is the embedding length of the query, which does not exceed 32. $\sigma$ is a hyperparameter that balances the semantic relationship between the original query and the optimized queries: the smaller $\sigma$ is, the more sharply distant queries are down-weighted.
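Equations (4) to (6) can be sketched directly in NumPy. The sketch assumes both query embedding matrices have the same padded shape (L × m), so Equation (4) becomes the square root of the squared differences summed over all positions and dimensions. The "rewrites" below are hypothetical perturbed copies of a random query matrix, not real ColBERTv2 outputs.

```python
import numpy as np


def l2_distance(phi_qi: np.ndarray, phi_q0: np.ndarray) -> float:
    """Equation (4): Euclidean distance between two (L x m) query embedding matrices,
    i.e. the square root of the squared differences summed over all L positions."""
    if phi_qi.shape != phi_q0.shape:
        raise ValueError("query embeddings must share the same (L, m) shape")
    return float(np.sqrt(np.sum((phi_qi - phi_q0) ** 2)))


def gaussian_kernel(u: float, sigma: float) -> float:
    """Equation (5): weight 1 at u = 0, decaying smoothly with distance."""
    return float(np.exp(-(u ** 2) / (2.0 * sigma ** 2)))


def descent_kernel(u: float) -> float:
    """Equation (6): weight 1 for u < 1, then 1/u."""
    return 1.0 if u < 1.0 else 1.0 / u


rng = np.random.default_rng(0)
phi_q0 = rng.normal(size=(32, 128))
phi_q0 /= np.linalg.norm(phi_q0, axis=1, keepdims=True)  # unit-length rows
# Hypothetical rewrites: increasingly perturbed copies of the original query embeddings.
rewrites = [phi_q0 + eps * rng.normal(size=phi_q0.shape) for eps in (0.0, 0.05, 0.5)]
distances = [l2_distance(phi, phi_q0) for phi in rewrites]
sigma = 0.5 * max(distances)  # one point of the 0.1*mu ... mu grid (mu = max distance)
print([round(gaussian_kernel(u, sigma), 3) for u in distances])
print([round(descent_kernel(u), 3) for u in distances])
```

The unperturbed copy has distance 0 and therefore weight 1 under both kernels, mirroring the role of the original query $Q_0$.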

3.4. Retrieval

Here, we use the scoring method of ColBERTv2. Given the query embedding $\phi_Q$ and the document embedding $\phi_D$, the final similarity score $s(Q, D)$ is the sum, over all query embeddings, of the highest cosine similarity between that query embedding and any document embedding, as shown in Equation (7). Because ColBERT embeddings are L2-normalized, the dot product in Equation (7) equals cosine similarity:

s(Q, D) = \sum_{\phi_{q_i} \in \phi_Q} \max_{\phi_{d_j} \in \phi_D} \phi_{q_i} \cdot \phi_{d_j}^{T},

(7)

where $\max$ selects the highest cosine similarity, $\phi_{q_i}$ denotes the embedding of a query token in query $Q$, and, similarly, $\phi_{d_j}$ denotes the embedding of a document token in document $D$.
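Equation (7) is the MaxSim operator and reduces to a matrix product followed by a row-wise maximum. The sketch below scores one query against one document exhaustively; it is an illustration of the scoring rule only, not of the full ColBERTv2 retrieval pipeline with its compressed index and candidate generation. The embeddings are random, unit-normalized placeholders.

```python
import numpy as np


def maxsim_score(phi_q: np.ndarray, phi_d: np.ndarray) -> float:
    """Equation (7): for each query embedding take the best-matching document
    embedding (max dot product), then sum over all query embeddings."""
    sim = phi_q @ phi_d.T                 # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())


def normalize_rows(x: np.ndarray) -> np.ndarray:
    """L2-normalize each embedding so that dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


rng = np.random.default_rng(1)
phi_q = normalize_rows(rng.normal(size=(32, 128)))   # 32 query positions, including [MASK] pads
phi_d = normalize_rows(rng.normal(size=(180, 128)))  # |D| = 180 document tokens
print(round(maxsim_score(phi_q, phi_d), 3))
```

All 32 query positions contribute to the sum, so with unit-normalized embeddings the score is bounded above by 32.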

3.5. Semantic-Enhanced Dense Retrieval

To address query and topic drift, we compute the Euclidean distance between the ColBERTv2 embedding of the original query and that of each optimized query, map each distance to a weight with the Gaussian kernel, and then use ColBERTv2 to score documents with every query. The final score of the proposed method, $\mathrm{LSDR}_{Gs}(Q_0, D)$, is the kernel-weighted sum given in Equation (8); the comparison variant $\mathrm{LSDR}_{D}(Q_0, D)$ uses the descent kernel instead, as given in Equation (9). Both sums start at $i = 0$, so the original query is included; since $L_2(Q_0) = 0$ and both kernels equal 1 at $u = 0$, the original query always receives weight 1.

\mathrm{LSDR}_{Gs}(Q_0, D) = \sum_{i=0}^{n} K_{Gs}\left(L_2(Q_i)\right) \cdot s(Q_i, D),

(8)

\mathrm{LSDR}_{D}(Q_0, D) = \sum_{i=0}^{n} K_{D}\left(L_2(Q_i)\right) \cdot s(Q_i, D).

(9)
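The fusion in Equations (8) and (9) is a weighted sum over the original and optimized queries. The toy example below uses hypothetical distances and per-query MaxSim scores (none of these numbers come from the experiments) to show how the kernel changes a ranking when one rewrite drifts away from the original query; σ = 2.0 is likewise an arbitrary illustrative value.

```python
import math
from typing import Callable, Sequence


def lsdr_score(distances: Sequence[float], scores: Sequence[float],
               kernel: Callable[[float], float]) -> float:
    """Equations (8)/(9): sum_i K(L2(Q_i)) * s(Q_i, D) over i = 0..n.
    Index 0 is the original query Q_0, whose distance to itself is 0."""
    if len(distances) != len(scores):
        raise ValueError("need exactly one distance per query score")
    return sum(kernel(u) * s for u, s in zip(distances, scores))


def gaussian_weight(u: float, sigma: float = 2.0) -> float:
    return math.exp(-u * u / (2.0 * sigma * sigma))  # Equation (5)


def descent_weight(u: float) -> float:
    return 1.0 if u < 1.0 else 1.0 / u  # Equation (6)


def uniform_weight(u: float) -> float:
    return 1.0  # no kernel: every query counts equally


distances = [0.0, 1.0, 4.0]  # hypothetical L2(Q_i) for Q_0, Q_1, Q_2
doc_scores = {               # hypothetical s(Q_i, D) values from Equation (7)
    "d1": [20.0, 19.0, 12.0],
    "d2": [18.0, 17.5, 25.0],  # matched strongly only by the distant query Q_2
}
for name, kernel in (("uniform", uniform_weight), ("LSDR_Gs", gaussian_weight), ("LSDR_D", descent_weight)):
    ranking = sorted(doc_scores, key=lambda d: lsdr_score(distances, doc_scores[d], kernel), reverse=True)
    print(name, ranking)
```

In this toy example, uniform weighting ranks d2 first because of the drifting query's high score, whereas both kernels down-weight Q_2 (to about 0.14 under the Gaussian and 0.25 under the descent function) and rank d1 first.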

4. Experimental Settings

4.1. Selection of the Large Language Model

In this study, we selected the LLAMA 3 8B model because it offers a favorable balance between computational efficiency and retrieval performance. Among the alternatives, such as GPT-4, GPT-3, GPT-3.5 [38], and Mistral 7B [39], the GPT-family models offer more advanced capabilities, but their much larger parameter counts significantly increase inference latency and computational cost. For tasks requiring high retrieval efficiency, this added complexity can cause delays and excessive resource consumption, as illustrated in Table 1.

The LLAMA 3 8B, with 8 billion parameters and an 8k token context length, provides sufficient capacity to handle retrieval tasks effectively while maintaining low inference costs and computational demands. This makes it an ideal choice for our study, where the goal is to maximize retrieval performance without sacrificing efficiency.

While much larger models such as GPT-4 (whose parameter count has not been officially disclosed) offer superior performance, they come with a significant trade-off in resource consumption. In contrast, LLAMA 3 8B achieves a strong balance between performance and resource efficiency, handling complex queries within a reasonable computational budget. This efficiency is essential for the scalability of the retrieval tasks addressed in our study, making LLAMA 3 8B a suitable choice for our experiments.

4.2. Datasets and Evaluation Metrics

In the experimental setup, we use the TREC Deep Learning 2019 [40] (TREC DL 2019) and 2020 [41] (TREC DL 2020) passage retrieval datasets. The TREC DL 2019 test set includes 43 queries, and the TREC DL 2020 passage retrieval test set includes 54 queries. Relevance judgments for both datasets are graded from 0 (irrelevant) to 3 (highly relevant). During evaluation, we followed the official evaluation standards of each track and report the key metrics on the TREC 2019 and TREC 2020 query sets: Mean Reciprocal Rank at 10 (MRR@10), Recall at 1000 (Recall@1000), Normalized Discounted Cumulative Gain at 10 (NDCG@10), and Mean Average Precision (MAP). For consistency and rigor, we follow previous studies [11] and treat passages with a relevance label of 1 as irrelevant.
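The per-query metrics can be sketched as follows. Two points are assumptions of this sketch rather than details stated in the paper: the "label 1 counts as irrelevant" rule is applied to the binary metrics (MRR@10, MAP, Recall@1000), while NDCG@10 keeps the graded labels with linear gains and a log2 discount. Reported scores are means over all queries; official numbers should come from trec_eval.

```python
import math

REL_THRESHOLD = 2  # for binary metrics, labels 2 and 3 are relevant; label 1 counts as non-relevant


def relevant_set(qrels: dict[str, int]) -> set[str]:
    return {pid for pid, label in qrels.items() if label >= REL_THRESHOLD}


def mrr_at_k(ranking: list[str], qrels: dict[str, int], k: int = 10) -> float:
    rel = relevant_set(qrels)
    for rank, pid in enumerate(ranking[:k], start=1):
        if pid in rel:
            return 1.0 / rank  # reciprocal rank of the first relevant passage
    return 0.0


def recall_at_k(ranking: list[str], qrels: dict[str, int], k: int = 1000) -> float:
    rel = relevant_set(qrels)
    return len(rel.intersection(ranking[:k])) / len(rel) if rel else 0.0


def average_precision(ranking: list[str], qrels: dict[str, int]) -> float:
    rel = relevant_set(qrels)
    hits, precision_sum = 0, 0.0
    for rank, pid in enumerate(ranking, start=1):
        if pid in rel:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant position
    return precision_sum / len(rel) if rel else 0.0


def ndcg_at_k(ranking: list[str], qrels: dict[str, int], k: int = 10) -> float:
    """Graded NDCG using the raw label (0-3) as gain and a log2 rank discount."""
    dcg = sum(qrels.get(pid, 0) / math.log2(rank + 1)
              for rank, pid in enumerate(ranking[:k], start=1))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 1) for rank, gain in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


qrels = {"p1": 3, "p2": 1, "p3": 2, "p4": 0}  # toy judgments for one query
run = ["p2", "p1", "p5", "p3"]               # toy ranking
print(mrr_at_k(run, qrels), recall_at_k(run, qrels),
      average_precision(run, qrels), round(ndcg_at_k(run, qrels), 4))
```

In the toy run, p2 (label 1) at rank 1 earns graded NDCG gain but does not count as a hit for MRR, which is why MRR@10 is 0.5 rather than 1.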

It is important to note that both TREC DL 2019 and TREC DL 2020 are built upon the MS MARCO passage ranking dataset, where the queries are collected from real-world search logs and formulated in natural language. As such, they can be categorized as fully-semantic queries, which do not include explicit logical operators such as AND, OR, or NOT.

4.3. Hyperparameter Settings

In this paper, we selected the ColBERTv2 pre-trained model as the dense retrieval backbone and followed the configuration standards reported in the original implementation [10]. Specifically, we utilized the checkpoint at training step 150,000 from the official release and retained the architectural settings defined in the model’s configuration file. The key hyperparameters of this configuration are 12 transformer layers, each with 12 attention heads, a hidden size of 768, an intermediate size of 3072, and dropout rates for attention and hidden layers both set to 0.1. The model uses GELU activation and absolute positional embeddings, and supports a maximum input length of 512 tokens. These settings are aligned with BERT-base and ensure compatibility with ColBERT’s late interaction design.

For query rewriting, which requires complex and detailed text generation, we use LLAMA 3 8B with the maximum sequence length (Max Length) set to 2096. Figure 3 shows the prompt $\gamma$ given to the LLM; following Ivica Kostric’s research [42], we set $n$ to 10. The prompt template was carefully designed and iteratively refined through a multi-stage prompt engineering process, involving comparative testing of different instruction phrasings and output formats. We evaluated various prompt variants in terms of semantic fidelity, structural consistency, and retrieval effectiveness, and selected the most robust prompt that consistently guided the model to generate semantically enriched queries.

To keep the generated text stable and controllable, we set the temperature parameter to a very low value of 0.001. This minimizes randomness in the generated content, so that the output remains closely aligned with the input context. In addition, because documents and datasets differ, static smoothing parameters (such as the commonly used fixed values {1, 10, 100}) may not suit all cases. We therefore adopted a dynamic adjustment strategy, varying the smoothing parameter $\sigma$ from $0.1\mu$ to $\mu$ in steps of $0.1\mu$, where $\mu$ is the maximum Euclidean distance.
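The σ grid is easy to enumerate, and it helps to see what it means for the most distant rewrite: with $u = \mu$ and $\sigma = r\mu$, Equation (5) gives a weight of $\exp(-1/(2r^2))$, which rises from $e^{-50}$ at $r = 0.1$ to $e^{-0.5} \approx 0.61$ at $r = 1$. The sketch assumes a hypothetical value of $\mu$; whether $\mu$ is taken per query or over a whole query set is not specified here.

```python
import math


def sigma_grid(mu: float, steps: int = 10) -> list[float]:
    """Candidate sigma values 0.1*mu, 0.2*mu, ..., mu (step 0.1*mu for steps=10)."""
    return [mu * s / steps for s in range(1, steps + 1)]


mu = 3.2  # hypothetical maximum Euclidean distance
for sigma in sigma_grid(mu):
    w_far = math.exp(-mu ** 2 / (2 * sigma ** 2))  # Equation (5) at u = mu
    print(f"sigma={sigma:.2f}  weight of farthest rewrite={w_far:.3g}")
```

Small σ values therefore almost discard the farthest rewrite, while σ = μ still lets it contribute with weight of roughly 0.61.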

5. Experimental Results Analysis

5.1. Comparison with the Sparse Models

To validate the effectiveness of the proposed method, this paper designs a series of comparative experiments, comparing it with various sparse models. The goal is to comprehensively evaluate the performance of LSDR_Gs across different scenarios. Specifically, we have selected the following representative models as baseline comparisons:

(a): BM25 [43]: A classical sparse retrieval model, widely used in information retrieval tasks, known for its simplicity and efficiency.
(b): BM25 + RM3: A QE method based on relevance language models. It selects high-probability terms from the pseudo-relevant documents and interpolates their probability distribution with the original query model to form the expanded query.
(c): BM25 + Rocchio: A QE method based on the vector space model (VSM). It generates new query representations by adjusting the direction and length of the query vector, combining features from the initial query and pseudo-relevant documents.
(d): BM25 + BERT: Combines the preliminary retrieval results of BM25 with the re-ranking capability of BERT to improve retrieval accuracy.
(e): BM25 + ColBERT [11]: Based on the preliminary retrieval of BM25, it further optimizes the ranking using the ColBERT model.
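The Rocchio baseline in (c) follows the classic vector-space update, sketched below for pseudo-relevance feedback. The α and β values are illustrative defaults, not the settings used for the baselines, and the four-term vocabulary is a toy example. A related update applied to dense embeddings underlies the DistilBERT Rocchio baseline in Section 5.2, although its exact weighting scheme is not reproduced here.

```python
import numpy as np


def rocchio_prf(query_vec: np.ndarray, feedback_vecs: np.ndarray,
                alpha: float = 1.0, beta: float = 0.75) -> np.ndarray:
    """Rocchio update with pseudo-relevance feedback:
    q' = alpha * q + beta * centroid(top-ranked feedback documents).
    The non-relevant (gamma) term is dropped because PRF has no explicit negatives."""
    if feedback_vecs.size == 0:
        return alpha * query_vec
    centroid = feedback_vecs.mean(axis=0)
    return alpha * query_vec + beta * centroid


vocab = ["jaguar", "car", "speed", "animal"]
q = np.array([1.0, 0.0, 0.0, 0.0])          # query "jaguar"
feedback = np.array([[1.0, 1.0, 1.0, 0.0],  # top-ranked pseudo-relevant documents
                     [1.0, 1.0, 0.0, 0.0]])
expanded = rocchio_prf(q, feedback)
print({term: float(w) for term, w in zip(vocab, expanded.round(3))})
```

Terms that appear in every feedback document ("car") gain more weight than terms that appear in only one ("speed"), which is exactly the statistical, frequency-driven behavior that the Introduction contrasts with semantic expansion.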

We conducted a comprehensive comparison of the LSDR_Gs model with classical sparse retrieval models using the official TREC metrics, including MAP, NDCG@10, MRR@10, and Recall@1000. Using BM25 as the baseline model, we further quantified the performance improvement of other models relative to BM25. The experimental results in Table 2 clearly demonstrate that re-ranking methods based on pre-trained language models exhibit significant advantages across several key metrics.

The traditional BM25 sparse retrieval model efficiently ranks relevant documents through exact keyword matching between queries and documents. This approach has the advantage of speed and efficiency when applied to large document collections. However, its reliance on lexical matching limits its ability to understand semantics. For example, BM25 struggles to capture synonyms or contextual semantic relationships, which constrains the comprehensiveness and accuracy of the retrieval results. Furthermore, when dealing with long-tail queries or low-frequency terms, the performance of sparse models can degrade significantly, making it difficult to effectively bridge the semantic gap between queries and documents. Our proposed model provides an effective solution to these challenges.

As shown in Table 2, compared to the baseline model BM25, both LSDR_D and LSDR_Gs achieved significant improvements across various metrics on the TREC 2019 and TREC 2020 datasets. In particular, for the LSDR_Gs model, the improvements in MAP, NDCG@10, and Recall@1000 were 83.41%, 52.65%, and 19.86% (TREC 2019), and 87.96%, 57.71%, and 15.77% (TREC 2020), respectively. Additionally, LSDR_D and LSDR_Gs outperformed other re-ranking models in MRR@10. Notably, while LSDR_D performed well across all metrics, it did not achieve the same overall performance as LSDR_Gs, further validating the superiority of the Gaussian kernel function in capturing deeper semantic information.
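The percentages above are relative gains over the BM25 baseline. Assuming the usual definition (the formula is not spelled out in the text), they can be computed as below; the MAP values in the example are hypothetical and not taken from Table 2.

```python
def relative_improvement(model_score: float, baseline_score: float) -> float:
    """Percentage improvement over a baseline: (model - baseline) / baseline * 100."""
    if baseline_score == 0:
        raise ValueError("baseline score must be non-zero")
    return (model_score - baseline_score) / baseline_score * 100.0


print(f"{relative_improvement(0.45, 0.30):.2f}%")  # hypothetical MAP values
# 50.00%
```

Because the gain is relative, a large percentage can correspond to a modest absolute difference when the baseline score is low, so absolute scores should be read alongside it.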

In summary, the LSDR_Gs model not only significantly improves the precision of retrieval results but also demonstrates strong capabilities in handling complex semantic queries. By combining the query rewriting power of LLMs with Gaussian kernel semantic enhancement techniques, LSDR_Gs achieves efficient semantic retrieval in dense vector space. This approach effectively retains the semantic dimensions of the original user query, expands the query’s semantics using LLMs, and mitigates query drift by introducing the Gaussian kernel function, resulting in a significant enhancement in retrieval performance.

5.2. Comparison with Dense Retrieval Models

To validate the effectiveness of the proposed method, we designed a series of comparative experiments to compare LSDR_Gs with various dense retrieval models. The goal is to comprehensively evaluate the performance of LSDR_Gs across different scenarios. Specifically, we selected the following representative models as baselines for comparison:

(a): ColBERT E2E [8]: The end-to-end retrieval version of ColBERT, which retrieves passages directly from the full collection using approximate nearest-neighbor search over its token embeddings, rather than re-ranking candidates produced by a first-stage sparse retriever as in traditional two-stage retrieval.
(b): ANCE [9]: A dense retrieval model with a single representation that optimizes query and document representations through adaptive negative sample selection.
(c): SBERT: Sentence-BERT (SBERT) [44,45] enhances BERT-based models by generating semantically rich sentence embeddings using a Siamese network architecture.
(d): uniCOIL [43]: A learned sparse retrieval model. In this configuration, documents are first expanded with the pre-trained doc2query-T5 model, which lets it capture both lexical and semantic relevance.
(e): ColBERTv2 [10]: An improved version of the ColBERT model, which incorporates more advanced training strategies and model architecture, further enhancing retrieval performance.
(f): ANCE-PRF [44]: ANCE-PRF is a method that combines PRF with the dense retrieval model ANCE. By leveraging the powerful semantic understanding and generation capabilities of pre-trained language models, ANCE-PRF improves upon traditional PRF methods, demonstrating higher accuracy and robustness when handling ambiguous or complex queries.
(g): DistilBERT Balanced Average [44]: A retrieval method based on the lightweight DistilBERT model that improves retrieval stability and accuracy by balancing the importance of different features and performing weighted averaging on multiple embedding vectors.
(h): DistilBERT Balanced Rocchio [44]: This method combines DistilBERT with the classic Rocchio algorithm, dynamically adjusting the weights of query vectors to optimize query representation.
(i): CWPRF-AAAT [46]: Contextualized Word Pseudo-Relevance Feedback with Adaptive Attention and Transformation (CWPRF-AAAT) is a method that combines contextual awareness, PRF, adaptive attention mechanisms, and transformation techniques.
(j): CWPRF-OAAT [46]: An improved version of CWPRF-AAAT, emphasizing optimal attention mechanisms and global optimization transformation techniques.
(k): ColBERT-PRF Ranker [11]: A ranking method that combines the ColBERT model with PRF.
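Several of these baselines, notably (g) and (h), update the query embedding with the embeddings of top-ranked feedback passages. The NumPy sketch below shows the general shape of such Average and Rocchio-style vector updates. The weights `alpha` and `beta` and the function names are illustrative assumptions, not the settings used in [44].

```python
import numpy as np


def average_prf(query_vec: np.ndarray, feedback_vecs: np.ndarray) -> np.ndarray:
    """Average PRF: mean of the query embedding and the k feedback passage embeddings."""
    stacked = np.vstack([query_vec[np.newaxis, :], feedback_vecs])
    return stacked.mean(axis=0)


def rocchio_prf(query_vec: np.ndarray, feedback_vecs: np.ndarray,
                alpha: float = 1.0, beta: float = 0.5) -> np.ndarray:
    """Rocchio-style PRF with positive feedback only: alpha * q + beta * centroid(feedback)."""
    return alpha * query_vec + beta * feedback_vecs.mean(axis=0)


# Toy 2-d example: one query vector and k = 2 feedback passage vectors.
q = np.array([1.0, 0.0])
feedback = np.array([[0.0, 1.0], [0.0, 3.0]])
print(average_prf(q, feedback))   # mean of [1, 0], [0, 1], [0, 3]
print(rocchio_prf(q, feedback))   # [1, 0] + 0.5 * [0, 2]
```

Both updates need the feedback passages returned by a first retrieval round before the refined query can be issued.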

We conducted a comprehensive comparison of the LSDR_Gs model against classic dense retrieval models using the official TREC evaluation metrics: MAP, NDCG@10, MRR@10, and Recall@1000. Using ColBERT E2E as the baseline, we report the relative change of every other model with respect to it. The results in Table 3 reveal a clear trend: the ColBERTv2 dense retrieval method outperforms other dense retrievers such as ANCE and ColBERT E2E, which we attribute mainly to its stronger semantic capture ability.
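For reference, the four TREC metrics used in Tables 2 and 3 can be computed per query as in the plain-Python sketch below. The per-query values are then averaged over all test queries to obtain the reported scores. This is a simplified illustration, not the official trec_eval tool. It assumes binary relevance for MAP, MRR@10, and Recall@1000 and linear gains for NDCG@10, and the function names are hypothetical.

```python
import math


def average_precision(ranked, relevant):
    """AP: precision at each rank holding a relevant passage, summed and divided by |relevant|."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mrr_at_k(ranked, relevant, k=10):
    """Reciprocal rank of the first relevant passage within the top k (0 if none)."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked, relevant, k=1000):
    """Fraction of the relevant passages that appear in the top k."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant) if relevant else 0.0


def ndcg_at_k(ranked, grades, k=10):
    """NDCG@k with linear gains: DCG = sum(grade / log2(rank + 1)), normalised by the ideal DCG."""
    dcg = sum(grades.get(doc, 0) / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1))
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


ranking = ["d1", "d2", "d3", "d4"]
print(average_precision(ranking, {"d2", "d4"}))   # (1/2 + 2/4) / 2
print(mrr_at_k(ranking, {"d2", "d4"}))            # first relevant passage at rank 2
print(ndcg_at_k(ranking, {"d2": 3, "d4": 1}))
```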

Our proposed LSDR_Gs model builds on this strong semantic capture capability and combines it with large language models and Gaussian kernel functions. The experimental results show that, compared with the ColBERT E2E baseline, both LSDR_Gs and LSDR_D achieve substantial improvements across all metrics on the TREC 2019 and TREC 2020 datasets. Specifically, for LSDR_Gs, the improvements in MAP, NDCG@10, MRR@10, and Recall@1000 reached 27.98%, 11.35%, 9.98%, and 13.93% on TREC 2019, and 15.34%, 10.09%, 1.38%, and 10.41% on TREC 2020, respectively.

Compared with ColBERTv2 (e) and the pseudo-relevance feedback (PRF) dense retrieval models (f)–(k), our proposed LSDR_Gs and LSDR_D models also improve on every metric. Moreover, because our model requires only a single round of retrieval, it avoids the initial retrieval pass that PRF methods need to obtain feedback passages, while still achieving higher accuracy. We attribute this mainly to two factors: the LLM enriches the semantic dimensions of the original query, and the Gaussian kernel function integrates this enriched information without discarding the semantics of the original query. The resulting query representation improves not only recall but also the relevance of the retrieved results.
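To make the weighting idea concrete, the NumPy sketch below applies the Gaussian kernel K_Gs(u) = exp(−u²/(2σ²)) from Figure 2 in a simplified single-vector setting. Each LLM-generated query vector is weighted by its kernel similarity to the original query vector, so expansions lying far from the original intent contribute little. The Euclidean distance on normalised vectors, the additive fusion rule, and the function names are our assumptions for illustration. The paper's Gaussian kernel semantic space is described in Section 3.3 and is not reproduced here.

```python
import numpy as np


def gaussian_kernel(u, sigma):
    """K_Gs(u) = exp(-u^2 / (2 * sigma^2)), the kernel plotted in Figure 2."""
    return np.exp(-np.square(u) / (2.0 * sigma ** 2))


def kernel_weighted_query(original, expansions, sigma):
    """Hypothetical fusion: original query + kernel-weighted sum of expanded query vectors."""
    original = original / np.linalg.norm(original)
    expansions = expansions / np.linalg.norm(expansions, axis=1, keepdims=True)
    # u_i: distance between each expanded query and the original query
    distances = np.linalg.norm(expansions - original, axis=1)
    # Expansions that drift away from the original receive small weights
    weights = gaussian_kernel(distances, sigma)
    fused = original + weights @ expansions
    return fused / np.linalg.norm(fused), weights


q0 = np.array([1.0, 0.0])
expanded = np.array([[0.9, 0.1],    # close paraphrase of the original query
                     [0.0, 1.0]])   # drifted expansion
fused, w = kernel_weighted_query(q0, expanded, sigma=0.5)
print(w)       # first weight near 1, second much smaller
print(fused)
```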

The enhanced query representation retains the semantic intent of the original query while integrating more relevant background knowledge and potential semantics, making the retrieval results more aligned with user needs. Moreover, this method effectively addresses the potential semantic loss and query drift that might occur when generating queries with LLMs, further improving the accuracy and effectiveness of the retrieval.

5.3. Parameter Sensitivity Analysis

To evaluate the robustness of the proposed model, we conducted an in-depth analysis of the key factors influencing its performance. This section focuses specifically on the impact of the hyperparameter σ on the model’s performance. In the experimental setup, σ was varied from 0.1μ to μ with a step size of 0.1μ. Figure 4 presents the detailed evaluation results of the proposed method on the TREC 2019 and TREC 2020 datasets, covering four key metrics: MAP, MRR@10, Recall@1000, and NDCG@10.
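The sweep can be reproduced schematically as below. μ is the reference scale used in the paper; its definition is not repeated in this section, so here it is treated as a given positive number (the value 1.0 is a placeholder). Because K_Gs(u) depends only on the ratio u/σ, setting σ = cμ makes the kernel value at a distance u = μ equal to exp(−1/(2c²)) regardless of μ. This shows how a larger σ flattens the kernel and lets more distant expanded queries contribute.

```python
import math


def gaussian_kernel(u: float, sigma: float) -> float:
    """K_Gs(u) = exp(-u^2 / (2 * sigma^2))."""
    return math.exp(-u * u / (2.0 * sigma * sigma))


def sigma_grid(mu: float, steps: int = 10) -> list[float]:
    """sigma values 0.1*mu, 0.2*mu, ..., mu (step 0.1*mu), as in the sensitivity study."""
    return [k / steps * mu for k in range(1, steps + 1)]


mu = 1.0  # placeholder scale; the paper's mu is not restated in this section
for sigma in sigma_grid(mu):
    # Kernel value at distance u = mu: exp(-1 / (2 c^2)) for sigma = c * mu
    print(f"sigma = {sigma / mu:.1f} mu -> K_Gs(mu) = {gaussian_kernel(mu, sigma):.4f}")
```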

Overall, all metrics exhibit a rise-then-fall trend on both datasets. This trend indicates that a moderate σ allows the model to capture high-dimensional semantic information effectively, whereas an overly large σ may lead to query drift and degrade performance. As σ increases, MAP first rises and then declines, peaking at around σ = 0.7μ. At this value, the model captures document relevance more accurately and effectively mitigates query drift. MRR@10 follows a similar trend, with its optimum also around 0.7μ, indicating that the model ranks relevant documents more accurately and thus improves the user’s retrieval experience. Recall@1000 reaches its highest value when σ is around 0.8μ. At this value, the model recalls relevant documents most comprehensively, which matters especially for large-scale document collections. NDCG@10 performs best when σ is around 0.6μ, where the model best balances document relevance and ranking order, so that highly relevant documents are ranked higher.

In conclusion, the hyperparameter σ has a significant impact on the performance of the LSDR_Gs method, with slight variations in the optimal σ value across different evaluation metrics. Based on the analysis above, we recommend setting σ within the range from 0.6μ to 0.8μ in practical applications to ensure optimal performance across multiple metrics.

5.4. Discussion and Limitations

Building upon the significant performance improvements achieved by the proposed LSDR_Gs method, we further provide an in-depth discussion from two perspectives: comparative analysis and limitations.

First, based on the experimental results, LSDR_Gs consistently outperforms both classic sparse retrieval models (e.g., BM25, BM25 + RM3, BM25 + Rocchio) and dense retrieval models (e.g., ColBERT, ANCE, CWPRF) on the TREC 2019 and TREC 2020 benchmark datasets. Notably, our method achieves superior results across four key evaluation metrics: MAP, NDCG@10, MRR@10, and Recall@1000. These results demonstrate that our approach, which integrates LLMs for query expansion and employs Gaussian kernels for semantic enhancement, is capable of more effectively extracting semantic information from queries. Furthermore, LSDR_Gs significantly improves both the robustness and accuracy of the retrieval system. Compared to using BM25 or traditional PRF methods alone, LSDR_Gs shows stronger capabilities in handling challenging retrieval scenarios, such as long-tail queries, complex semantic expressions, and low-frequency terms.

Importantly, the queries considered in our experiments are predominantly purely semantic in nature. They are typically natural language questions or short keyword-based intents without formal logical structures or connectives. This matches the characteristics of the TREC Deep Learning Track query sets, where logical composition (e.g., Boolean operators or nested expressions) is generally absent. Thus, although vector-based semantic models (such as ColBERT and LLM-based expansions) are known to struggle with logical reasoning, their semantic approximation remains suitable and effective for the query types we focus on. In this context, the superiority of LSDR_Gs can be attributed to its ability to capture nuanced semantic features rather than formal logical relationships. Compared with existing PRF-enhanced dense retrieval models (e.g., CWPRF-OAAT and ColBERT-PRF Ranker), LSDR_Gs demonstrates a superior ability to capture deep semantic features, owing to the high-quality query expansion texts generated by LLMs.

The proposed method also has several notable limitations. First, LLMs may introduce content that is not fully aligned with the user’s original intent during query expansion. Although we mitigate query drift through the Gaussian kernel function, it remains fundamentally difficult to eliminate deviations from the intended retrieval target entirely. Moreover, as LLMs are large-scale pretrained models, their output quality depends heavily on the coverage and quality of their training data. For queries that are highly domain-specific or expressed in extremely sparse language, the generation quality may degrade, potentially reducing the generalizability of the expanded queries. Second, LSDR_Gs relies on large pretrained models during both training and inference, resulting in higher computational and time costs, which poses challenges for deployment in resource-constrained environments. Finally, while the Gaussian kernel improves the precision of semantic matching to a certain extent, the method is highly sensitive to the kernel’s hyperparameters, and improper settings may diminish the benefits of semantic enhancement.

6. Conclusions and Future Work

This paper proposes LSDR_Gs, a dense retrieval model that integrates LLM-based query expansion with Gaussian kernel semantic enhancement, aiming to alleviate the query drift problem and improve retrieval accuracy and robustness. By leveraging the generative capabilities of LLMs and the fine-grained semantic representation power of the Gaussian kernel, the model enhances the semantic expressiveness of queries while maintaining their alignment with the user’s original intent.

Extensive experiments conducted on the TREC 2019 and TREC 2020 datasets demonstrate the superiority of LSDR_Gs over both sparse and dense baselines. Compared with the classical sparse retrieval model BM25, LSDR_Gs achieves improvements of 83.41% in MAP, 52.65% in NDCG@10, and 19.86% in Recall@1000 on TREC 2019, and 87.96% in MAP, 57.71% in NDCG@10, and 15.77% in Recall@1000 on TREC 2020. Furthermore, LSDR_Gs outperforms recent dense retrieval models, including ColBERTv2, CWPRF-OAAT, and ColBERT-PRF Ranker. For instance, compared with ColBERT E2E, LSDR_Gs achieves relative improvements of 27.98% in MAP and 13.93% in Recall@1000 on TREC 2019, highlighting its effectiveness in capturing deeper semantic relationships.

These results indicate that our method not only substantially enhances retrieval performance across multiple metrics but also performs consistently on both TREC test collections. The Gaussian kernel semantic space quantifies the similarity between the original query and its optimized queries. This keeps the optimized queries aligned with the user’s true intent and alleviates query drift, thereby improving the relevance and accuracy of the retrieval results.

Although the LSDR_Gs model has demonstrated excellent performance in the current experiments, future research can further enhance its applicability by exploring the following directions:

Optimizing small-scale large language models: Our initial exploration revealed that smaller models, such as LLAMA 3.2 1B and LLAMA 3.2 3B, struggled to generate optimized queries effectively, whereas LLAMA 3 8B exhibited sufficient capability for this task. This suggests that model size plays a crucial role in query optimization. To address this, future research can focus on training and refining smaller parameter models to strike a balance between semantic expressiveness and computational efficiency, making large language model-based retrieval systems more accessible and scalable.

Exploring more complex semantic enhancement mechanisms: While the current Gaussian kernel function effectively captures semantic relationships between queries, it may still have limitations when handling more complex semantic structures. Future work could consider incorporating additional types of kernel functions or integrating other deep learning techniques, such as graph neural networks, to better model the semantic relationships between queries and documents.

Expanding to multimodal retrieval: With the increasing prevalence of multimedia content, information retrieval tasks are gradually extending beyond pure text to encompass images, videos, and other modalities. Future work could apply the LSDR_Gs model to multimodal retrieval scenarios, integrating visual and textual information to build more comprehensive and intelligent retrieval systems.

Author Contributions

Conceptualization, M.P. and W.X.; methodology, M.P.; validation, W.X.; investigation, M.G.; resources, J.C. and S.Z.; data curation, J.C.; writing-original draft preparation, M.P.; writing-review and editing, W.X. and S.Z.; visualization, M.G.; supervision, S.Z. and J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Science and Technology Research Program of Hubei Provincial Department of Education (No. F2023018). This research is also supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada, the York Research Chairs (YRC) program, and an ORF-RE (Ontario Research Fund-Research Excellence) award in the BRAIN Alliance.

Data Availability Statement

Data will be made available on request.

Acknowledgments

We extend our sincere gratitude to Jimmy X. Huang for his invaluable assistance and support throughout the writing and revision process of this paper. We also greatly appreciate the anonymous reviewers and editors for their valuable review comments, which greatly helped to improve the quality of this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Rocchio, J.J.; Salton, G. Information Search Optimization and Interactive Retrieval Techniques. In AFIPS '65 (Fall, Part I): Proceedings of the Fall Joint Computer Conference, Part I, Las Vegas, NV, USA, 30 November–1 December 1965; Association for Computing Machinery: New York, NY, USA, 1965; pp. 293–305.
2. Rocchio, J.J. Relevance Feedback in Information Retrieval. In The Smart Retrieval System: Experiments in Automatic Document Processing; Prentice-Hall: Englewood Cliffs, NJ, USA, 1971; pp. 313–323.
3. Pan, M.; Pei, Q.; Liu, Y.; Li, T.; Huang, E.A.; Wang, J.; Huang, J.X. SPRF: A Semantic Pseudo-Relevance Feedback Enhancement for Information Retrieval via ConceptNet. Knowl.-Based Syst. 2023, 274, 110602.
4. Pan, M.; Wang, J.; Huang, J.X.; Huang, A.J.; Chen, Q.; Chen, J. A Probabilistic Framework for Integrating Sentence-Level Semantics via BERT into Pseudo-Relevance Feedback. Inf. Process. Manag. 2022, 59, 102734.
5. Pan, M.; Huang, J.X.; He, T.; Mao, Z.; Ying, Z.; Tu, X. A Simple Kernel Co-Occurrence-Based Enhancement for Pseudo-Relevance Feedback. J. Assoc. Inf. Sci. Technol. 2020, 71, 264–281.
6. Miao, J.; Huang, J.X.; Ye, Z. Proximity-Based Rocchio’s Model for Pseudo Relevance. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, Portland, OR, USA, 12–16 August 2012; pp. 535–544.
7. Pan, M.; Zhou, S.; Chen, J.; Huang, E.A.; Huang, J.X. A Semantic Framework for Enhancing Pseudo-Relevance Feedback with Soft Negative Sampling and Contrastive Learning. Inf. Process. Manag. 2025, 62, 104058.
8. Khattab, O.; Zaharia, M. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Xi’an, China, 25–30 July 2020; pp. 39–48.
9. Xiong, L.; Xiong, C.; Li, Y.; Tang, K.-F.; Liu, J.; Bennett, P.N.; Ahmed, J.; Overwijk, A. Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Conference, 3–7 May 2021.
10. Santhanam, K.; Khattab, O.; Saad-Falcon, J.; Potts, C.; Zaharia, M. ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 3715–3734.
11. Wang, X.; Macdonald, C.; Tonellotto, N.; Ounis, I. ColBERT-PRF: Semantic Pseudo-Relevance Feedback for Dense Passage and Document Retrieval. ACM Trans. Web 2023, 17, 1–39.
12. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186.
13. Breuer, T. Data Fusion of Synthetic Query Variants with Generative Large Language Models. In Proceedings of the SIGIR-AP 2024: Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, Tokyo, Japan, 9–12 December 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 274–279.
14. MacAvaney, S.; Yates, A.; Cohan, A.; Goharian, N. CEDR: Contextualized Embeddings for Document Ranking. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 1101–1104.
15. Nogueira, R.; Cho, K. Passage Re-Ranking with BERT. arXiv 2019, arXiv:1901.04085.
16. Kang, J.-W.; Choi, S.-Y. Comparative Investigation of GPT and FinBERT’s Sentiment Analysis Performance in News Across Different Sectors. Electronics 2025, 14, 1090.
17. Macdonald, C.; Tonellotto, N.; Ounis, I. On Single and Multiple Representations in Dense Passage Retrieval. arXiv 2021, arXiv:2108.06279.
18. Karpukhin, V.; Oguz, B.; Min, S.; Lewis, P.; Wu, L.; Edunov, S.; Chen, D.; Yih, W. Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Virtual Conference, 16–20 November 2020; pp. 6769–6781.
19. Kumar, R.; Tripathi, K.N.; Sharma, S.C. Optimal Query Expansion Based on Hybrid Group Mean Enhanced Chimp Optimization Using Iterative Deep Learning. Electronics 2022, 11, 1556.
20. Azad, H.K.; Deepak, A. Query Expansion Techniques for Information Retrieval: A Survey. Inf. Process. Manag. 2019, 56, 1698–1735.
21. Ye, Z.; Huang, J.X. A Simple Term Frequency Transformation Model for Effective Pseudo Relevance Feedback. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, Gold Coast, Australia, 6–11 July 2014; pp. 323–332.
22. Ng, K.K.Y.; Matsuba, I.; Zhang, P.C. RAG in Health Care: A Novel Framework for Improving Communication and Decision-Making by Addressing LLM Limitations. NEJM AI 2025, 2, AIra2400380.
23. Hang, C.N.; Tan, C.W.; Yu, P.D. MCQGen: A Large Language Model-Driven MCQ Generator for Personalized Learning. IEEE Access 2024, 12, 13.
24. Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971.
25. Li, R.; Wang, Y.; Wen, Z.; Cui, M.; Miao, Q. Different Paths to the Same Destination: Diversifying LLMs Generation for Multi-Hop Open-Domain Question Answering. Knowl.-Based Syst. 2025, 309, 112789.
26. Silva, L.; Barbosa, L. Improving Dense Retrieval Models with LLM Augmented Data for Dataset Search. Knowl.-Based Syst. 2024, 294, 111740.
27. Peng, Z.; Wu, X.; Wang, Q.; Fang, Y. Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models. Knowl.-Based Syst. 2025, 309, 112758.
28. Ding, R.; Zhou, B. Enhancing Domain-Specific Knowledge Graph Reasoning via Metapath-Based Large Model Prompt Learning. Electronics 2025, 14, 1012.
29. Bao, X.; Lv, Z.; Wu, B. Enhancing Large Language Models with RAG for Visual Language Navigation in Continuous Environments. Electronics 2025, 14, 909.
30. Gao, L.; Ma, X.; Lin, J.; Callan, J. Precise Zero-Shot Dense Retrieval without Relevance Labels. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Association for Computational Linguistics: Toronto, ON, Canada, 2023; pp. 1762–1777.
31. Wang, L.; Yang, N.; Wei, F. Query2doc: Query Expansion with Large Language Models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 9414–9423.
32. Jagerman, R.; Zhuang, H.; Qin, Z.; Wang, X.; Bendersky, M. Query Expansion by Prompting Large Language Models. arXiv 2023, arXiv:2305.03653.
33. Shen, T.; Long, G.; Geng, X.; Tao, C.; Lei, Y.; Zhou, T.; Blumenstein, M.; Jiang, D. Retrieval-Augmented Retrieval: Large Language Models Are Strong Zero-Shot Retriever. In Proceedings of the Findings of the Association for Computational Linguistics ACL 2024, Bangkok, Thailand, 11–16 August 2024; pp. 15933–15946.
34. de Kretser, O.; Moffat, A. Effective Document Presentation with a Locality-Based Similarity Heuristic. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA, 15–19 August 1999; pp. 113–120.
35. Lv, Y.; Zhai, C. A Comparative Study of Methods for Estimating Query Language Models with Pseudo Feedback. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, Hong Kong, China, 2–6 November 2009; Association for Computing Machinery: New York, NY, USA, 2009; pp. 1895–1898.
36. AI@Meta. Llama 3 Model Card. 2024. Available online: https://ai.meta.com/blog/meta-llama-3/ (accessed on 18 April 2024).
37. Pan, M.; Li, T.; Liu, Y.; Pei, Q.; Huang, E.A.; Huang, J.X. A Semantically Enhanced Text Retrieval Framework with Abstractive Summarization. Comput. Intell. 2024, 40, e12603.
38. Massey, P.A.; Montgomery, C.; Zhang, A.S. Comparison of ChatGPT–3.5, ChatGPT-4, and Orthopaedic Resident Performance on Orthopaedic Assessment Examinations. JAAOS-J. Am. Acad. Orthop. Surg. 2023, 31, 1173–1179.
39. Siino, M.; Tinnirello, I. Prompt Engineering for Identifying Sexism Using GPT Mistral 7B. In Working Notes of CLEF 2024: Conference and Labs of the Evaluation Forum, Grenoble, France, 9–12 September 2024.
40. Craswell, N.; Mitra, B.; Yilmaz, E.; Campos, D.; Voorhees, E.M. Overview of the TREC 2019 Deep Learning Track. arXiv 2020, arXiv:2003.07820.
41. Craswell, N.; Mitra, B.; Yilmaz, E.; Campos, D. Overview of the TREC 2020 Deep Learning Track. arXiv 2021, arXiv:2102.07662.
42. Kostric, I.; Balog, K. A Surprisingly Simple yet Effective Multi-Query Rewriting Method for Conversational Passage Retrieval. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 2271–2275.
43. Ma, X.; Pradeep, R.; Nogueira, R.; Lin, J. Document Expansion Baselines and Learned Sparse Lexical Representations for MS MARCO V1 and V2. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 3187–3197.
44. Li, H.; Mourad, A.; Zhuang, S.; Koopman, B.; Zuccon, G. Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls. ACM Trans. Inf. Syst. 2023, 41, 1–40.
45. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Association for Computational Linguistics: Kerrville, TX, USA, 2019; pp. 3982–3992.
46. Wang, X.; MacAvaney, S.; Macdonald, C.; Ounis, I. Effective Contrastive Weighting for Dense Query Expansion. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023; pp. 12688–12704.

Figure 1. Query expansion and Gaussian kernel semantic-enhanced dense retrieval model based on LLMs.

Figure 2. Graphical representation of the Gaussian kernel function K_{Gs}(u) = \exp\left[-\frac{u^{2}}{2\sigma^{2}}\right] with varying σ values.


Figure 3. Prompt template for generating a series of optimized queries with supplemental semantics.

Figure 4. The impact of the hyperparameter σ on the experimental results in the LSDR_Gs method.


Table 1. Characteristics and efficiency of selected large language models.

Model	Parameters	Context Length	Inference Cost
GPT-4	170B	32k tokens	High
GPT-3	175B	4k tokens	Moderate
GPT-3.5	175B	4k tokens	Moderate
Mistral	7B	8k tokens	Low
LLAMA 3	8B	8k tokens	Low

Table 2. Comparison with sparse models. The highest value in each column is boldfaced.

	TREC 2019 (43 Queries)				TREC 2020 (54 Queries)
Model	MAP	NDCG@10	MRR@10	Recall@1000	MAP	NDCG@10	MRR@10	Recall@1000
BM25 (a)	0.3013	0.5058	-	0.7501	0.2856	0.4796	-	0.7863
BM25 + RM3 (b)	0.3416 (+13.38%)	0.5216 (+3.12%)	-	0.8136 (+8.47%)	0.3006 (+5.25%)	0.4896 (+2.09%)	-	0.8236 (+4.74%)
BM25 + Rocchio (c)	0.3474 (+15.30%)	0.5275 (+4.29%)	-	0.8006 (+6.73%)	0.3115 (+9.07%)	0.4910 (+2.38%)	-	0.8156 (+3.73%)
BM25 + BERT (d)	0.4441 (+47.39%)	0.6855 (+35.53%)	0.8295	0.7553 (+0.69%)	0.4699 (+64.53%)	0.6716 (+40.03%)	0.8069	0.8103 (+3.05%)
BM25 + ColBERT (e)	0.4582 (+52.07%)	0.6950 (+37.41%)	0.8580	0.7553 (+0.69%)	0.4752 (+66.39%)	0.6931 (+44.52%)	0.8546	0.8103 (+3.05%)
LSDR_D	0.5468 (+81.48%)	0.7596 (+50.18%)	0.8998	0.8987 (+19.81%)	0.5324 (+86.41%)	0.7390 (+54.09%)	0.8579	0.9101 (+15.74%)
LSDR_Gs	0.5526 (+83.41%)	0.7721 (+52.65%)	0.9380	0.8991 (+19.86%)	0.5368 (+87.96%)	0.7564 (+57.71%)	0.8643	0.9103 (+15.77%)
Values in parentheses are relative changes with respect to BM25 (a). MRR@10 is not reported for the sparse runs, so no relative change is given for it.

Table 3. Comparison with dense retrieval models. The highest value in each column is boldfaced.

	TREC 2019 (43 Queries)				TREC 2020 (54 Queries)
Model	MAP	NDCG@10	MRR@10	Recall@1000	MAP	NDCG@10	MRR@10	Recall@1000
ColBERT E2E (a)	0.4318	0.6934	0.8529	0.7892	0.4654	0.6871	0.8525	0.8245
ANCE (b)	0.3715 (−13.96%)	0.6537 (−5.73%)	0.8590 (+0.72%)	0.7571 (−4.07%)	0.4070 (−12.55%)	0.6447 (−6.17%)	0.7898 (−7.35%)	0.7737 (−6.16%)
SBERT (c)	0.4060 (−5.97%)	0.6930 (−0.06%)	-	0.7872 (−0.25%)	0.4124 (−11.39%)	0.6344 (−7.67%)	-	0.7937 (−3.74%)
uniCOIL (w/doc2query–T5) (d)	0.4612 (+6.81%)	0.7024 (+1.30%)	-	0.8292 (+5.07%)	0.4430 (−4.81%)	0.6745 (−1.83%)	-	0.8430 (+2.24%)
ColBERTv2 (e)	0.5149 (+19.25%)	0.7418 (+6.98%)	0.8953 (+4.97%)	0.8873 (+12.43%)	0.5248 (+12.76%)	0.7470 (+8.72%)	0.8453 (−0.84%)	0.8994 (+9.08%)
ANCE-PRF (f)	0.4211 (−2.48%)	0.6539 (−5.70%)	-	0.7825 (−0.85%)	0.4315 (−7.28%)	0.6471 (−5.82%)	-	0.7957 (−3.49%)
DistilBERT Balanced Average (g)	0.5057 (+17.11%)	0.7190 (+3.69%)	-	0.8054 (+2.05%)	0.4887 (+5.01%)	0.7086 (+3.13%)	-	0.9030 (+9.52%)
DistilBERT Balanced Rocchio (h)	0.5249 (+21.56%)	0.7231 (+4.28%)	-	0.8352 (+5.83%)	0.4879 (+4.83%)	0.7086 (+3.13%)	-	0.8926 (+8.26%)
CWPRF-AAAT (i)	0.5319 (+23.18%)	0.7444 (+7.36%)	-	0.8596 (+8.92%)	0.5136 (+10.36%)	0.7246 (+5.46%)	-	0.8783 (+6.53%)
CWPRF-OAAT (j)	0.5252 (+21.63%)	0.7244 (+4.47%)	-	0.8722 (+10.52%)	0.5049 (+8.49%)	0.7204 (+4.85%)	-	0.8783 (+6.53%)
ColBERT-PRF Ranker (k)	0.5431 (+25.78%)	0.7352 (+6.03%)	0.8858 (+3.86%)	0.8706 (+10.31%)	0.4962 (+6.62%)	0.6993 (+1.78%)	0.8376 (−1.75%)	0.8892 (+7.85%)
LSDR_D	0.5468 (+26.63%)	0.7596 (+9.55%)	0.8998 (+5.50%)	0.8987 (+13.87%)	0.5324 (+14.40%)	0.7390 (+7.55%)	0.8579 (+0.63%)	0.9101 (+10.38%)
LSDR_Gs	0.5526 (+27.98%)	0.7721 (+11.35%)	0.9380 (+9.98%)	0.8991 (+13.93%)	0.5368 (+15.34%)	0.7564 (+10.09%)	0.8643 (+1.38%)	0.9103 (+10.41%)
Values in parentheses are relative changes with respect to ColBERT E2E (a); "-" indicates that the metric is not reported.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Pan, M.; Xiong, W.; Zhou, S.; Gao, M.; Chen, J. LLM-Based Query Expansion with Gaussian Kernel Semantic Enhancement for Dense Retrieval. Electronics 2025, 14, 1744. https://doi.org/10.3390/electronics14091744

AMA Style

Pan M, Xiong W, Zhou S, Gao M, Chen J. LLM-Based Query Expansion with Gaussian Kernel Semantic Enhancement for Dense Retrieval. Electronics. 2025; 14(9):1744. https://doi.org/10.3390/electronics14091744

Chicago/Turabian Style

Pan, Min, Wenrui Xiong, Shuting Zhou, Mengfei Gao, and Jinguang Chen. 2025. "LLM-Based Query Expansion with Gaussian Kernel Semantic Enhancement for Dense Retrieval" Electronics 14, no. 9: 1744. https://doi.org/10.3390/electronics14091744

APA Style

Pan, M., Xiong, W., Zhou, S., Gao, M., & Chen, J. (2025). LLM-Based Query Expansion with Gaussian Kernel Semantic Enhancement for Dense Retrieval. Electronics, 14(9), 1744. https://doi.org/10.3390/electronics14091744

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
