3.1. The Hallucination Phenomenon in GPT
The hallucination phenomenon in GPT models arises from their self-supervised learning approach. The models are trained to maximize the probability of generating tokens given their context, even in the absence of a well-defined correct answer. Consequently, GPT models may produce low-likelihood outputs that do not accurately reflect the underlying data distribution.
Owing to their inherent constraints, GPT models are compelled to generate outputs even when the probability of the predicted token is low. This is due to the self-supervised loss function, which drives the model to generate tokens that maximize the likelihood of the predicted sequence, regardless of the output's accuracy.
In this study, we focus on the hallucinations in GPT models that can occur even when a well-trained GPT is provided. The hallucination phenomenon can intensify as the model generates a series of low-likelihood tokens. When these tokens are used as input for subsequent predictions, the probability of generating additional low-likelihood tokens may escalate, resulting in increasingly unreliable outputs.
Definition 1. Hallucination in GPT models refers to the generation of low-probability tokens that are contextually implausible or inconsistent with the real world, diverging from the anticipated output given the input context and the true underlying distribution.
To formally illustrate the forced selection of the highest probability token in ambiguous contexts, we begin by introducing the following assumption regarding the distribution of estimated probabilities.
Assumption 6. When the input context does not provide sufficient information for a clear and optimal token choice, the estimated probabilities obtained from (2) are distributed such that the difference between the highest and subsequent probabilities is relatively small.
It is essential to emphasize that our focus is on the hallucination phenomenon that can arise even in well-trained GPT models, as established by Theorem 2, Assumption 5, and Proposition 3. Under Assumption 6, we can now analyze the selection process of GPT models in ambiguous contexts. Let ε > 0 be a small constant, and let $p_{\max}$ denote the highest probability among the possible tokens, i.e., $p_{\max} = \max_{x \in \mathcal{V}} P(x \mid x_1, \dots, x_{i-1})$. Then, for all $x \in \mathcal{V}$, we have
$$p_{\max} - P(x \mid x_1, \dots, x_{i-1}) \leq \epsilon. \qquad (16)$$
Proposition 4. In ambiguous contexts, GPT models are forced to select the token with the highest estimated probability, even when the difference in probabilities between the highest and subsequent tokens is relatively small, as described in (16).
Proof. Given the softmax function in (2), the GPT model generates tokens by sampling from the probability distribution of the subsequent token conditioned on the context. In ambiguous contexts, the model is forced to select the token with the highest estimated probability, despite the small difference in probabilities between the highest and subsequent tokens.
From (16), we can observe that the difference between the highest and subsequent probabilities does not exceed ε. This implies that the model may select suboptimal tokens with only marginally lower probabilities than the optimal choice. The forced selection of the highest probability token in such situations may result in the generation of contextually implausible tokens, leading to hallucination. □
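To make the forced-selection argument concrete, the following sketch (not from the original work; the vocabulary, logits, and the value of ε are hypothetical) builds a toy next-token distribution from a softmax and shows that several candidates fall within ε of the top probability, yet greedy decoding still commits to the single highest-probability token.

```python
# Minimal sketch (not the paper's implementation): a toy next-token
# distribution whose top candidates are within a small margin epsilon,
# illustrating the forced argmax selection described in Proposition 4.
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)          # numerical stability
    e = np.exp(z)
    return e / e.sum()

vocab = ["Paris", "Lyon", "Nice", "the", "a"]          # hypothetical tokens
logits = np.array([2.10, 2.05, 2.02, 0.3, 0.1])        # hypothetical scores feeding the softmax
p = softmax(logits)

eps = 0.05
p_max = p.max()
ambiguous = [tok for tok, q in zip(vocab, p) if p_max - q <= eps]

print(dict(zip(vocab, np.round(p, 3))))
print("tokens within eps of the top probability:", ambiguous)
print("greedy choice:", vocab[int(np.argmax(p))])      # forced selection of the top token
```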
In the GPT model, the generated text is a sequence of tokens (words or subwords), and the model chooses each token based on the probability distribution it learned during training. When the input context is ambiguous, meaning it could lead to multiple plausible outputs, the model has to choose between tokens with similar probabilities. In this situation, even if the GPT model is well trained, it might still generate a token that is not contextually correct, which may lead to hallucinations. An example of this scenario is displayed in Figure 2.
Remark 4. The risk of hallucination increases with the degree of ambiguity in the input context. As the context becomes less informative, the difference between the highest and subsequent probabilities narrows, increasing the likelihood of generating low-probability tokens that deviate from the expected output. This observation is important because it highlights that even well-trained GPT models can produce hallucinations when faced with ambiguous input contexts.
To scrutinize this phenomenon, we initially introduce a measure of uncertainty in the GPT model’s predictions.
Definition 2. Let $P(x_i \mid x_1, \dots, x_{i-1})$ denote the probability distribution of the next token in the sequence, as given by (2). The uncertainty associated with the GPT model's prediction at position $i$ is defined as the entropy of this distribution:
$$H_i = -\sum_{x \in \mathcal{V}} P(x \mid x_1, \dots, x_{i-1}) \log P(x \mid x_1, \dots, x_{i-1}). \qquad (17)$$
We now present a critical assumption related to the hallucination phenomenon.
Assumption 7. Hallucination takes place when the GPT model generates a low-probability token $x_i$, given the previous tokens $x_1, \dots, x_{i-1}$, and subsequently employs this token as input for predicting the next token $x_{i+1}$.
Remark 5. Assumption 7 suggests that the hallucination phenomenon may intensify as the model produces low-probability tokens, resulting in increasingly unreliable predictions.
Lemma 1. Under Assumption 7, the generation of low-probability tokens in GPT models correlates with heightened uncertainty, as measured by the entropy in (17).
Proof. Let $x_i^{*}$ represent the actual token at position $i$. If the GPT model generates a low-probability token $x_i$, we observe $P(x_i \mid x_1, \dots, x_{i-1}) < P(x_i^{*} \mid x_1, \dots, x_{i-1})$. Consequently, the entropy $H_i$, as provided by (17), will be elevated, signifying increased uncertainty in the model's prediction. □
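The following sketch illustrates the uncertainty measure in (17) and the intuition behind Lemma 1 with two hypothetical next-token distributions: a peaked one for a clear context and a flat one for an ambiguous context. The flat distribution has higher entropy, and sampling from it is more likely to return a low-probability token.

```python
# Minimal sketch of the uncertainty measure in (17): the entropy of the
# next-token distribution. A flat (ambiguous) distribution has higher entropy,
# and sampling from it is more likely to return a low-probability token,
# in line with Lemma 1. Both distributions below are hypothetical.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + 1e-12)))       # H_i as in (17), natural log

confident = np.array([0.90, 0.05, 0.03, 0.02])          # clear context
ambiguous = np.array([0.30, 0.28, 0.22, 0.20])          # ambiguous context

print("H(confident) =", round(entropy(confident), 3))   # low uncertainty
print("H(ambiguous) =", round(entropy(ambiguous), 3))   # high uncertainty

rng = np.random.default_rng(0)
sample = rng.choice(len(ambiguous), p=ambiguous)
print("sampled token index:", int(sample),
      "with probability", ambiguous[sample])            # may well be a low-probability token
```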
Proposition 5. Given a well-trained GPT model as indicated by Theorem 2, Assumption 5, and Proposition 3, there still exists a nonzero probability of generating hallucinatory tokens.
Proof. Consider a well-trained GPT model that has a minimized loss function, as ensured by Proposition 3. However, as previously discussed in Assumption 6, the model may still encounter ambiguous contexts where the difference in probabilities between the highest and subsequent tokens is relatively small.
In such cases, as demonstrated in Proposition 4, the GPT model is forced to select the token with the highest estimated probability, even when the difference in probabilities is small. This may lead to the generation of contextually implausible tokens, which can cause hallucination.
Therefore, even with a well-trained GPT model, there exists a nonzero probability of generating hallucinatory tokens in ambiguous contexts, indicating that the optimization process alone cannot completely eliminate the occurrence of hallucinations. □
Remark 6. The results of Proposition 5 imply that there is an inherent trade-off between optimizing the GPT model and the occurrence of hallucinations. This trade-off stems from the model’s inherent uncertainty in predicting the next token in ambiguous contexts, as described in Definition 2.
Assumption 8. In a well-trained GPT model, as indicated by Theorem 2, Assumption 5, and Proposition 3, the generation of hallucinatory tokens is primarily driven by the model’s inherent uncertainty in predicting the next token, as captured by the entropy in Definition 2.
Lemma 2. Under Assumption 8, the occurrence of hallucinations in a well-trained GPT model is strongly correlated with the model's uncertainty, as measured by the entropy in (17).
Proof. According to Assumption 8, the generation of hallucinatory tokens in a well-trained GPT model is mainly driven by the model's uncertainty in predicting the next token. As stated in Definition 2, the entropy of the probability distribution of the next token serves as a measure of the model's uncertainty. Therefore, under Assumption 8, the occurrence of hallucinations in a well-trained GPT model is strongly correlated with the model's uncertainty, as captured by the entropy in (17). □
Here, we demonstrate how the hallucination in GPT models can be reinforced by using the selected token as input for estimating the subsequent tokens, and how this reinforcement can lead to a series of hallucinations in the generated text. We approach this problem by analyzing the conditional probabilities of generating subsequent tokens given the context and the previously generated tokens.
Assumption 9. The probability of generating a hallucinatory token at position $i+1$ is conditionally independent of generating a hallucinatory token at position $i$, given the context up to position $i$.
Under Assumption 9, we can now analyze the reinforcement of hallucination in GPT models. Let $E_i$ denote the event that the generated token $x_i$ is hallucinatory, and let $P(E_i \mid c_i)$ denote the conditional probability of generating a hallucinatory token given the context $c_i$ up to position $i$.
Proposition 6. The probability of generating a hallucinatory token $x_{i+1}$ at position $i+1$, conditioned on generating a hallucinatory token $x_i$ at position $i$, is given by
$$P(E_{i+1} \mid E_i) = \frac{P(E_i \cap E_{i+1})}{P(E_i)}. \qquad (18)$$
Let $R = P(E_{i+1} \mid E_i)$. If $R > P(E_{i+1})$, then generating a hallucinatory token $x_i$ increases the likelihood of generating a hallucinatory token $x_{i+1}$.
Theorem 3. If the conditional probability R satisfies $R > P(E_{i+1})$, then generating a hallucinatory token $x_i$ increases the likelihood of generating a hallucinatory token $x_{i+1}$, leading to the reinforcement of hallucination in GPT models.
Proof. Under Assumption 9, we have
$$P(E_{i+1} \mid E_i, c_i) = P(E_{i+1} \mid c_i).$$
Now, we can express the joint probability of generating hallucinatory tokens $x_i$ and $x_{i+1}$ as
$$P(E_i \cap E_{i+1}) = P(E_i)\, P(E_{i+1} \mid E_i) = P(E_i)\, R.$$
If $R > P(E_{i+1})$, then
$$P(E_i \cap E_{i+1}) = P(E_i)\, R > P(E_i)\, P(E_{i+1}),$$
which implies that generating a hallucinatory token $x_i$ increases the likelihood of generating a hallucinatory token $x_{i+1}$. This reinforcement effect can cascade through the generated text, leading to a series of hallucinations in GPT models. □
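A small numerical check of the reinforcement condition, with all probabilities chosen purely for illustration: when $R = P(E_{i+1} \mid E_i)$ exceeds the marginal $P(E_{i+1})$, the joint probability of two consecutive hallucinations is larger than the independence baseline, as in the proof above.

```python
# Worked numerical check of the reinforcement condition in Theorem 3,
# using hypothetical probabilities. If R = P(E_{i+1} | E_i) exceeds the
# marginal P(E_{i+1}), the joint probability of consecutive hallucinations
# exceeds what independence would predict.
p_ei = 0.10        # P(E_i): hallucination at position i (assumed)
p_ei1 = 0.10       # P(E_{i+1}): marginal hallucination probability (assumed)
R = 0.40           # P(E_{i+1} | E_i): conditional probability after a hallucination (assumed)

joint = p_ei * R                 # P(E_i and E_{i+1}) = P(E_i) * R
independent = p_ei * p_ei1       # what the joint would be without reinforcement

print("joint probability    :", joint)        # ~0.04
print("independence baseline:", independent)  # ~0.01
print("reinforcement occurs :", R > p_ei1)    # True
```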
Remark 7. The risk of reinforcement of hallucination depends on the conditional probability R. If the GPT model generates a hallucinatory token $x_i$, the likelihood of generating a hallucinatory token $x_{i+1}$ increases when $R > P(E_{i+1})$. This reinforcement effect can propagate through the generated text, exacerbating the hallucination phenomenon.
Proposition 7. The likelihood of generating a hallucinatory token at position $n+1$ depends on the previously generated hallucinatory tokens $x_1, \dots, x_n$, the input context up to position $i$, and the values of the conditional probabilities $R_i$ for $i = 1, \dots, n-1$.
Proof. Using the conditional probability $R_i$, defined for $i = 1, \dots, n-1$ as the probability of generating a hallucinatory token $x_{i+1}$ given a hallucinatory token $x_i$, we can derive the joint probability of generating a sequence of $n$ hallucinatory tokens as follows:
$$P(E_1 \cap E_2 \cap \dots \cap E_n) = P(E_1) \prod_{i=1}^{n-1} R_i.$$
The likelihood of generating a hallucinatory token at position $n+1$ is determined by the joint probability of generating the sequence of $n$ hallucinatory tokens and the values of the conditional probabilities $R_i$. This likelihood increases as the values of $R_i$ increase, which in turn depends on the previously generated hallucinatory tokens and the input context up to position $i$. □
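The sketch below works through the cascade described in Proposition 7 for a hypothetical run of five hallucinatory tokens, comparing the joint probability $P(E_1) \prod_i R_i$ against an independence baseline; the specific values of $P(E_1)$, the $R_i$, and the baseline are assumptions for illustration only.

```python
# Minimal sketch of the cascade in Proposition 7: the probability of a run of
# n consecutive hallucinatory tokens, P(E_1) * R_1 * R_2 * ... * R_{n-1},
# grows relative to the independence baseline as the conditional
# probabilities R_i increase. All numbers are hypothetical.
from math import prod

p_e1 = 0.10                          # P(E_1): first hallucinatory token (assumed)
R = [0.40, 0.50, 0.60, 0.70]         # R_i = P(E_{i+1} | E_i), assumed to grow along the run
marginal = 0.10                      # per-token hallucination probability under independence (assumed)

run_prob = p_e1 * prod(R)            # joint probability of 5 consecutive hallucinations
baseline = marginal ** (len(R) + 1)  # independence baseline for the same run length

print("P(5-token hallucinatory run):", run_prob)   # ~8.4e-03
print("independence baseline       :", baseline)   # ~1.0e-05
print("amplification factor        :", run_prob / baseline)
```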
Remark 8. The dependency of the likelihood of generating a hallucinatory token on previously generated hallucinatory tokens and the input context highlights the importance of mitigating hallucination in GPT models, as the generation of one hallucinatory token can influence the generation of subsequent hallucinatory tokens and lead to a cascade of hallucinations in the generated text.
Definition 3. Hallucination mitigation refers to the process of modifying the GPT model’s behavior to reduce the likelihood of generating hallucinatory tokens, thereby improving the model’s output quality and reliability.
3.2. The Creativity of GPT
To understand the relationship between hallucination and creativity in GPT models, we first define a measure of creativity in the model’s predictions.
Definition 4. Let $P(x_i \mid x_1, \dots, x_{i-1})$ denote the probability distribution of the next token in the sequence, as given by (2). The creativity associated with the GPT model's prediction at position $i$ is defined as the entropy of this distribution normalized by the maximum entropy:
$$C_i = \frac{H_i}{H_{\max}}, \qquad (19)$$
where $H_{\max} = \log |\mathcal{V}|$ is the maximum entropy achievable for the given vocabulary $\mathcal{V}$, which occurs when all tokens have uniform probability.
We now introduce a key assumption regarding the relationship between hallucination and creativity.
Assumption 10. Creativity in GPT models can be enhanced by the hallucination phenomenon, as it allows the model to explore a broader space of token sequences beyond the most probable ones conditioned on the given input.
Remark 9. Assumption 10 implies a potential trade-off between the hallucination and creativity of GPT models. This trade-off suggests that minimizing hallucination-related errors may lead to a reduction in the model’s creativity, as it becomes more conservative in generating token sequences.
Proposition 8. Under Assumption 10, the creativity of GPT models, as measured by the normalized entropy in (19), will be higher in the presence of the hallucination phenomenon.
Proof. According to Lemma 1, the generation of low-probability tokens in GPT models is associated with high uncertainty, as measured by the entropy in (17). Under Assumption 10, this increased entropy also implies a higher level of creativity, as given by (19). Therefore, the creativity of GPT models will be higher in the presence of the hallucination phenomenon. □
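A minimal sketch of the creativity measure in (19), using two hypothetical next-token distributions: a peaked, conservative prediction yields a normalized entropy near 0, while a spread-out, exploratory prediction yields a value near 1.

```python
# Minimal sketch of the creativity measure in (19): the entropy of the
# next-token distribution normalized by the maximum entropy log|V|, so that
# C_i lies in [0, 1]. The two distributions below are hypothetical.
import numpy as np

def creativity(p):
    p = np.asarray(p, dtype=float)
    h = -np.sum(p * np.log(p + 1e-12))   # H_i as in (17)
    h_max = np.log(len(p))               # maximum entropy for vocabulary size |V|
    return float(h / h_max)              # C_i as in (19)

peaked = np.array([0.97, 0.01, 0.01, 0.01])    # conservative, low-creativity prediction
spread = np.array([0.28, 0.26, 0.24, 0.22])    # exploratory, high-creativity prediction

print("C(peaked) =", round(creativity(peaked), 3))   # close to 0
print("C(spread) =", round(creativity(spread), 3))   # close to 1
```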
Conjecture 1. There exists an optimal trade-off between hallucination and creativity in GPT models, such that the model’s performance is maximized when operating at this trade-off point.
Considering Conjecture 1, we seek to characterize the optimal trade-off between hallucination and creativity in GPT models. Specifically, we consider a parametric family of models, where each model is tuned to balance hallucination and creativity differently. Let $M_\alpha$ denote a GPT model parametrized by α ∈ [0, 1]. The parameter α controls the trade-off between hallucination and creativity, with α = 1 corresponding to a purely hallucination-minimizing model and α = 0 corresponding to a purely creativity-maximizing model.
Definition 5. Let $M_\alpha$ be a GPT model parametrized by α. We define the hallucination–creativity trade-off parameter α as the weighting factor that balances the contribution of the hallucination-related prediction error and the creativity of the model in the model's objective function:
$$J(\alpha) = \alpha\, D_{\mathrm{KL}}\!\big(P_{\mathrm{true}} \,\|\, P_{M_\alpha}\big) - (1 - \alpha)\, C, \qquad (20)$$
where $D_{\mathrm{KL}}$ denotes the KL divergence between the true underlying next-token distribution $P_{\mathrm{true}}$ and the model's predicted distribution $P_{M_\alpha}$, and C denotes the creativity measure as defined in (19). Our goal is to find the optimal value of the trade-off parameter that maximizes the model's performance, as measured by a suitable performance metric. To this end, we introduce the following performance metric:
Definition 6. Let $P_{\mathrm{task}}(x_i \mid x_1, \dots, x_{i-1})$ denote the probability distribution of the next token in the sequence, as conditioned on the specific task requirements. The performance metric of a GPT model $M_\alpha$ is defined as the negative expected KL divergence between the task-specific distribution and the model's predicted distribution, so that larger values indicate closer alignment with the task:
$$\mathcal{P}(M_\alpha) = -\,\mathbb{E}\big[D_{\mathrm{KL}}\!\big(P_{\mathrm{task}} \,\|\, P_{M_\alpha}\big)\big]. \qquad (21)$$
Conjecture 2. There exists an optimal trade-off parameter α* that maximizes the performance metric $\mathcal{P}(M_\alpha)$ for GPT models, as defined in (21).
Consider the optimization problem of finding the optimal trade-off parameter α* that maximizes the performance metric $\mathcal{P}(M_\alpha)$:
$$\max_{\alpha \in [0, 1]} \mathcal{P}(M_\alpha). \qquad (22)$$
To solve (22), we first examine the relationship between the objective function J(α) in (20) and the performance metric $\mathcal{P}(M_\alpha)$ in (21).
For a fixed α, the objective function can be written as follows:
$$J(\alpha) = \alpha\, E_{\mathrm{H}} - (1 - \alpha)\, C, \qquad (23)$$
where $E_{\mathrm{H}} = D_{\mathrm{KL}}\!\big(P_{\mathrm{true}} \,\|\, P_{M_\alpha}\big)$ and C represent the hallucination-related prediction error and the creativity of the model, respectively.
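To illustrate how α weights the two terms in (23), the sketch below evaluates J(α) for a few values of α, using hypothetical stand-ins for the true and predicted next-token distributions; the KL term plays the role of $E_{\mathrm{H}}$ and the normalized entropy of the prediction plays the role of C.

```python
# Minimal sketch of the objective in (20)/(23) under the notation used here:
# J(alpha) = alpha * E_H - (1 - alpha) * C, where E_H is a hallucination-related
# KL term and C is the creativity measure. The distributions are hypothetical.
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))

p_true = np.array([0.70, 0.15, 0.10, 0.05])     # assumed true next-token distribution
p_model = np.array([0.50, 0.25, 0.15, 0.10])    # assumed model prediction

E_H = kl(p_true, p_model)                                      # hallucination-related error
C = -np.sum(p_model * np.log(p_model)) / np.log(len(p_model))  # creativity as in (19)

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    J = alpha * E_H - (1 - alpha) * C
    print(f"alpha={alpha:.2f}  J(alpha)={J:+.3f}")
```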
Example 2. To illustrate the role of the trade-off parameter α, let us consider an example in which a GPT model is generating text for a storytelling task. In this scenario, a high α value would prioritize minimizing the hallucination-related prediction error, potentially resulting in a more conservative and contextually plausible output. However, this output might lack originality and variety, which are essential for a compelling story. On the other hand, a low α value would emphasize creativity, leading to a more diverse and original output. However, this might come at the expense of increased hallucination and reduced contextual plausibility. The optimal trade-off parameter α* represents a balance between these competing objectives, yielding an output that exhibits both creativity and contextual plausibility while minimizing hallucinations.
We analyze the derivative of J(α) with respect to α:
$$\frac{dJ(\alpha)}{d\alpha} = E_{\mathrm{H}} + C. \qquad (24)$$
By setting $dJ(\alpha)/d\alpha = 0$, we can find the critical points of the objective function:
$$E_{\mathrm{H}} + C = 0. \qquad (25)$$
The critical points correspond to the trade-off points where the hallucination-related prediction error is balanced with the creativity of the model. To find the optimal trade-off point α*, we need to analyze the second derivative of J(α) with respect to α:
$$\frac{d^{2}J(\alpha)}{d\alpha^{2}} = 0. \qquad (26)$$
Since the second derivative is always zero, we cannot directly determine the concavity or convexity of the objective function. Thus, we need to further investigate the relationship between the objective function and the performance metric.
In (21), the KL divergence is always non-negative; hence $\mathcal{P}(M_\alpha) \leq 0$, and we can conclude that $\mathcal{P}(M_\alpha)$ is maximized when the model's predictions align with the task-specific probability distribution:
$$P_{M_\alpha}(x_i \mid x_1, \dots, x_{i-1}) = P_{\mathrm{task}}(x_i \mid x_1, \dots, x_{i-1}). \qquad (27)$$
To analyze the optimal trade-off between hallucination and creativity, we investigate the behavior of the performance metric $\mathcal{P}(M_\alpha)$ as a function of the trade-off parameter α. We first derive the gradient of $\mathcal{P}(M_\alpha)$ with respect to α:
$$\frac{\partial \mathcal{P}(M_\alpha)}{\partial \alpha} = -\,\frac{\partial}{\partial \alpha}\,\mathbb{E}\big[D_{\mathrm{KL}}\!\big(P_{\mathrm{task}} \,\|\, P_{M_\alpha}\big)\big] = \mathbb{E}\!\left[\sum_{x \in \mathcal{V}} \frac{P_{\mathrm{task}}(x \mid x_1, \dots, x_{i-1})}{P_{M_\alpha}(x \mid x_1, \dots, x_{i-1})}\, \frac{\partial P_{M_\alpha}(x \mid x_1, \dots, x_{i-1})}{\partial \alpha}\right]. \qquad (28)$$
By plugging (27) into (28), we can express the gradient of the performance metric as a function of the trade-off parameter α:
$$\frac{\partial \mathcal{P}(M_\alpha)}{\partial \alpha} = \mathbb{E}\!\left[\sum_{x \in \mathcal{V}} \frac{\partial P_{M_\alpha}(x \mid x_1, \dots, x_{i-1})}{\partial \alpha}\right]. \qquad (29)$$
To find the optimal trade-off parameter α*, we need to solve the following optimization problem:
$$\alpha^{*} = \arg\max_{\alpha \in [0, 1]} \mathcal{P}(M_\alpha). \qquad (30)$$
Since the optimization problem in (30) is nonconvex and the gradient in (29) depends on how the model's predicted distribution varies with α, we resort to a gradient-based optimization method to find the optimal trade-off parameter α*.
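As a rough illustration of the search in (30), the sketch below evaluates the performance metric over a grid of α values. Since no actual model family $M_\alpha$ is available here, the family is simulated by hypothetically interpolating between a flat, creativity-oriented distribution and a peaked, hallucination-minimizing one, and a simple grid search stands in for the gradient-based method described above.

```python
# Minimal sketch of the search over the trade-off parameter alpha in (30).
# The model family M_alpha is simulated (hypothetically) by interpolating
# between a flat, creativity-oriented distribution and a peaked,
# hallucination-minimizing one; the performance metric is the negative KL
# divergence against an assumed task-specific distribution, as in (21).
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))

vocab_size = 6
p_task = np.array([0.45, 0.25, 0.15, 0.08, 0.05, 0.02])          # assumed task distribution
p_creative = np.full(vocab_size, 1.0 / vocab_size)                # alpha = 0: maximally spread
p_conservative = np.array([0.90, 0.04, 0.03, 0.01, 0.01, 0.01])   # alpha = 1: sharply peaked

def model_distribution(alpha):
    """Toy stand-in for M_alpha's predicted next-token distribution."""
    p = alpha * p_conservative + (1.0 - alpha) * p_creative
    return p / p.sum()

def performance(alpha):
    return -kl(p_task, model_distribution(alpha))                 # P(M_alpha) as in (21)

alphas = np.linspace(0.0, 1.0, 101)
scores = [performance(a) for a in alphas]
best = alphas[int(np.argmax(scores))]
print(f"estimated alpha* = {best:.2f}, P(M_alpha*) = {max(scores):.4f}")
```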
3.3. Examining the Interplay between Hallucination and Creativity
Assumption 11. The efficacy of GPT models across various tasks hinges on the delicate equilibrium between hallucination and creativity. Adjusting this equilibrium may potentially improve the overall performance of the model.
Establishing an ideal equilibrium between hallucination and creativity is vital for the model’s effectiveness in a wide range of applications. The problem below encapsulates this concept.
Problem 1. Let $\{M_\alpha\}$ represent a collection of GPT models parameterized by a trade-off parameter α, and let $\mathcal{P}(M_\alpha)$ denote the performance metric as defined in (21). The optimization problem involves identifying the optimal trade-off parameter α* that maximizes the performance metric:
$$\alpha^{*} = \arg\max_{\alpha \in [0, 1]} \mathcal{P}(M_\alpha).$$
Remark 10. Optimizing the trade-off parameter in Problem 1 proves difficult due to the vast parameter space of GPT models and the potential nonconvexity of the performance metric $\mathcal{P}(M_\alpha)$.
Conjecture 3. The performance metric $\mathcal{P}(M_\alpha)$ might present multiple local optima associated with distinct values of α, each signifying a unique equilibrium between hallucination and creativity.
Considering the intricacy of the optimization landscape, it is crucial to explore efficient methods to examine the interplay between hallucination and creativity. One feasible approach is to utilize meta-learning techniques that adaptively update the trade-off parameter during training, consequently enabling the model to learn the optimal equilibrium.
Example 3. A meta-learning algorithm can iteratively update the trade-off parameter α based on the model’s performance on a validation set. The algorithm may employ methods such as gradient-based optimization or Bayesian optimization to effectively search for the optimal α value.
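One possible realization of Example 3, sketched under stated assumptions: a gradient-based meta-update of α driven by a validation signal. The function validate(alpha) is a hypothetical callback that would evaluate $\mathcal{P}(M_\alpha)$ on a validation set; the learning rate, perturbation size, and step count are illustrative.

```python
# Minimal sketch of the adaptive scheme in Example 3: alpha is updated
# iteratively from a validation signal using a finite-difference estimate of
# the gradient of the performance metric. validate(alpha) is a hypothetical
# callback that would re-weight or re-tune the model at the given alpha and
# return its validation-set performance.
def tune_alpha(validate, alpha=0.5, lr=0.1, delta=0.02, steps=20):
    for _ in range(steps):
        # finite-difference estimate of the performance gradient around alpha
        grad = (validate(min(alpha + delta, 1.0)) - validate(max(alpha - delta, 0.0))) / (2 * delta)
        alpha = min(max(alpha + lr * grad, 0.0), 1.0)   # gradient ascent, clipped to [0, 1]
    return alpha

# Usage with a toy validation score that peaks at alpha = 0.7 (purely illustrative):
if __name__ == "__main__":
    best_alpha = tune_alpha(lambda a: -(a - 0.7) ** 2)
    print(f"tuned alpha ~ {best_alpha:.2f}")
```

When each evaluation of validate is expensive, Bayesian optimization could replace the finite-difference update, as Example 3 suggests.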
Another avenue for future research is to investigate the impact of model architecture and training techniques on the trade-off between hallucination and creativity. For instance, it may be possible to design novel self-attention mechanisms or regularization techniques that explicitly encourage the model to maintain a balance between generating plausible yet creative responses.
Example 4. The development of an attention mechanism that explicitly models the relationship between the input and output tokens could potentially improve the balance between hallucination and creativity. Such a mechanism could be designed to assign higher importance to relevant tokens in the input while penalizing the generation of implausible tokens.
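A minimal sketch of the idea in Example 4, under assumed shapes and an assumed penalty term: scaled dot-product attention whose scores are reduced in proportion to a per-token implausibility estimate, so that implausible candidates receive less attention mass. The penalty vector and the function names are hypothetical, not part of the original proposal.

```python
# Minimal sketch of the idea in Example 4: standard scaled dot-product
# attention whose scores are reduced by a plausibility penalty, so that
# implausible candidate tokens receive less attention mass. The penalty
# vector would in practice come from an auxiliary plausibility estimate.
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def penalized_attention(q, k, v, implausibility, penalty_weight=1.0):
    """q: (n_q, d); k, v: (n_k, d); implausibility: (n_k,) with values in [0, 1]."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # standard attention scores
    scores = scores - penalty_weight * implausibility  # hypothetical plausibility penalty
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(2, 8)), rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
implausibility = np.array([0.0, 0.0, 0.9, 0.1, 0.0])   # token 2 flagged as implausible
out = penalized_attention(q, k, v, implausibility)
print(out.shape)   # (2, 8)
```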
Problem 2. Investigate the characteristics of the optimal trade-off parameter α* and its associated local optima, in relation to the GPT model's performance across a variety of tasks.
Proposition 9. The optimal trade-off parameter α* may be influenced by the particular task requirements and the structure of the input data.
To address these task-specific dependencies, adopting an adaptive strategy for fine-tuning the trade-off parameter α may contribute to enhanced performance.
Assumption 12. Modifying the trade-off parameter α depending on the particular task and input data can lead to superior GPT model performance.
As a result, devising an adaptive method for dynamically fine-tuning the trade-off parameter becomes an essential research focus.
Problem 3. Create an adaptive method to dynamically modify the trade-off parameter α in GPT models based on task demands and input data.
Conjecture 4. Incorporating an adaptive method for fine-tuning the trade-off parameter will boost GPT model performance, as evidenced by the performance metric $\mathcal{P}(M_\alpha)$, across an extensive range of tasks.
Remark 11. The suggested adaptive method for fine-tuning the trade-off parameter α should effectively generalize across various tasks and input data distributions, guaranteeing consistent performance enhancements.
To showcase the effectiveness of the adaptive method, it is crucial to validate its performance using real-world tasks and datasets.
Problem 4. Confirm the efficiency of the adaptive method for fine-tuning the trade-off parameter α by employing real-world tasks and datasets, while quantifying the improvement in GPT model performance.
A deeper exploration of the interplay between hallucination and creativity in GPT models will offer valuable insights into the model’s constraints and guide the creation of more robust and adaptable language models. The challenges and future work outlined here set the stage for novel research avenues in comprehending and optimizing the interplay between hallucination and creativity in GPT models.