Yet it is technically challenging to obtain an empirical hazard value. If an experiment measures the memory after five different times, say $t_1, t_2, \ldots, t_5$, then it is possible to find the proportion of times where the target information is correct for each time (i.e., values for $S(t_i)$ for $i = 1, \ldots, 5$ can be measured). However, to obtain a valid $S(t)$ measurement, corrections are needed to take into account guessing processes and to account for the possibility that the performance might also be a function of retrieval processes. Thus, to obtain credible values for $S(t)$ requires a validated psychometric model so as to extract a measure of the probability of target storage [18]. However, even with these correction procedures, an empirical measurement of hazard has the further challenge of obtaining an estimate of the density $f(t)$, which is required to compute hazard. To estimate $f(t)$, a second $S(t)$ measurement at a nearby time is required, because the density must be approximated by a finite difference of the storage values. Empirical hazard estimates have potentially high statistical error because there is likely a large error in estimating $f(t)$ that is compounded by dividing by $S(t)$. Consequently, valid and statistically reliable empirical hazard estimates are very difficult to obtain in practice. However, it is possible to use a surrogate function to glean some useful information about the hazard function without having to estimate the hazard function directly.
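To make the statistical fragility of a direct hazard estimate concrete, the following minimal sketch (not a procedure used in the studies reviewed here) simulates the finite-difference approach. The survivor function is assumed, purely for illustration, to be exponential; each storage value is estimated from roughly 700 Bernoulli trials; and the density is approximated by the difference of two storage estimates at nearby times. All numerical values in the sketch are hypothetical.

```python
# A minimal sketch (not a procedure used in the studies reviewed here) of why a
# direct empirical hazard estimate is statistically fragile.  The survivor
# function S(t) is assumed to be exponential purely for illustration, and each
# storage value is estimated from n Bernoulli trials.
import numpy as np

rng = np.random.default_rng(0)
n = 700                          # hypothetical trials per retention interval
t1, t2 = 5.0, 7.0                # two nearby retention intervals (seconds)
S = lambda t: np.exp(-0.05 * t)  # assumed survivor (storage) function
true_hazard = 0.05               # the exponential hazard is constant

hazard_estimates = []
for _ in range(10_000):
    s1 = rng.binomial(n, S(t1)) / n        # estimated storage at t1
    s2 = rng.binomial(n, S(t2)) / n        # estimated storage at t2
    f_hat = (s1 - s2) / (t2 - t1)          # finite-difference density estimate
    hazard_estimates.append(f_hat / s1)    # empirical hazard = f(t1) / S(t1)

hazard_estimates = np.array(hazard_estimates)
print("true hazard:", true_hazard)
print("mean estimate:", hazard_estimates.mean())
print("relative standard deviation:", hazard_estimates.std() / true_hazard)
```

Even with hundreds of trials per retention interval, the relative standard deviation of the simulated hazard estimates is large, which illustrates the compounding of error described above.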
4.1. Using a Surrogate Function for Assessing the Hazard Function
The function $\frac{F(t)}{t}$, where $F(t) = 1 - S(t)$ is the cumulative probability that the target is no longer stored by time $t$, can be used to ascertain if the early part of the hazard function is increasing or decreasing [43,54]. Experimentally, $\frac{F(t)}{t}$ can be readily estimated because $t$ is known, as is $F(t)$ once a corrected estimate of the storage probability $S(t)$ is obtained. The rationale for how $\frac{F(t)}{t}$ can be used as a surrogate function for ascertaining hazard properties is established with the following two theorems.
Theorem 6. If a continuous probability or subprobability distribution on the support $[0, \infty)$ has monotonically decreasing hazard, then $\frac{F(t)}{t}$ is monotonically decreasing with a maximum value of $f(0)$.
Proof. Given a monotonically decreasing hazard function on the support $[0, \infty)$, it is known from Corollary 1 that the density $f(t)$ is monotonically decreasing for all $t$ with the maximum at $t = 0$. From L'Hospital's rule it follows that $\lim_{t \to 0^{+}} \frac{F(t)}{t} = f(0)$. Furthermore,
$$\frac{d}{dt}\left[\frac{F(t)}{t}\right] = \frac{t\,f(t) - F(t)}{t^{2}}. \qquad (22)$$
For any $t > 0$, $F(t) = \int_{0}^{t} f(x)\,dx$. However, from the mean-value theorem of integral calculus, the integral $\int_{0}^{t} f(x)\,dx = t\,f(c)$, where $0 < c < t$, so $\frac{d}{dt}\left[\frac{F(t)}{t}\right] = \frac{f(t) - f(c)}{t}$. Thus, from (22), it follows that the sign of $\frac{d}{dt}\left[\frac{F(t)}{t}\right]$ is directly a function of the term $f(t) - f(c)$. Because $f(t)$ is monotonically decreasing and because $c < t$, it follows that $f(t) - f(c) \le 0$. Hence, for monotonically decreasing hazard functions on the support $[0, \infty)$, $\frac{F(t)}{t}$ must be monotonically decreasing for all $t$ with a maximum value of $f(0)$ as $t \to 0^{+}$. □
Theorem 7. If the function $\frac{F(t)}{t}$ is monotonically increasing for $t$ within an interval $(t_1, t_2)$ with $0 \le t_1 < t_2$, then the hazard is increasing in that interval.
Proof. For all $t \in (t_1, t_2)$, $\frac{d}{dt}\left[\frac{F(t)}{t}\right] = \frac{f(t) - f(c)}{t}$, with $F(t) = t\,f(c)$ where $0 < c < t$. Because $\frac{F(t)}{t}$ is increasing over the interval $(t_1, t_2)$, it follows that $f(t) > f(c)$ over this interval. Thus, $t\,f(t) > F(t)$ over this interval, and because $c < t$ it also follows that $1 - F(t) \le 1 - F(c)$. However, from Equation (10), it follows that the hazard is increasing because $h(t) = f(t)/[1 - F(t)] > f(c)/[1 - F(c)] = h(c)$ with $c < t$ in this interval. □
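The following short numerical illustration (not a substitute for the proofs) checks the behavior described by Theorems 6 and 7 for two assumed distributions: a 50/50 mixture of exponentials, whose hazard is monotonically decreasing, and a Weibull distribution with shape parameter 2, whose hazard is initially increasing. Both distributions are chosen only for illustration.

```python
# A numerical illustration (not a substitute for the proofs) of Theorems 6 and 7.
# Distribution (1): a 50/50 mixture of exponential distributions, which has a
# monotonically decreasing hazard, so F(t)/t should be monotonically decreasing.
# Distribution (2): a Weibull distribution with shape 2, which has an increasing
# hazard, so F(t)/t should initially increase.  Both choices are illustrative.
import numpy as np

t = np.linspace(0.1, 20.0, 400)

# (1) mixture of exponentials with rates 0.1 and 1.0
F_mix = 1.0 - 0.5 * np.exp(-0.1 * t) - 0.5 * np.exp(-1.0 * t)
ratio_mix = F_mix / t
print("mixture: F(t)/t monotonically decreasing?",
      bool(np.all(np.diff(ratio_mix) < 0)))              # expected: True

# (2) Weibull with shape 2 and scale 5
F_wei = 1.0 - np.exp(-(t / 5.0) ** 2)
ratio_wei = F_wei / t
peak = t[int(np.argmax(ratio_wei))]
print("Weibull: F(t)/t increases up to about t =", round(float(peak), 2))
```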
Theorems 6 and 7 indicate that there is a way to critically falsify any theory of memory hazard that predicts strictly monotonically decreasing hazard. Namely, if evidence can be found that $F(t)/t$ is initially increasing, then that finding would be falsifying evidence for theories that predict monotonically decreasing hazard. Yet to find a statistically reliable difference in $F(t)/t$ estimates requires a reasonably large sample size, which leads to the practical reality that experimentation on $F(t)/t$ requires pooling data over different research participants and test trials. However, from Theorem 3, it is known that a mixture of non-increasing hazard functions results in a monotonically decreasing hazard. Consequently, if all participants in a group have monotonically decreasing hazard, then the hazard function for the pooled data is also monotonically decreasing. Thus, an experiment that has a large sample size for each retention interval and has research protocols that enable model-based correction estimates of the probability of memory storage has a chance to see if $F(t)/t$ is increasing in the early part of the retention interval. Of course, if the experiment does not examine time values in the critical period where a hazard function peak exists, then the study might miss detecting the hazard peak.
Experiment 3 from [25] was designed in a fashion that enables estimates of $F(t)/t$ for various short retention intervals, although the measurement of $F(t)/t$ was not the purpose of the study. (The study addressed a different set of questions associated with implicit and explicit memory. The experiment was designed to see if implicit memory had a different retention function than explicit memory. Implicit memory is exemplified when a participant cannot recall or confidently recognize an item but nonetheless can correctly select the item from a lineup list.) The study examined the memory of novel nonsense letters taken from a stimulus set that was scaled to be of low meaningfulness [55]. Following the presentation of a target letter triad, there was a series of digits that participants had to verbally repeat as soon as they heard the digits. The target letters and the digits were presented over headphones. The testing involved a decision about four visually displayed letter triads. For half of the test probes none of the stimuli matched the target item for that trial, and for the other half one of the stimuli matched the target. The participant had to respond yes or no to indicate whether any of the triads on the four-item list was the target for that trial. The participants also had to indicate whether they had high or low confidence in their decision. (Given that the original paper was focused on measuring explicit and implicit memory, there was also a series of forced-choice tests after the yes/no decision; however, those forced-choice data are not pertinent for the $F(t)/t$ analysis.) For each retention interval there were 696 tests where the target was on the list and another 696 tests where the target was not on the list.
The analysis in the current paper is based on model 7B from [23] for estimating the storage probability for each retention interval. The data needed for using model 7B are based on a yes-no recognition memory test along with high-versus-low confidence judgments.
Table 3 defines the response cell frequencies $n_1, \ldots, n_8$ associated with the experimental task. The frequencies for the old recognition tests ($n_1, \ldots, n_4$) form a four-cell multinomial that has in the population the proportions $\phi_1, \ldots, \phi_4$, where $\sum_{i=1}^{4}\phi_i = 1$. Similarly, for the new recognition tests there are population proportions $\phi_5, \ldots, \phi_8$, where $\sum_{i=5}^{8}\phi_i = 1$. The proportions for the cells of the two multinomials are a function of memory and task factors. Psychological processes such as sufficient storage, guessing when target storage is insufficient, and rating processes influence the proportions in the multinomial outcome cells. Model 7B is a psychometric model that describes how those psychological processes map onto the multinomial outcomes. The parameters of the 7B model are denoted by subscripted $\theta$ symbols for processes such as target storage ($\theta_S$), guessing yes on an old recognition test, and guessing no on a new recognition test. There are also parameters for using the sure versus unsure rating when guessing. The model structure consists of two probability trees for the old and new recognition testing procedures. Psychometric models of this type have come to be called multinomial processing tree (MPT) models.
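To make the tree structure of an MPT model concrete, the sketch below computes the four old-test cell probabilities for a deliberately simplified storage-guessing-rating tree. This toy tree is not model 7B (the 7B trees are specified in [23]); the parameters theta_S, g, and r are illustrative stand-ins for storage, guessing yes, and rating a guess with high confidence.

```python
# A deliberately simplified illustration of how an MPT model maps latent
# psychological parameters onto multinomial cell probabilities.  This toy tree
# is NOT model 7B (the 7B trees are specified in [23]); theta_S, g, and r are
# illustrative stand-ins for storage, guessing "yes", and rating a guess as
# high confidence.
def toy_old_test_cell_probs(theta_S, g, r):
    """Return (yes-high, yes-low, no-high, no-low) probabilities for an
    old-item recognition test under the toy tree."""
    yes_high = theta_S + (1.0 - theta_S) * g * r        # stored, or guessed yes with high confidence
    yes_low  = (1.0 - theta_S) * g * (1.0 - r)          # guessed yes with low confidence
    no_high  = (1.0 - theta_S) * (1.0 - g) * r          # guessed no with high confidence
    no_low   = (1.0 - theta_S) * (1.0 - g) * (1.0 - r)  # guessed no with low confidence
    return yes_high, yes_low, no_high, no_low

cells = toy_old_test_cell_probs(theta_S=0.6, g=0.5, r=0.3)
print(cells, "sum =", sum(cells))   # the four cell probabilities sum to 1
```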
The latent $\theta$ parameters can be statistically estimated by means of a procedure called population parameter mapping (PPM) [56,57,58]. The parameters of model 7B can also be estimated with either a frequentist maximum likelihood method or with a conventional Bayesian method, but the PPM procedure provides additional information about model coherence and is computationally faster [23]. PPM is a specialized Bayesian Monte Carlo sampling procedure developed to estimate the latent parameters of a scientific model for multinomial data. This procedure was invented as a Bayesian Monte Carlo sampling method that avoided the lower efficiency of Markov chain Monte Carlo (MCMC) algorithms. Unlike MCMC procedures, the PPM procedure is not an approximate method, and it does not require a burn-in period [58]. Moreover, the sampled vectors are independent, so the issue of correlated samples, which can occur with MCMC algorithms, is not a problem. With multinomial data there is an exact Monte Carlo sampling method for sampling vectors of points $\phi$ from the posterior distribution for sets of multinomial data [56,57,58]. Each vector from $\phi$ space is mapped to a corresponding $\theta$-space vector for the scientific model parameters. The collection of coherently mapped points from $\phi$ space to $\theta$ space is used for point and interval estimation of the latent model parameters. Chechile [23] proved that the 7B model is identifiable and demonstrated that the PPM estimation method outperformed (in terms of accuracy) the frequentist maximum likelihood estimation, especially for sample sizes less than 1000. For sample sizes greater than or equal to 1000, the maximum likelihood estimate and the PPM estimate are virtually the same. See Appendix A for additional details about the PPM method for estimating the storage parameter for model 7B as well as for the software for implementing the estimation.
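The following sketch illustrates the general PPM logic using the toy tree defined above rather than the actual 7B mapping (which is given in [23] and Appendix A): independent vectors of cell proportions are drawn from the exact Dirichlet posterior of the multinomial, each vector is mapped to the latent parameters, and only coherent mappings (all parameters in [0, 1]) are retained for point and interval estimation. The cell counts used here are hypothetical.

```python
# A minimal sketch of the PPM logic, applied to the toy tree above rather than
# to the actual 7B mapping (which is given in [23] and Appendix A).  Vectors of
# cell proportions phi are drawn exactly and independently from the Dirichlet
# posterior of the multinomial, each vector is mapped to the latent parameters,
# and only coherent mappings (all parameters in [0, 1]) are retained.  The cell
# counts below are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
counts = np.array([420, 90, 60, 126])   # hypothetical (yes-hi, yes-lo, no-hi, no-lo) frequencies

theta_S_draws = []
for phi in rng.dirichlet(counts + 1, size=20_000):   # exact posterior draws (uniform prior)
    yes_hi, yes_lo, no_hi, no_lo = phi
    r = no_hi / (no_hi + no_lo)                      # high-confidence rating of a guess
    guess_mass = yes_lo * (no_hi + no_lo) / no_lo + (no_hi + no_lo)   # implied 1 - theta_S
    theta_S = 1.0 - guess_mass
    g = (yes_lo * (no_hi + no_lo) / no_lo) / guess_mass               # implied guessing-yes rate
    if 0.0 <= theta_S <= 1.0 and 0.0 <= g <= 1.0 and 0.0 <= r <= 1.0:
        theta_S_draws.append(theta_S)                # keep only coherent mappings

theta_S_draws = np.array(theta_S_draws)
print("coherence rate:", len(theta_S_draws) / 20_000)
print("posterior median of theta_S:", np.median(theta_S_draws))
print("95-percent interval:", np.quantile(theta_S_draws, [0.025, 0.975]))
```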
The frequencies for the various temporal delays for experiment 3 from [25] are provided in Table 4. The time values shown in the table are one second longer than the digit-repeating interval in order to give the participants a chance to read the four-item visual list to which they had to respond either yes or no. The participants were instructed to respond yes if any of the four items on the test probe was the target stimulus for that trial. Perfect performance on this task would correspond to 696 trials of correct performance with high confidence on the target-present tests and 696 trials of correct performance with high confidence on the target-absent tests. These values were used to create an initial set of baseline frequencies for finding $F(t)/t$.
The key information for the $F(t)/t$ analysis of the data from experiment 3 in [25] is provided in Table 5. For each $t$ value the posterior median for the storage probability $\theta_S$ is provided. The corresponding $F(t)/t$ value is the median of the set of mapped values of $(1 - \theta_S)/t$. The 95-percent interval for each condition is based on the quantiles of this set of $F(t)/t$ values. See Appendix A for the details about the storage estimation software.
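As an illustration of how the Table 5 quantities can be obtained from the mapped samples, the sketch below converts a set of posterior draws of the storage probability at a single retention interval into the corresponding $F(t)/t$ median and 95-percent interval. The draws are simulated placeholders, not the actual PPM output for this experiment.

```python
# Illustrative conversion of mapped storage-probability draws into the F(t)/t
# summaries reported in the tables.  The draws below are simulated placeholders,
# not the actual PPM output for this experiment.
import numpy as np

rng = np.random.default_rng(2)
t = 5.0                                         # retention interval in seconds
theta_S_draws = rng.beta(60, 40, size=5000)     # stand-in for coherent PPM draws of theta_S

F_over_t = (1.0 - theta_S_draws) / t            # mapped values of the surrogate function
print("posterior median of theta_S:", np.median(theta_S_draws))
print("posterior median of F(t)/t:", np.median(F_over_t))
print("95-percent interval for F(t)/t:", np.quantile(F_over_t, [0.025, 0.975]))
```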
There is a clear maximum for $F(t)/t$ at the retention interval near 5 s. There is a high posterior probability that the $F(t)/t$ value at this interval exceeds the values at the shorter retention intervals, and for times greater than this interval there is a decline in $F(t)/t$, because the posterior probability that the peak value exceeds each of the longer-interval values is also high.
Thus, via Theorem 6, the data from this experiment are inconsistent with any theory that predicts strictly monotonically decreasing hazard. Additionally, via Theorem 7, it is clear that the hazard is increasing over a brief initial interval of approximately five seconds after the initial encoding. Consequently, this experiment provides falsifying evidence against the first five models listed in Table 2. More strongly, the experiment is inconsistent with any model (either an existing theory or a yet-to-be-generated theory) that predicts strictly monotonically decreasing hazard. Yet, given the import of this strong conclusion, it is prudent to explore whether this general conclusion can be replicated and supported by other experiments.
Experiment 2 from [17] is a study that provides an assessment of $F(t)/t$ over a time frame from 1.33 to 76 s. In this experiment the targets are again letter triads of low meaningfulness. Additionally, the retention interval is filled with digits that had to be repeated (i.e., digit shadowing). The target stimulus, the test probe, and the interpolated digits are all auditory, but the gender of the voice articulating the target item and test probe is different from the gender of the voice used for the interpolated digits. The probe stimulus for each recognition test was either the original target or a novel triad of nonsense letters. The data for each condition are provided in Table 6.
Similar to the analysis in Table 5, the key results for experiment 2 from [17] are provided in Table 7. The time values listed in the table are one second longer than the time for digit shadowing in order for the participant to hear the probe stimulus. Perfect performance in this experiment would consist of 900 trials with high-confidence correct old recognition and 900 trials with high-confidence correct rejection of the foil probes. Those values were used for generating the vector of baseline frequencies needed for the $F(t)/t$ calculations. For each temporal delay the posterior median for the storage probability $\theta_S$, the corresponding $F(t)/t$ value, and the 95-percent interval for $F(t)/t$ are provided. There is a clear maximum for $F(t)/t$ at an early retention interval, and the posterior probabilities that this peak value exceeds the $F(t)/t$ values at the shorter and at the longer retention intervals are all very high (one is greater than 0.9999, and the others are 0.9971, 0.9994, and 0.9997). Thus, this experiment provides further support for the conclusion that memory hazard initially increases before the hazard declines with time.
Experiment 1 from [17] is a third study for which $F(t)/t$ can be estimated for several times in the critical time period after memory encoding. The experimental protocols for this experiment are the same as for the study upon which Table 7 is based, except for the sample size. In this study, there were 240 old recognition tests and 240 new recognition tests in each condition. (There is another set of four retention intervals that were examined with a slower rate for the interpolated digit-shadowing task. These four conditions are omitted because that rate of presenting digits in the retention interval was too slow to prevent participants from rehearsing the target items [17].) For this study there is not a five-second condition, but instead there are two retention intervals that bracket five seconds. The frequency data are in Table 8 and the $F(t)/t$ results are in Table 9. Interestingly, $F(t)/t$ is not statistically different between the two retention intervals that bracket five seconds. Both of these conditions did have a reliably larger $F(t)/t$ than the $F(t)/t$ for a shorter retention interval, and both of these conditions had a $F(t)/t$ larger than the $F(t)/t$ for a longer retention interval (with high posterior probability in each case).
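Statements of the form "condition A has a reliably larger $F(t)/t$ than condition B" correspond to posterior probabilities that can be computed directly from the mapped samples for the two retention intervals, as in the sketch below (again with simulated placeholder draws and hypothetical interval values).

```python
# Sketch of a posterior comparison P( F(t_a)/t_a > F(t_b)/t_b ) computed from
# independent sets of mapped theta_S draws at two retention intervals.  The
# interval values and the Beta draws are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(3)
t_a, t_b = 4.0, 1.33                       # hypothetical retention intervals (seconds)
theta_a = rng.beta(45, 55, size=5000)      # stand-in posterior draws of theta_S at t_a
theta_b = rng.beta(90, 10, size=5000)      # stand-in posterior draws of theta_S at t_b

F_a = (1.0 - theta_a) / t_a
F_b = (1.0 - theta_b) / t_b
print("P( F(t_a)/t_a > F(t_b)/t_b ) =", np.mean(F_a > F_b))
```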
The above three experiments indicate that the human memory hazard function is not strictly monotonically decreasing as predicted by the first five models listed in Table 2. These studies, when analyzed in light of Theorems 6 and 7, establish a framework for using an experimentally practical surrogate function, $F(t)/t$, to ascertain that the human memory system has a peak-shaped hazard function. This approach has the advantage of learning about hazard empirically without actually estimating the hazard function per se. The method thus avoids the technical and statistical difficulties of estimating a density-based metric. Yet the research still has the limitation of only examining human participants. It is possible that some animal species exhibit a different memory hazard function. However, it is difficult to measure the $F(t)/t$ surrogate function in a careful fashion with animals because they would need to be trained to do a complex recognition memory task. Human participants are capable of following complex instructions that allow for model-based corrections to obtain a measure of memory storage after various retention intervals. Nonetheless, it is important to understand the properties of the human memory system even if humans differ from other animals. In the next subsection a case is made for a particular function that has peak-shaped hazard for human memory.
4.2. Evidence for the Two-Trace Hazard Model
Given the results from the previous subsection, attention is focused on the four models that predict a peak-shaped hazard function (i.e., the modified Anderson-Schooler model, the Ebbinghaus function, the trace susceptibility theory, and the two-trace hazard model). For each theory there is a survivor function that predicts target storage as a function of time, and each theory has fitting parameters that can be adjusted for each individual. Although the pooled data were used to assess the $F(t)/t$ function, the averaging of nonlinear functions with different fitting parameters can be misleading, as shown by Estes [59]. Thus, for testing the fit quality of the various models, it is important to do the testing separately for each person. Moreover, testing the fit quality on an individual basis requires a sizable sample size for each retention interval to reduce the uncertainty in the estimated storage probability. It is also important to assess the storage probability of memory targets over both the short-term and the long-term temporal regions.
In the short-term time frame, experiment 2 from [17] is particularly important because each of 30 participants was evaluated over six retention intervals with 90 replication tests per interval; thus, the data in this study enable a reasonably precise estimate of target storage for each condition and for each participant. For this analysis the two-trace hazard model, with its four parameters, was omitted because it could always be at least as effective as the trace susceptibility theory. Both the two-trace hazard model and the trace susceptibility theory employ a Weibull model. Consequently, if the b parameters of the two models are equivalent and if the d parameter of the two-trace hazard model is equated to the a parameter of the trace susceptibility theory, then there are always values for the extra a and c parameters of the two-trace hazard model that effectively match the prediction equation of the trace susceptibility theory.
The measure of fit quality for each model was the correlation between the estimated storage probability and the predicted value from the theory. The mean correlations for the trace susceptibility theory, the Ebbinghaus function, and the modified Anderson-Schooler model are reported in [43]. Importantly, the modified Anderson-Schooler model was not the best fit for any of the 30 participants. The trace susceptibility theory was the best fit for 22 participants, and the Ebbinghaus function was the best fit for the other 8 participants. Given the fact that the Ebbinghaus function had a better fit than the trace susceptibility theory for 8 participants, these people were also examined in terms of the two-trace hazard model. The two-trace hazard model had a better fit for each of those 8 participants than did the Ebbinghaus function. Thus, in the short-term time frame, the trace susceptibility theory and the two-trace hazard model have a better fit to the storage probability than either the modified Anderson-Schooler or the Ebbinghaus model.
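The mechanics of this correlation-based comparison are illustrated below for a single hypothetical participant. Because the model equations are not reproduced in this section, the two candidate retention functions in the sketch are generic placeholders rather than the actual trace susceptibility, Ebbinghaus, or Anderson-Schooler equations (which are given in [43]); the retention intervals and storage estimates are also hypothetical.

```python
# Illustration of the correlation-based fit comparison for one hypothetical
# participant.  The two candidate retention functions are generic placeholders
# (the actual model equations are given in [43]), and the retention intervals
# and storage estimates are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def candidate_A(t, a, b):
    # placeholder retention function, not one of the models from the paper
    return np.exp(-a * t / (1.0 + b * t))

def candidate_B(t, a, b):
    # a second placeholder retention function
    return 1.0 / (1.0 + a * t) ** b

t = np.array([1.33, 2.5, 5.0, 9.0, 18.0, 76.0])             # retention intervals (s)
theta_hat = np.array([0.91, 0.82, 0.66, 0.58, 0.49, 0.35])  # hypothetical storage estimates

fit_quality = {}
for name, fn in [("A", candidate_A), ("B", candidate_B)]:
    params, _ = curve_fit(fn, t, theta_hat, p0=[0.1, 0.1], bounds=(0.0, np.inf))
    predicted = fn(t, *params)
    fit_quality[name] = np.corrcoef(theta_hat, predicted)[0, 1]

print("fit-quality correlations:", fit_quality)
```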
There have not been very many studies that report data for individuals in the long-term temporal domain. Nearly all the data sets are based on averaging data over participants. Yet Estes [59] showed that learning and retention curves based on grouped data are problematic. Consequently, Chechile [43] was only able to discuss two studies that had individual-participant data that enabled estimates of the storage probability over a long-term temporal region [11,60]. Chechile [43], who developed both the trace susceptibility theory and the two-trace hazard model, reported that the trace susceptibility theory failed to fit these long-term data, but the two-trace hazard model did fit each participant well. However, the two studies reported by Chechile only had results from three people, who were either the author or a relative of the author [11,60]. Fortunately, another study by Sloboda [
61] examined nine participants over ten retention intervals in a fashion that enables the estimation of the model 7B storage probability value for each person for each interval. The ten intervals examined were: 30 s, 5 min, 15 min, 30 min, 1 h, 3 h, 9 h, 24 h, 72 h, and 144 h. The target stimuli in the Sloboda experiment were meaningful words selected from a word frequency norm [
62]. There were 120 tests for each of the ten retention intervals (i.e., 60 target-present trials and 60 target-absent trials). Sloboda assumed that trace 1 of the two-trace hazard model was no longer available for retention intervals of 5 min or longer. If we denote the storage probabilities for traces 1 and 2 as, respectively, $\theta_1(t)$ and $\theta_2(t)$, then trace 1 is governed by the d parameter and trace 2 is governed by the a, b, and c parameters. The mean value for d was 12.96 min⁻² for the previously reported 30 participants in the short-term time domain; consequently, by 5 min the value for $\theta_1$ would be vanishingly small. Thus, Sloboda only fitted $\theta_2(t)$ to the long-term storage probability values. The mean (and standard deviation) of the individually fitted a parameter is 1.9 (1.1) min⁻²; the b and c parameters were similarly fitted for each participant. The resulting fit is excellent for each of the nine participants; the correlation between the model-predicted values and the estimated storage probabilities is high for every participant.
Thus, the currently available data are sufficient to indicate that the modified Anderson-Schooler model, the Ebbinghaus model, and the trace susceptibility theory have problems fitting all the participants, whereas the two-trace hazard model has an excellent fit to each participant. Consequently, until a better theoretical function is found, the two-trace hazard model is the current best option for a model of memory retention as a function of time. In this model, the storage of the target is determined jointly by the two trace components: trace 1, governed by the d parameter, dominates the short-term region, and trace 2, governed by the a, b, and c parameters, dominates the long-term region.