3.1.1. Training
Results of the training session for both groups are presented in
Figure 1B–E. The average full training duration was 13.06 (±0.88) days in the ST group and 15.37 (±1.13) days in the ET group (not including the overtraining period). The average correct response percentage on the day mice reached the success criterion was 77.41% (±2.91) in the ST group and 72.98% (±1.32) in the ET group. The overtraining period of the ET group consisted of an average of 11.87 (±1.25) additional days of training, and the average correct response percentage by the end of the overtraining period was 76.11% (±4.34).
We took several behavioral measures to compare throughout the paradigm: checks (the number of head entries into the reward magazine), licks (the number of actual licks on the port), and lever presses. As the ST protocol consists of a significantly smaller number of training days, we used the largest number of comparable days (the first 18 training days) to compare both groups using a two-way ANOVA of training day × group (ST/ET) for each measure. Training produced a learning curve in all measures, with responses gradually increasing above chance level and remaining at those levels throughout the rest of the training.
Looking at the measure of checks, we found a main effect for training days: F(17, 153) = 17.032,
p < 0.001, with Tukey HSD post hoc revealing that the number of magazine checks was significantly higher than at the beginning of training as early as day 4, and that all days from the 4th day onwards were significantly different from the first day (
p < 0.05; see
Figure 1B). For the licking measure, we found a main effect for training days: F(17, 153) = 6.509,
p < 0.001, with Tukey HSD post hoc revealing that the number of licks was significantly higher than at the beginning of training only from day 13, and that all days from the 13th day onwards were significantly different from the first day (
p < 0.05;
Figure 1C). For lever presses, we found a main effect for training days: F(17, 153) = 3.374,
p < 0.001, with Tukey HSD post hoc revealing that the number of lever presses was significantly higher than at the beginning of training as early as day 6, and that all days from the 6th day onwards were significantly different from the first day (
p < 0.05; see
Figure 1D). Thus, magazine checks and lever presses appear to increase in rate sooner than licking.
However, when looking at accumulation of correct lever presses (that is—the percentage of presses on the lever indicated by the tone and leading to reward), learning seems slower. We found a main effect for training days: F(17, 153) = 4.801,
p < 0.001, with Tukey HSD post hoc revealing that the percentage of correct lever presses was significantly higher than at the beginning of training on the 14th day, and that all days from the 14th day onwards were significantly different from the first day (
p < 0.05; see
Figure 1E). We also directly compared the number of correct and incorrect presses in that statistical model. A three-way ANOVA—group (ST/ET) * day (1–18 comparable days) * correct/incorrect number of presses revealed a significant main effect for days: F(17, 136) = 2.844,
p < 0.001. As reported above, lever presses in general increased with training from the 6th day of training. We also found a main effect for correct vs. incorrect presses (F(1, 8) = 15.137,
p < 0.005) and, most importantly, a day * correct/incorrect interaction (F(17, 136) = 4.0598,
p < 0.001), showing that there were no differences in the early training days and that, as training progressed, correct presses gradually came to outnumber incorrect presses. In fact, Tukey HSD comparisons show that the average difference between the number of correct and incorrect presses became significant starting on the 13th day of training, that is, towards the end of the ST. While there was a fast increase in general lever pressing at the beginning of training, by the end of it the behavior was more precise, and the increase was only in the correct lever presses. It is important to note that no differences in correct vs. incorrect presses were found between the groups.
3.1.2. Devaluation
The success of the devaluation is indicated by the consumption of the devalued reward on the last day of devaluation as compared to the first day. Overall, the number of devaluation days needed to reach the reduction criterion after training was relatively high, which is expected in this strain of mice [
28]. There was no difference between the groups in the average number of devaluation days needed to reach criterion or in the actual reduction in licks on the last day of devaluation. ST mice reached criterion within 5.2 devaluation days on average (1.97 STD) and ET mice reached criterion within 4.4 days (1.30 STD). A t-test revealed no significant difference between the groups in the number of days to reach criterion (t(21) = 1.061, p = 0.3 n.s.). Moreover, on the last day ST mice were down to an average of 11.5% of their first-day licks (13.8 STD) and ET mice to 8.9% (5.89 STD). A
t-test revealed no significant difference in the average criterion between the groups (t(21) = 0.507,
p = 0.62 n.s.). The specificity of the devaluation effect to the devalued reward was assessed by comparing it to the reward that was presented on alternating days but was not devalued. We found that our devaluation procedure was successful in both groups, as indicated by a decrease in the consumption of the devalued reward only, and not of the non-devalued reward, in both ST and ET groups (
Figure 1F–G). When looking at the ST group, a two-way ANOVA with time (first/last day) × condition (devalued/non-devalued reward) revealed a significant main effect for condition (F(1, 25) = 15.664,
p < 0.001), indicating more consumption for the non-devalued reward and, most importantly, a time × condition interaction (F(1, 25) = 33.539,
p < 0.001). Tukey HSD post hoc revealed that consumption on the last day of the devaluation procedure was significantly lower than the first day, only for the devalued reinforcer (
p < 0.001) but not for the non-devalued (
p = 0.614; n.s.). When comparing the consumption between the different conditions (devalued and non-devalued reinforcers), we found a significant difference in the consumption of the reinforcers on the last day of exposure (
p < 0.01), but not the first day (
p = 0.178; n.s.). These same effects were also true for the ET group, where two-way ANOVA with time (first/last day) × condition (devalued/ non-devalued reward) revealed a significant main effect for condition (F(1, 14) = 7.379,
p < 0.05), indicating more consumption of the non-devalued reward and, most importantly, a time × condition interaction: F(1, 14) = 11.099,
p < 0.01. Tukey HSD post hoc showed that the consumption during the last day of devaluation was significantly lower than on the first day, but only for the devalued reinforcer (
p < 0.001) and not the non-devalued (
p = 0.77; n.s.). When comparing the consumption between the different conditions (devalued and non-devalued), we found a difference on the last day of exposure (
p < 0.005), but not the first day (
p = 0.998; n.s.).
3.1.3. Test
Goal-directed behavior is more sensitive to change in reward value than habitually controlled behavior, which is more resistant to such a change. We therefore examined different behaviors during the task, looking for indications of sensitivity to the change in reward value caused by the devaluation. We found that while behaviors directly related to reward consumption (sometimes referred to as Pavlovian), such as magazine checks and licking, maintained sensitivity regardless of the duration of the training, the instrumental lever press behavior lost sensitivity to reward devaluation with extended training. Only in lever pressing did mice in the ET group show no sensitivity to the reward value, continuing to press both levers, while mice in the ST group attenuated behavior related to the devalued reward (
Figure 2).
Each measured behavior was compared using a three-way ANOVA, to measure the changes as a function of condition (ST/ET), reward type (devalued/non-devalued), and time of measurement (last day of training/test). The last day of training is referred to as a baseline as it is the last day before the devaluation of one of the rewards, and as the test itself is under extinction conditions, no reward is present. The use of the last day of training as reference has led us to test each behavior by looking directly at the difference in behavior between the days, by subtracting for each mouse the behavior in the last day of training from the behavior in the test, and conducting a two-way ANOVA on the differences.
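The difference measure described above can be sketched as follows. This is a minimal illustration only: the mouse identifiers and counts are hypothetical and do not come from the study, and the function name is an assumption introduced here.

```python
# Hypothetical per-mouse response counts; none of these numbers are study data.
last_training_day = {"m1": 42, "m2": 35, "m3": 50}
test_day = {"m1": 20, "m2": 31, "m3": 27}


def difference_scores(test, baseline):
    """Return test minus baseline for every mouse present in both sessions."""
    return {mouse: test[mouse] - baseline[mouse] for mouse in test if mouse in baseline}


scores = difference_scores(test_day, last_training_day)
print(scores)
```

A negative score means the mouse responded less in the test than on its last training day; these per-mouse scores would then serve as the dependent variable of the two-way ANOVA.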
Both licking and checking behavior produced very similar results: the three-way ANOVA for the number of licks revealed no effect on licking other than the time (main effect for time F(1, 21) = 79.320,
p < 0.001), with no other main effects or interaction of time with any of the other factors (all
p > 0.1; n.s.;
Figure 2A). We also found inhomogeneous variance between the groups compared in the three-way ANOVA. We therefore conducted a
t-test with Welch–Satterthwaite degrees of freedom correction for the relevant planned comparison. This indeed revealed that there were significantly more licks during the last day of training than during the test (t(45) = 10.64,
p < 0.001). A mixed ANOVA looking at the difference between the number of licks in the test and the last day of training and comparing this difference between the type of reward (devalued/ non-devalued) and the training type (ST/ET) found no difference between the groups (ST/ET), no differences between reward types and no interaction (all
p > 0.1; n.s.;
Figure 2B), suggesting that neither the length of training nor the devaluation of the reward affects this behavior. As the number of licks was far higher during training than in the test (because the test was performed under extinction), comparing the number of licks during the test to the training could dilute any differences between devalued and non-devalued conditions on the day of the test. We therefore also conducted a two-way ANOVA between condition (ST/ET) and reward type (devalued/non-devalued) only during the test. We found a main effect for the reward, as there were significantly more licks to the non-devalued as compared to the devalued reward (F(1, 21) = 5.2951,
p < 0.05). Yet, we found no difference between the different groups (F(1, 21) = 0.25996,
p = 0.62 n.s.) or an interaction effect (F(1, 21) = 0.02746,
p = 0.87 n.s.). This suggests that licking behavior was directly affected by the value of the reward or the lack of it, and that this direct effect was not altered by prolonged training.
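For readers unfamiliar with the correction used above for unequal variances, the following is a minimal sketch of Welch's t statistic with Welch–Satterthwaite degrees of freedom, written with the Python standard library only. The function name and the sample values are illustrative assumptions; this is not the software or the data used in the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical samples with clearly different variances.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 3), round(dof, 3))
```

When the two samples have equal size and equal variance, the corrected degrees of freedom reduce to the usual n1 + n2 − 2; otherwise they are smaller and generally not an integer.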
The three-way ANOVA for the number of magazine checks resembled the results of the licks and revealed no effect on checks other than the time of the session. A main effect for time F(1, 21) = 139.06,
p < 0.001 was found with no other main effects or interaction of time with any of the other factors (all
p > 0.1; n.s.;
Figure 2C). This might suggest that the only factor affecting the checking behavior was the lack of reward, and not its value, which differentially changed only in one of the rewards. A mixed ANOVA looking at the difference between the number of checks in the test and the last day of training, and comparing this difference between the type of reward (devalued/ non-devalued) and the training type (ST/ET) found no difference between the groups (ST/ET), no differences between reward types and no interaction (all
p > 0.1; n.s.;
Figure 2B,D), suggesting that neither the length of training nor the devaluation of the reward affects this behavior. Yet, in order to compare this lack of effects with the licking behavior, we also conducted a two-way ANOVA between condition (ST/ET) and reward type (devalued/non-devalued) only during the test. Here we found the same pattern as in the licking behavior: there was a trend for more checks to the non-devalued as compared to the devalued reward (F(1, 21) = 3.7526,
p = 0.066). Yet, we found no difference between the different groups (F(1, 21) = 0.06913,
p = 0.795 n.s.) or an interaction effect (F(1, 21) = 1.3952,
p = 0.25073 n.s.). Like the licking behavior, this suggests that checking behavior could be affected by the value of the reward and is strongly affected by the lack of it, and that this direct effect was not altered by prolonged training.
The instrumental lever press behavior, however, is a different story. A three-way ANOVA comparing the number of presses by condition (ST/ET), time (last day of training/test), and lever associated with the reward type (devalued/non-devalued) revealed a main effect for time (F(1, 21) = 12.075,
p < 0.01) with fewer presses in the test than in training. A main effect was also found for condition (F(1, 21) = 6.7639, p < 0.05), with fewer presses overall in the ET group than in the ST group. We also found a significant three-way interaction: F(1, 21) = 6.6862,
p < 0.05 (partial η² = 24.2%; observed power of 69.4%) (
Figure 2E). A Tukey HSD post hoc test revealed that a significant reduction in lever presses between the training and the test occurs only for ST mice (
p < 0.01), and only for the lever associated with the devalued reward as compared to the non-devalued reward associated lever (
p < 0.05). This suggests that the instrumental lever presses are affected not by the lack of reward but specifically by the value of the reward, and only if the mice underwent short training.
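The effect size reported for the three-way interaction can be cross-checked from the F statistic and its degrees of freedom, using the standard identity partial η² = (F · df_effect) / (F · df_effect + df_error). The short sketch below applies it to the reported F(1, 21) = 6.6862; the function name is an illustrative assumption.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the reported three-way interaction, F(1, 21) = 6.6862.
eta = partial_eta_squared(6.6862, 1, 21)
print(f"{eta:.1%}")
```

The result is roughly 24%, which agrees with the reported value of 24.2% up to rounding of the F statistic.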
This effect was again evident when looking at the difference between the number of lever presses in the test as compared to the last day of training as the dependent variable (
Figure 2F). A mixed ANOVA comparing the differences in lever presses between the type of reward (devalued/non-devalued) and the training type (ST/ET) found no main effects of either factor (both
p > 0.1; n.s.). However, there was a significant interaction effect (F(1, 21) = 6.6862,
p < 0.05) suggesting that the groups are different in the way they press the lever for the different rewards. A Tukey HSD post hoc test revealed that ST mice pressed the devalued reward related lever significantly less as compared to both the lever associated to the non-devalued reward (
p < 0.05) and the number of presses on the devalued reward associated lever pressed by the ET group (
p < 0.05). These results indicate again that the instrumental lever press behavior is affected by the devaluation of the associated reward only in shortly trained mice.
As discussed in detail in the introduction, the different sensitivities of the different behaviors to devaluation after overtraining may lead to what seems like irrational behavior: excessive instrumental behavior without capitalizing on the effort and collecting the rewards. To test for this model behavior, we looked at the change in proportions between the instrumental lever press behavior and the more Pavlovian approach behavior. The proportion measure was computed by dividing the number of lever presses by the number of reward checks during the test for each mouse. Here indeed we found that ET mice worked much more (lever presses) and capitalized on their work much less (magazine checks), as their instrumental responses were maintained in the face of devaluation, yet their approach behavior maintained sensitivity to reward and, as such, declined (
Figure 3). A three-way ANOVA comparing the press/checks proportions by condition (ST/ET), time (last day of training/test) and reward type (devalued/non-devalued) revealed a main effect for time (F(1, 21) = 49.797,
p < 0.001), with no interaction of time with any of the other factors (all
p > 0.1; n.s.). We also found a two-way interaction between reward type and condition (F(1, 21) = 5.9405,
p < 0.05) and a three-way interaction: F(1, 21) = 10.722,
p < 0.01. A Tukey HSD post hoc test revealed an increase in press/check proportions for the devalued reward only in the test after extended training—as compared to the last day of training of ET group (
p < 0.005), and as compared to the response to the non-devalued reward in the ET group (
p < 0.05). We also see an increase in press/check proportions for the non-devalued reward only in the test after short training—as compared to the last day of training of ST group (
p < 0.01), but there is no difference from the proportion of their response to the devalued reward (
p > 0.1 n.s.;
Figure 3A). As Levene’s test showed that the variances were not homogeneous, we conducted t-tests with Welch–Satterthwaite degrees of freedom correction for the relevant planned comparisons, evaluated against a Bonferroni-corrected alpha of 0.016. This revealed that the effect of time remained significant both within the ET group and within the ST group (t(8) = 3.18, p < 0.01; t(20) = 4.02, p < 0.01), while the within-ET comparison between the rewards remained a trend (t(9) = 2.3, p < 0.02). Together with the main effect of time, this shows a general tendency to press more and check less in the absence of reward. The effect in the extended training group, however, points to an added effect of the reduction in reward value, as the lever press response of ET mice becomes less dependent on the goal value while the checking behavior is still affected.
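The two quantities used in this analysis, the per-mouse press/check proportion and the Bonferroni-corrected threshold, can be illustrated with a minimal sketch. The counts are hypothetical, the handling of a mouse with zero checks is an assumption (the text does not state how such a case would be treated), and the number of planned comparisons (three) is inferred here from the reported alpha of 0.016 rather than stated in the text.

```python
def press_check_proportion(presses, checks):
    """Lever presses divided by magazine checks for one mouse.

    Returns None when there were no checks (an assumed convention).
    """
    if checks == 0:
        return None
    return presses / checks


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical counts for a single mouse in the test session.
proportion = press_check_proportion(30, 10)
threshold = bonferroni_alpha(0.05, 3)  # three comparisons is an assumption
print(proportion, round(threshold, 4))
```

With a family-wise alpha of 0.05 and three comparisons, the threshold is about 0.0167, which corresponds to the 0.016 reported in the text when truncated.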
This effect was even more apparent when looking at the difference of this proportion between the test and the last day of training as the dependent variable (
Figure 3B). A mixed ANOVA comparing the press/check proportions between the type of reward (devalued/non-devalued) and the training type (ST/ET) found no main effects of either factor (both
p > 0.1; n.s.). However, there was a significant interaction effect F(1, 21) = 10.722,
p < 0.005. Tukey HSD post hoc test showed that the press/check proportion increases only for the devalued reward and only for ET mice as compared to ST devalued reward (
p < 0.05). This suggests that only the cumulative value of both the devaluation and the lack of reward in the test increased the difference in proportions between the instrumental lever presses and the Pavlovian approach, but only if mice were trained long enough for the lever press behavior to become more habitual and less controlled by the reward value.
3.1.4. Revaluation
To test the persistence of the devaluation effect on behavior, mice were trained for five extra days after the test session. Revaluation refers to the value of the reward after the new value was acquired during devaluation. Both rewards were reinstated and no LiCl was injected, so mice could regain the value of the reward. The reduced value of the reward persisted in both Pavlovian behaviors, as both licking and checking of the devalued reward were maintained at a lower level throughout revaluation training for both ST and ET mice. The instrumental lever press behavior, however, gives a strong indication of the different control mechanisms executing the behavior of each group. In the ST group, the difference between responses to the two rewards only grew with training, maintaining the difference in the value of the rewards acquired during devaluation as learning motivation. In the ET group, lever press behavior seemed to persist regardless of the reward value and was maintained through all revaluation days (
Figure 4).
Comparing the number of licks in ST mice during the five days of revaluation training using a repeated two-way ANOVA with days and reward (devalued/non-devalued) as factors revealed a main effect only to reward type (F(1, 14) = 18.362,
p < 0.001) with overall more licks to the non-devalued reward. No effect was found for days or any interaction effect (all
p > 0.1 n.s.;
Figure 4A). When looking at ET mice, the same two-way ANOVA reveals the same main effect only to reward (F(1, 7) = 9.8692,
p < 0.05), with overall more licks to the non-devalued reward. Again, no effect was found for days or any interaction effect (all
p > 0.1 n.s.;
Figure 4B).
Magazine checks revealed the same tendency as licks in both ST and ET mice. In ST mice, a repeated two-way ANOVA with days and reward (devalued/non-devalued) as factors revealed a main effect only of reward type (F(1, 14) = 14.973, p < 0.005), with overall more magazine checks for the non-devalued reward. No effect was found for days or any interaction effect (all
p > 0.1 n.s.;
Figure 4C). The same result was found in the ET mice, where the two-way ANOVA revealed a main effect only of reward (F(1, 7) = 13.629, p < 0.01), with overall more magazine checks for the non-devalued reward. Here, a trend was found for days F(4, 28) = 2.5417,
p = 0.06 with Tukey post hoc comparison showing that checking in the last day tends to differ from the first day (
p = 0.06), but without any interaction effect (
p > 0.1; n.s.) suggesting this increase is the same for both rewards (
Figure 4D).
Here, also, the instrumental lever press behavior was different than both licking and checking. Comparing the number of lever presses of the ST mice during the five days of revaluation training using a repeated two-way ANOVA with days and reward (devalued/non-devalued) as factors revealed a main effect of reward type (F(1, 14) = 9.6156,
p < 0.01), with overall more lever presses to the non-devalued reward. While no main effect was found for days (
p > 0.1 n.s.), there was a significant days * reward interaction effect (F(4, 56) = 4.3003,
p < 0.005), with Tukey Post hoc test showing that the difference in lever presses between devalued and non-devalued rewards started from the 4th day and continued to the last day (difference between reward types in 4th and 5th days all
p < 0.001). These results suggest that the negative value acquired during devaluation continued and even got stronger when re-exposed to the rewards (
Figure 4E). When looking at ET mice, the same two-way ANOVA shows no difference in either reward, days, or any interaction between them (all
p > 0.1; n.s.). This implies that, due to extended training, the lever pressing behavior is no longer goal-directed, and that the value of the reward, whether from re-exposure to the reward or from devaluation, does not affect the number of lever presses (
Figure 4F).