*3.2. Results*

#### 3.2.1. Behavioral Analysis

Figure 10 shows learning curves for each category. Similar to the behavioral data from our pilot experiment (Figure 4), participants acquired the visually distinct category first, and there were no performance differences between the two visually similar categories. Based on this, behavioral measures for the two visually similar categories were averaged together to represent a single visually similar condition, streamlining comparison with the visually distinct category. A paired-samples *t*-test revealed that, on average across training blocks, participants were significantly better at categorizing the visually distinct category (95%) than the visually similar categories (90%), *t*(43) = 5.45, *p* < 0.001. Figure 11 indicates that this difference was driven by early runs; by Run 5 there were virtually no performance differences across the categories.
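To make the collapsing step concrete, the following minimal sketch (in Python, with placeholder data rather than the study's accuracies) shows how the two visually similar categories can be averaged per subject before the paired-samples *t*-test:

```python
import numpy as np
from scipy import stats

# Placeholder accuracy matrix for illustration only: 44 subjects x 3 categories,
# with column 0 = visually distinct and columns 1-2 = the two visually similar.
rng = np.random.default_rng(0)
acc = rng.uniform(0.85, 1.0, size=(44, 3))

distinct = acc[:, 0]
similar = acc[:, 1:3].mean(axis=1)         # collapse the two similar categories

t, p = stats.ttest_rel(distinct, similar)  # paired-samples t-test, df = 43
print(f"t(43) = {t:.2f}, p = {p:.3f}")
```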

**Figure 11.** Behavioral performance across training for each category. Participants acquired the visually distinct category first, followed by the two visually similar categories. Additionally, there were no behavioral differences between the two visually similar categories. Performance on the two types of categories became equivalent in the second half of training.

Experiment 2 also included a questionnaire that explicitly asked participants to describe the strategies that they used to categorize each formation category. For the visually similar categories, 91% of participants indicated that they used a counting rule when differentiating between the two (e.g., "I counted four players on the line of scrimmage for the first category, and three for the second category."), 9% relied on declarative memory for these two categories, and no participant reported reliance on similarity. For categorizing visually distinct formations, 21% of participants reported using an explicit counting rule (e.g., "I counted six people on the line of scrimmage"), 68% reported using declarative recall (e.g., "I memorized each formation individually"), and 11% reported using a perceptual similarity strategy (e.g., "There appeared to be a lot of people on the line of scrimmage for formations in this category, such that I did not need to count any players"). Thus, the self-reported strategies differed between the visually similar vs. visually distinct trials, although the distinction was less pronounced than in Experiment 1.

#### 3.2.2. Event-Related Potential (ERP) Selection Motivation and Analysis

All EEG data were analyzed using Philips Neuro Net Station 5 software. Classic ERP analysis was chosen, as it allows us to evaluate latency and amplitude differences as a function of categorization strategy. The distinct nature of individual ERPs enables us to attribute any observed differences to the well-studied circuitry that produces each ERP. In the past, we have utilized two ERPs to track learning-related changes in the brain: the Medial Frontal Negativity (MFN) and the P300b (P3b); for review, see [13,39,40]. The MFN is a stimulus-locked medial frontal component with its primary sources in the Anterior Cingulate Cortex (ACC) [13]. The ACC plays a major role in error monitoring and attention during reward-based learning, which makes it an ideal component for indexing the effortful control seen in the early stage of category learning [41]. The P300, on the other hand, is elicited under an array of conditions, and there is now a well-defined family of different P300 components. Most relevant to learning, the amplitude of the Late Positive Component (referred to as the P3b) mirrors accuracy improvements during categorization tasks [39,40]. The P3b is hypothesized to reflect a constant monitoring and updating of the context under which learning occurred. As context is formed through learning, the maintenance and updating of that context helps to guide a person toward selecting an action quickly and efficiently. Although the sources of the P3b are still being debated, intracranial EEG and animal studies suggest multiple sources, including the Posterior Cingulate Cortex (PCC), medial temporal lobe, and superior temporal sulcus, structures that are integral to the late learning stage [42–49]. In the current experiment, we were interested in examining amplitude and latency differences in these two components as a function of the categorization strategy used on a given trial. In theory, the strategies should differ in their reliance on frontal control areas and posterior corticolimbic structures to complete the task, as seen in Experiment 1. Using ERPs allows us to interpret latency differences between trial types (distinct vs. similar) as reflecting the time-course under which the categorization strategies (and their underlying memory systems) are engaged.

The Lateral Inferior Anterior Negativity (LIAN) is a third component that could potentially dissociate between the two categorization strategies. The LIAN is a lesser-known bilateral component that has shown clear dissociations between the recognition of spatial targets and digit targets in a visuomotor association task [39]. Specifically, the amplitude of the right LIAN is anticorrelated with acquiring the ability to recognize spatial configurations, and it shows no changes when targets invoke the phonological loop. Conversely, the amplitude of the left LIAN is positively correlated with learning to recognize phonological targets, and it is insensitive to acquiring an ability to perform spatial analyses. The Inferior Frontal Gyrus (IFG) is inferred to be the primary source of these components, though it is worth noting that the LIAN receives little mention in the literature outside of its role in visuomotor learning. This component was selected because it might show a dissociation between the two categorization strategies, since they inherently differ in how they engage the phonological loop.

Please see Section 3.1.5 for a review of how all of the signals were pre-processed. For the MFN analysis, a cluster of 12 electrodes that best represents the medial frontal distribution of the component was chosen (see pink electrodes, Figure 12). Consistent with how we have quantified the MFN in the past, an adaptive mean amplitude corresponding to 20 ms before and 20 ms after the maximum negative peak amplitude in a window extending from approximately 180–300 ms after stimulus onset was computed for the MFN electrode cluster [13,39]. The MFN was referenced to the preceding positive peak (P200) around 150–200 ms after stimulus onset. This method was applied to the post-learning trials for all three formation categories. The trials in the visually distinct category were averaged together to form a single ERP for the visually distinct condition. After analyzing both visually similar categories individually, we determined that there were no amplitude or latency differences in the MFN between these two categories, consistent with the idea that both would require the engagement of explicit, rule-based categorization. In light of this, trials in the two visually similar categories were averaged together to form a single ERP for the visually similar condition. A paired-samples *t*-test was run to evaluate the difference in MFN amplitude between the visually distinct and visually similar categories. The test revealed a marginally significant effect, such that the MFN was larger for the visually distinct category (*M* = −2.31 μV) than for the visually similar categories (*M* = −2.07 μV), *t*(43) = −1.98, *p* = 0.054 (Figure 13).
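The adaptive-mean quantification described above can be sketched as follows. This is an illustrative Python reimplementation with a placeholder waveform and a hypothetical helper name (`adaptive_mean`), not the authors' Net Station pipeline:

```python
import numpy as np

# Placeholder cluster-averaged waveform for illustration: 250 Hz sampling,
# -200 to 1000 ms around stimulus onset (4 ms per sample).
times = np.arange(-200, 1000, 4)
rng = np.random.default_rng(1)
erp = rng.normal(size=times.size)

def adaptive_mean(erp, times, t_min, t_max, half_width_ms=20, polarity=-1):
    """Mean amplitude within +/- half_width_ms of the extreme peak in a window."""
    win = np.flatnonzero((times >= t_min) & (times <= t_max))
    peak = win[np.argmax(polarity * erp[win])]   # most negative (or positive) sample
    around = np.abs(times - times[peak]) <= half_width_ms
    return erp[around].mean(), times[peak]

# MFN: negative peak in ~180-300 ms, referenced to the preceding P200 (~150-200 ms)
mfn_amp, mfn_lat = adaptive_mean(erp, times, 180, 300, polarity=-1)
p200_amp, _ = adaptive_mean(erp, times, 150, 200, polarity=+1)
mfn_peak_to_peak = mfn_amp - p200_amp            # P200-referenced MFN measure
```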

**Figure 12.** Electrode montages used for the Medial Frontal Negativity (MFN), P300b (P3b), and Lateral Inferior Anterior Negativity (LIAN) Event-Related Potential (ERP) components. Orange and yellow: Electrodes used for the LIAN analysis. Pink: Electrode cluster used to quantify the MFN. Blue: Electrodes used to quantify the P3b.

**Figure 13.** Top: A voltage map displays the voltage across the scalp for the similar and distinct conditions at the peak of the MFN (asterisk in bottom waveform image). A stronger negative voltage is seen over the medial frontal areas for the visually distinct condition. Bottom: Representative waveform (i.e., a single channel over the middle of the negative scalp potential) showing the shape of the MFN for both conditions. This waveform was derived from the grand-average of all the analyzed subjects. The amplitude of the MFN is higher (more negative) for the visually distinct condition.

For the P3b analysis, a set of 17 channels that corresponded to the posterior-parietal distribution of the component was used (see blue electrodes, Figure 12). An adaptive mean amplitude corresponding to 22 ms before and after the peak amplitude in a window extending from approximately 450–950 ms after stimulus onset was computed for the group of electrodes to quantify the component. This method was applied to the post-learning trials for all three formation categories and is consistent with how we have quantified the P3b in previous experiments [40]. Separate ERPs were computed for the visually similar and distinct categories, similar to the method described for the MFN, after establishing that there were no differences in amplitude between the visually similar categories. A paired-samples *t*-test revealed that the amplitude of the P3b for the distinct category (6.02 μV) was significantly larger than for the similar categories (5.34 μV), *t*(43) = 4.17, *p* < 0.001. Figure 14 displays this effect.

**Figure 14.** Top: A voltage map displays the voltage across the scalp for the visually similar and distinct conditions at the peak of the P3b (asterisk in bottom waveform image). A stronger positive voltage is seen over the posterior parietal areas for the distinct condition. Bottom: Representative waveform showing the shape of the P3b for both conditions. This waveform was derived from the grand-average of all analyzed subjects. The amplitude of the P3b is higher (more positive) for the visually distinct condition.

The LIAN was quantified using a cluster of 22 channels in the left or right frontoparietal regions (see orange and yellow electrodes in Figure 12, respectively). An adaptive mean amplitude of these clusters corresponding to 22 ms before and after the peak negative amplitude in a window that extended from 450–950 ms (the same window as the P3b) was used to quantify the component. This method was applied to all post-learning trials for all three categories in each subject. Similar to the P3b and MFN, separate ERPs were computed for the visually similar and distinct categories for both the left and right LIAN after establishing no differences between the visually similar categories. A paired-samples *t*-test showed that the amplitude of the left LIAN was larger for the distinct category (−7.06 μV) than for the visually similar categories (−5.54 μV), *t*(43) = −2.98, *p* = 0.004 (Figure 15). However, no significant amplitude difference for the right LIAN was found between the similar categories (−3.55 μV) and the distinct category (−2.92 μV), *t*(43) = 1.23, *p* = 0.23.

**Figure 15.** Top: Voltage maps display the voltage across the scalp for the similar and distinct conditions at the peak of the LIAN on the left and right sides (asterisks in bottom waveform images). A stronger negative voltage is seen over the left frontal areas for the visually distinct condition and a stronger negative voltage is seen over the right frontal areas for the visually similar condition. Bottom: Representative waveforms showing the shape of the LIAN for both conditions in the left and right hemispheres. The amplitude of the left LIAN is higher (more negative) for the distinct condition, whereas the right LIAN is higher (more negative) for the similar condition.

#### 3.2.3. EEG Machine Learning Analysis

In addition to traditional ERP analysis, we chose to utilize machine learning, as it provides a more data-driven approach to measuring functional differences. We were interested in tracking the earliest timepoint at which brain responses become distinguishable for visually similar vs. visually distinct categories. Region-based electrode clusters were used to evaluate the general location of these early temporal dissociations. This novel approach has the advantage of utilizing information in the entire pattern of amplitudes across the whole brain, which can potentially increase the sensitivity to subtle differences or the engagement of different networks that may include overlapping regions.

For every subject, post-learning trials were chunked into individual segments extending 200 ms before and 1000 ms after stimulus onset for each category. Segments containing ocular or movement artifacts were rejected from analysis. Each segment was baseline corrected using a 200 ms pre-stimulus baseline before the segments were averaged together to form one averaged waveform for each category of stimuli. Waveforms for the two visually similar categories were averaged together to be compared against the distinct category before re-referencing to an average reference. The waveforms were then broken down into their individual samples, which, at a sampling rate of 250 samples/second, resulted in 300 total samples per waveform (each sample representing 4 ms of recording).
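A minimal sketch of this segmenting and baseline-correction step, assuming a continuous channels × samples recording at 250 Hz and known stimulus-onset sample indices (all variable names hypothetical):

```python
import numpy as np

FS = 250                     # Hz, so 4 ms per sample
PRE, POST = 50, 250          # 200 ms before and 1000 ms after onset, in samples

def segment_and_baseline(eeg, onsets):
    """Cut trials around each onset and subtract the 200 ms pre-stimulus mean."""
    segs = np.stack([eeg[:, o - PRE:o + POST] for o in onsets])
    baseline = segs[:, :, :PRE].mean(axis=2, keepdims=True)
    return segs - baseline   # trials x channels x 300 samples

# Placeholder continuous recording and onsets for illustration only.
rng = np.random.default_rng(2)
eeg = rng.normal(size=(64, 30000))
onsets = rng.integers(PRE, 30000 - POST, size=40)
category_erp = segment_and_baseline(eeg, onsets).mean(axis=0)  # one averaged waveform
```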

We averaged together the raw voltages of electrodes within 10 regions in order to reduce the number of predictors in this analysis: left frontal, right frontal, medial prefrontal, medial frontal, posterior parietal, left temporoparietal, right temporoparietal, left occipital, right occipital, and medial occipital (Figure 16). This process was done for each individual sample for both categories. We then averaged every five consecutive samples together, resulting in 60 timepoints for each waveform, with every timepoint representing 20 ms of data. The first 10 timepoints were used in the baseline correction and, thus, were not included in the analysis. In the end, this gave us two matrices (one for visually similar and one for visually distinct) for each subject with dimensions 50 (timepoints) × 10 (electrode groups).
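The reduction to 50 × 10 matrices can be sketched as below. The region-to-channel mapping shown is hypothetical (the real groupings follow Figure 16), and only two of the ten regions are spelled out:

```python
import numpy as np

# Hypothetical region-to-channel mapping; the real groupings follow Figure 16.
region_channels = {
    "left_frontal": [0, 1, 2],
    "right_frontal": [3, 4, 5],
    # ... eight more regions in the full analysis
}

def to_feature_matrix(wave, regions, bin_size=5, n_baseline_bins=10):
    """Average channels within regions, then bin every 5 samples into 20 ms timepoints."""
    region_means = np.stack([wave[chs].mean(axis=0) for chs in regions.values()])
    binned = region_means.reshape(len(regions), -1, bin_size).mean(axis=2)  # 60 bins
    return binned[:, n_baseline_bins:].T   # drop baseline bins -> 50 x n_regions

rng = np.random.default_rng(3)
wave = rng.normal(size=(64, 300))          # placeholder averaged waveform
X = to_feature_matrix(wave, region_channels)
print(X.shape)                             # (50, 2) here; (50, 10) with all 10 regions
```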

**Figure 16.** Electrode montages used to define regions during machine-learning analysis. Orange = left frontal, yellow = right frontal, green = medial prefrontal, pink = medial frontal, blue = posterior parietal, cyan = left temporoparietal, red = right temporoparietal, brown = left occipital, purple = medial occipital, and black = right occipital.

For each timepoint, a linear Support Vector Machine (SVM) classifier, as implemented in Matlab, was trained to classify patterns of EEG voltages associated with the visually similar vs. visually distinct categories across subjects. The patterns of voltages across all 10 electrode groups associated with each condition for each subject served as the patterns to be classified. Leave-one-subject-out cross-validation was carried out, such that patterns from 43 of the 44 subjects were used to train the classifier, and the subject left out of training served as the test subject. This training and test format was performed iteratively until every subject had been used as the test subject. For each iteration and timepoint, the classifier provided an estimate of how likely each of the two test patterns from the left-out subject (one pattern for visually similar trials and one for visually distinct trials) represented the visually similar category. Because there were two categories (distinct vs. similar), the classifier-estimated probability that a pattern represented the visually distinct category was always 1 minus the visually similar probability. The test pattern with greater visually similar evidence was labeled as the classifier's guess for the visually similar category; the other test pattern was labeled as the visually distinct guess. When the classifier's guess matched the actual condition, the classification was considered correct for the given test participant and timepoint. The classification accuracies from both pairwise classifications (visually similar 1 vs. visually distinct, visually similar 2 vs. visually distinct) were averaged together. This provided an overall estimate of how well the classifier could distinguish each of the two visually similar categories from the visually distinct category.
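The original analysis was implemented in Matlab; the sketch below re-expresses the per-timepoint, leave-one-subject-out procedure in Python with scikit-learn, using placeholder data and, for brevity, a single collapsed visually similar condition rather than the two separate pairwise classifications described above:

```python
import numpy as np
from sklearn.svm import SVC

def loso_accuracy(X):
    """X: subjects x 2 conditions x timepoints x regions (condition 0 = similar)."""
    n_subj, _, n_time, n_reg = X.shape
    acc = np.zeros(n_time)
    for t in range(n_time):
        correct = 0
        for test in range(n_subj):
            train = np.delete(np.arange(n_subj), test)
            Xtr = X[train, :, t, :].reshape(-1, n_reg)    # two patterns per subject
            ytr = np.tile([0, 1], len(train))             # 0 = similar, 1 = distinct
            clf = SVC(kernel="linear", probability=True).fit(Xtr, ytr)
            p_sim = clf.predict_proba(X[test, :, t, :])[:, 0]  # P(similar) per pattern
            correct += int(p_sim[0] > p_sim[1])           # guess with more evidence wins
        acc[t] = correct / n_subj
    return acc

rng = np.random.default_rng(4)
accuracy = loso_accuracy(rng.normal(size=(44, 2, 50, 10)))  # placeholder data
```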

The classification accuracy for each timepoint was averaged across iterations, and a one-sample *t*-test was performed against a theoretical chance mean (50%, as we performed pairwise classifications). The cross-validated classification accuracy for each timepoint is chronologically plotted in Figure 17, and timepoints with a classification accuracy significantly above chance at *p* < 0.05 (uncorrected) are denoted by a blue diamond along the X axis. From this figure, the earliest timepoints at which the classifier was able to reliably differentiate between the two categories were between 260 and 320 ms, which coincides with the onset and peak of the MFN. Another extended period reliably above chance was between 440 and 700 ms, which corresponded to the onset and peak of the LIAN and P3b.
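The per-timepoint significance test can be sketched as a one-sample *t*-test of the per-subject classification outcomes against the 50% chance level (placeholder data; assumed shape of 44 subjects × 50 timepoints):

```python
import numpy as np
from scipy import stats

# Placeholder per-subject binary classification outcomes for illustration.
rng = np.random.default_rng(5)
outcomes = rng.integers(0, 2, size=(44, 50)).astype(float)

t_vals, p_vals = stats.ttest_1samp(outcomes, 0.5, axis=0)  # chance = 50% (pairwise)
significant = p_vals < 0.05  # uncorrected, as marked by blue diamonds in Figure 17
```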

**Figure 17.** Whole-brain classification accuracy over time on an experimental trial. Blue diamonds along the X-axis represent timepoints where classification accuracy is significantly above chance (*p* < 0.05). The earliest string of above-chance classification accuracies is observable between 200 and 300 ms after stimulus onset, followed by another group between 430 and 700 ms. A late string of reliable classification occurs around 890–1000 ms.

The same SVM classification was run again using only the voltages in each region individually to determine whether any one particular region was driving the classification accuracy at each timepoint. The overall classification within each region indicated that the medial prefrontal, left frontal, and posterior parietal regions showed the earliest reliable (and strongest) classification accuracy amongst all regions, with a maximum classification accuracy of 82% (Figure 18). Within these regions, reliable differentiation between categories emerged around 250 ms and remained stable until around 740 ms. The classification accuracy peaked earlier in the posterior parietal region than in the medial prefrontal and left frontal regions, although the two categories could be differentiated with reliable accuracy using any of these three regions throughout this roughly 500 ms window.

**Figure 18.** Region-based classification accuracy over time, correlated with behavioral performance. Top: Classification accuracy for the left-frontal electrode montage. Classification accuracy peaks between 400 and 700 ms. During this window, classification is positively correlated with performance. Middle: Classification accuracy for the medial frontal electrode montage. Accuracy peaks between 600 and 750 ms after stimulus onset and does not correlate with behavior. Bottom: Classification accuracy for the posterior parietal electrode montage. Accuracy peaks the earliest in this region, occurring between 220 and 500 ms. Interestingly, the classification accuracy is negatively correlated with behavioral performance within this window.

We were also interested in whether the different neural strategies employed for the two types of trials (as evidenced by better SVM differentiation between the neural patterns associated with each trial type) are beneficial to performance. Thus, we ran an exploratory analysis using a Pearson's correlation between each subject's SVM classification accuracy and behavioral performance on the categorization task. In Figure 18, the trajectory lines are color-coded red or cyan at timepoints where the SVM classification accuracy was significantly correlated with behavioral performance (*p* < 0.05). Timepoints highlighted in red indicate that the SVM classification accuracy was positively correlated with behavioral performance; those in cyan were negatively correlated with performance. The classification accuracy of the medial prefrontal region did not significantly predict behavioral outcome at virtually any timepoint. In contrast, the left frontal region, the location of the left LIAN component, was positively correlated with behavior throughout its classification peak. One interpretation of this finding is that the ability to flexibly employ different strategies best matching the current demands may optimize performance overall. Unexpectedly, the classification accuracy of the posterior parietal region was negatively correlated with behavior at several timepoints between 220 and 500 ms. One interpretation of this finding is that the shift away from verbalizable rule-based strategies itself requires executive resources and, thus, excessively differential allocation of resources at these topographies and timepoints might make it difficult to continue learning beyond explicit rule application [50]. Of all the regions, the right frontal area (the location of the right LIAN) was responsible for the very latest classification accuracy peak, occurring between 800 and 1000 ms. The classification accuracy in this region did not significantly correlate with behavior within this window. The three occipital areas, along with the two temporoparietal areas, failed to demonstrate consistent windows of reliable classification accuracy.
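As a sketch, the exploratory brain-behavior correlation amounts to a Pearson correlation at each timepoint between per-subject classification accuracy for a region and each subject's overall categorization accuracy (all inputs below are placeholders):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
region_acc = rng.random(size=(44, 50))   # placeholder per-subject accuracy, one region
behavior = rng.random(size=44)           # placeholder overall categorization accuracy

r = np.empty(50); p = np.empty(50)
for t in range(50):
    r[t], p[t] = stats.pearsonr(region_acc[:, t], behavior)

positive = (p < 0.05) & (r > 0)   # red-coded timepoints in Figure 18
negative = (p < 0.05) & (r < 0)   # cyan-coded timepoints in Figure 18
```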
