Article

Levels of Automation for a Computer-Based Procedure for Simulated Nuclear Power Plant Operation: Impacts on Workload and Trust

1 Institute for Simulation and Training, University of Central Florida, 3100 Technology Pkwy, Orlando, FL 32826, USA
2 Department of Psychology, George Mason University, University Drive, Fairfax, VA 22030, USA
3 U.S. Nuclear Regulatory Commission, Rockville, MD 20852, USA
* Author to whom correspondence should be addressed.
Safety 2025, 11(1), 22; https://doi.org/10.3390/safety11010022
Submission received: 6 December 2024 / Revised: 1 February 2025 / Accepted: 8 February 2025 / Published: 2 March 2025

Abstract

Nuclear power plants increasingly utilize digitalized systems, including computer-based procedures (CBPs) and automation. These novel technologies require human factors evaluation to ensure safety. Automation can potentially contribute to safety by reducing workload, but it may also induce a loss of situation awareness and trust miscalibration. The current study investigated workload during a simulated nuclear power plant (NPP) emergency operation procedure (EOP) executed using a CBP supported by automation. Two levels of automation (LOA) were compared within subjects: management-by-consent (lower LOA) and management-by-exception (higher LOA). Subjective workload and trust were assessed, together with objective psychophysiological and performance-based workload measures. LOA effects varied across the different workload measures. The hypothesis that workload would be reduced at the higher LOA was confirmed for a behavioral measure (secondary task response time). However, other metrics, including instantaneous self-assessment (ISA) and heart rate variability (HRV), showed increased workload at the higher LOA. Different LOAs may produce differing operator strategies that require multivariate workload assessment to evaluate. Effect sizes for the impact of LOA on workload were indexed by Cohen’s d. Several of these effect sizes were in the 0.4–0.6 range, indicating effects of medium magnitude. In addition, subjective workload data were compared with those from a previous study that simulated conventional NPP operations. As anticipated, workload tended to be lower with the automated procedure. The study suggests future directions for human factors research on plant modernization.

1. Introduction

The nuclear power industry is increasingly focused on plant modernization and novel reactor designs, including small modular reactors (SMRs) and micro-reactors [1,2]. Modernization efforts include both the replacement of analogue instrumentation and controls (I&C) with digital equivalents and more advanced digitalization efforts that provide novel functionalities [3]. These include system automation, novel displays and visualizations, intelligent alarm systems, and advanced diagnostic tools to aid in fault detection, diagnosis, and troubleshooting [2]. Modernization introduces novel challenges for plant licensing and review agencies, such as the Nuclear Regulatory Commission (NRC) in the USA. Operators working with novel technology may be vulnerable to different types of errors than operators in conventional plants. For example, working with automation can lead to human factors issues, such as over- and under-trust, novel error modes, skill degradation, and loss of situation awareness (SA) [4,5,6,7]. In this article, we report a simulation study of operator response to working with a modernized nuclear power plant (NPP) interface to execute an emergency procedure. We investigated the impact of the level of automation (LOA) on workload and trust while implementing a computer-based procedure (CBP).
Next, we review previous simulation studies of NPP operation that indicate how CBPs and automation may affect operator workload and performance. Trust in plant automation is a novel issue for research in this area. We also report an experimental study to investigate how the LOA for CBPs impacts operator workload and trust during performance of different component tasks during a simulated emergency procedure. Discussion of these data addresses the benefits and costs of automation, the need for further investigation of trust, and study limitations.

1.1. Computer-Based Procedures and Automation

CBPs can make a significant contribution to plant modernization [8]. They replace paper-based procedures (PBPs) that specify a series of steps that need to be taken during normal and abnormal operations, including logically branching paths. Like PBPs, CBPs assist operator response planning during emergency operations, but they can also offer additional functionality, including monitoring of the plant’s state and parameters, supporting situation assessment, and executing certain elements of procedures. CBPs are expected to improve operator performance by providing structured, consistent, logical procedures and flexible, context-sensitive supporting information [8,9,10]. Experimental studies have indeed provided initial evidence that CBPs reduce the error rate during procedure execution, especially when procedures are complex [8,11].
CBPs can be supported by automation. Hall et al. (2023) [12] distinguished those with no automation (Type 1), those providing additional information and access links to displays and soft controls (Type 2), and those with additional capability to automatically carry out sequences in the procedure (Type 3). However, the introduction of automation can have both benefits and costs for human performance, as the operator’s role changes from actively controlling reactor processes to monitoring the automation via instrumentation. On the positive side, well-designed automation reduces workload and the likelihood of operator error, while enhancing situation awareness [5,13,14]. Automation may also be effective in preventing slips and lapses in routine NPP operations [15].
Potential costs of automation include operator disengagement, neglect of possible automation failures (complacency), and reduced vigilance [6,16]. Operators may experience difficulty in calibrating trust to the actual performance of the automation [5]. These issues may be especially prevalent when the automation is highly reliable, so that automation failures are unexpected and catch operators off guard.
The level of automation (LOA) defines the extent to which tasks are allocated between the human and the machine automation [17,18]. From a design perspective, it is important to identify the LOA that maximizes the benefit-to-cost ratios for performance efficiency and safety of operations. Human factors researchers have proposed various numeric scales to express the LOA [18,19]. Table 1, based on NUREG 0700, Revision 3 [20], defines five LOAs for NPP applications.
Choosing the optimal LOA is essential in distributing tasks between operators and automation, setting workload levels to avoid under- or over-load, promoting appropriate levels of trust in automation, and avoiding human errors caused by complacency or incorrect usage of the system [21]. Intermediate LOAs may be preferable to high LOAs because they maintain operator engagement with the task and mitigate vigilance and “out of the loop” effects [19]. Indeed, a higher LOA can produce performance benefits and workload mitigation but can also lead to loss of SA when automation fails [19]. One study [12] varied the level of automated support provided by CBPs for plant operation scenarios. Consistent with the benefits of an intermediate LOA, increasing the level of automation reduced workload, but situation awareness was highest with the intermediate level (Type 2) CBP. Higher LOAs can also contribute to automation complacency. Especially in multi-tasking conditions, operators may become over-reliant on automation and fail to monitor for automation failures effectively [6].

1.2. Workload Issues for Automated Systems in NPP Operations

In the main control room (MCR) of traditional plants, cognitive overload is a recognized issue because operators may work with as many as 8000 I&C [22]. Operators are thus vulnerable to elevated cognitive workload, which, in turn, increases the risk of human error. Some tasks performed by operators may be especially vulnerable to overload; for example, vigilant monitoring of displays can be taxing [23]. The complexity of NPP operations imposes a further cognitive burden on operators during emergency situations when issues need to be diagnosed rapidly [15,24]. Operator perceptions of overload also drive stress responses, further elevating the likelihood of human error [25].
Workload assessment is an important element of the safety review and licensing activities of the NRC, as defined in NUREG-0711 [26]. The assessment of workload contributes to prospective human reliability analysis (HRA) for the nuclear industry (NUREG/CR-1278) [27] by identifying operator task elements that are liable to impose excessive cognitive demands and thus raise error probabilities and threats to safety. Workload issues in the MCR have been investigated at the NRC Human Performance Test Facility (HPTF) using a simulator designed to be suitable for both novice populations and trained operators [28]. This research used both subjective and psychophysiological indices of workload. Reviews of HPTF research [29,30,31] supported several conclusions about workload factors in NPP operations. First, there were robust differences between the different tasks performed by operators. Typically, detection of plant state changes represented in the instrumentation was the task element that produced the highest workload levels, compared with checking I&C to verify its current state or level and implementing responses, such as opening or closing a valve. The workload demands of detection are consistent with the vigilance literature [32]. They indicate a challenge for automated systems that require operators to monitor the system state and detect automation failures or errors. Second, there was partial convergence between different workload indices. The sensitivity of individual indices may vary with factors including interface configuration and operator experience. Multivariate workload assessment thus provides a more comprehensive picture of operator response than any single index does. Third, while relationships between workload and vulnerability to error can be complex, workload assessment remains important for HRA [33] and safety in the nuclear industry [34].
Automation has mixed impacts on workload and performance in the nuclear domain. Higher LOAs may mitigate workload at the cost of impairments in situation awareness [35]. While automation can relieve the cognitive burden on operators, it can also add workload due to additional interface management tasks, the complexity of automated systems, tracking of the operational mode of the automation, and correctly diagnosing errors made by the automation [4]. Managing automation may add to workload when unforeseen operational states arise that require operators to understand how the automation is functioning [36]. Some work in the area has addressed automation of alarm systems. Studies evaluated participant responses to a feedwater pump alarm in a simulation of an advanced NPP [37,38]. They compared manual reset of the alarm with automatic reset. The authors concluded that auto-reset elicited lower initial effort and improved operating performance, especially in the multiple-task condition. However, experienced operators may prefer manual reset because it provides more information than automated reset about how to handle the alarm. Another study [39] found that using a higher degree of automation reduced workload during a reactor shutdown task but not during an alarm reset task. In the former task, mean workload on a 100-point scale was 64 with low automation and 49 with high automation.
Two experiments addressed the workload and performance impacts of introducing CBPs. One study compared the impacts of CBPs and PBPs on participant response to a simulated steam generator tube rupture [40]. It was found that computerization of procedures lowered workload but also reduced situation awareness (similar to findings from other domains [19]). The mental demands rating was around 8.1 for the PBP and 7.0 for the CBP. A further study utilized a simulated reactor microworld capable of representing both everyday operations and emergency situations [12]. Startup and loss of feedwater scenarios were examined. Three types of CBP, as defined above in Section 1.1, were compared. Results showed that more advanced automation of CBPs reduced subjective workload. Mean levels of workload on a 1–10 scale were 5.78 (Type 1), 5.08 (Type 2), and 4.67 (Type 3). Workload was highest with no automation (Type 1) and lowest when the CBP provided partial automation of control (Type 3). Some performance benefits of automation were also seen, although they differed across scenarios.

1.3. Measurement of Trust in Automation

Introducing automation requires systems and interfaces to be evaluated for trust as well as workload [41]. Evaluating trust in automation is not always straightforward, as operators engage with automated systems based on their prior experiences with similar systems and technology in general. To ensure optimal use of an automated system, operator trust must be calibrated to the system’s actual reliability. Mis-calibrated trust can introduce risks of misuse, disuse, and abuse of automation [16]. Automation in the nuclear industry is anticipated to be highly reliable, so operators should be trusting while remaining vigilant to occasional automation errors. In the nuclear domain, operators may have difficulty in accurately evaluating the trustworthiness of automation because the introduction of novel systems increases operator uncertainty over system functioning [36,42]. Indeed, one early study found that operator trust was inversely related to the actual efficiency of the automation [43]. Given the limited number of studies conducted thus far, there is a need for more research on operator trust and its measurement.
Both trait and state measures of trust have been used in human factors research [44,45]. Trait measures assess the person’s general willingness or disposition to trust automated systems, and various scales have been developed [28]. Two trait trust scales suitable for the nuclear domain are the Human Interaction and Trust (HIT) [46] and Perfect Automation Schema (PAS) [47] scales. The HIT assesses intentions to rely on the automation and so is expected to be predictive of reliance behavior. The PAS assesses the all-or-nothing belief that automation either works or it does not [47], together with a second subscale measuring overall expectations of system performance. Automated systems in NPPs are anticipated to be highly reliable. Individuals with high PAS scores may then be at risk of assuming that the system is perfectly reliable, leading to possible complacency.
State measures are administered following task performance to assess level of trust during a single interaction with the automation. Very trusting operators might be over-reliant on the automation and at risk of complacency and failure to monitor and check the automation. Conversely, low-trust operators are inclined to neglect the automation and perform tasks manually, leading to unnecessary workload. Various scales have been used to measure immediate states of trust [44]. The Checklist of Trust between People and Automation (CTPA) [48] has been validated for assessment of trust in process control systems and appears to be well suited to the nuclear domain [49].

1.4. The Present Study: Aims and Hypotheses

The aim of this study was to investigate the effects of the LOA and task type on workload and trust within a simulated emergency operating procedure (EOP) performed by a single participant. It utilized a modified version of the GSE generic pressurized water reactor (GPWR) simulator used in previous HPTF studies [29,31]. The participant executed two CBPs presented on two adjacent “storyboards”, switching between CBPs as directed by the procedures. The CBPs comprised a sequence of steps requiring either checking, detection, or response implementation. Checking required the participant to verify that the I&C was in the correct state. Detection required monitoring of an instrument to detect a change in a specified parameter. Response implementation required locating and manipulating a control, as directed by the procedure. Each step was implemented by the automation, after which the participant could correct automation errors using the response permitted by the system LOA. Participants also performed a concurrent secondary task, monitoring 12 gauges for out-of-range values, to provide an additional, performance-based workload measure.
In manipulating the LOA, we focused on intermediate LOAs because high levels of automation may lead to out-of-the-loop issues and loss of SA [19,40]. Specifically, we compared management-by-consent and management-by-exception LOAs, i.e., levels 3 and 4 in Table 1. In management-by-consent, the lower of the two levels, the participant had to confirm or override each statement made by the automation. In management-by-exception, the participant had a 30 s window in which to override the automation, should they detect an error. Thus, the automation could proceed to the next step without explicit approval from the participant. We tested the following hypotheses:
Workload. We expected that, as in previous HPTF studies [29,31], the detection task would impose higher workload than checking and response implementation tasks across multiple physiological and subjective workload metrics. The present study also included a secondary task, and it was predicted that the workload effect for detection would extend to the secondary task as well. Task type effects were tested using the Multiple Resource Questionnaire (MRQ) subscales on an exploratory basis due to the change from the conventional procedures used in previous HPTF studies to CBPs. It was expected that the pattern of demands elicited by task performance might differ from that observed previously. We also hypothesized that the higher LOA (management-by-exception) would elicit lower workload across multiple metrics, relative to management-by-consent. To the extent that the higher LOA is most beneficial for the most demanding task (detection), we also anticipated interactive effects of the LOA and task type.
Trust. Given the dearth of relevant previous studies, investigation of subjective trust was exploratory. However, since the automation was highly reliable, i.e., typically correct, we considered that the higher LOA might induce a sense of complacency and hence elevate trust. We also aimed to investigate individual differences in trust to determine whether some individuals are predisposed to assign trust sub-optimally, which would lead to misuse of automation. We anticipated that the two trait measures (HIT and PAS) would be positively correlated with state trust (CTPA).
Comparison with previous HPTF studies. Previous studies using the HPTF simulator established benchmarks for workload during different operational tasks [29,31]. We analyzed workload differences between one of these previous studies, representing traditional NPP operation, and the current configuration of the simulation using the subjective measures typically used for comparing workload levels across qualitatively different task environments [25]. We anticipated workload levels would be lower for the automated version of the simulation than for the conventional mode of operation previously utilized.

2. Method

2.1. Participants

Forty-five psychology undergraduate students aged 18–50 (M = 20.93, SD = 5.78) received course credit for participation. There were 26 males, 15 females, and 4 of unspecified gender. All participants reported normal or corrected-to-normal vision, no color vision deficiency or history of neurological disorders, and no prior NPP experience.

2.2. Experimental Design

A 2 (LOA: management-by-consent and management-by-exception) × 3 (task type: checking, detection, and response implementation) within-subjects design was adopted. Participants completed two experimental scenarios in management-by-consent and management-by-exception LOA settings, respectively, in a counterbalanced order.

2.3. Apparatus

2.3.1. NPP Simulator

The experiment was conducted using the NPP MCR simulator housed in the lab space at the University of Central Florida (see Figure 1). It is a modification of the GSE GPWR digital full-scope simulator that provides controlled and repeatable NPP control room scenarios for participants. The GPWR simulator [50] represents a generic 3-loop pressurized water reactor (PWR) with the capability to run the full range of power operations. The human systems integration (HSI) uses eight LCD panels to mimic control room hard panels. Software includes a graphics tool, an instructor station, and a real-time executive program, with a system update rate of at least twice per second [51]. The GPWR is based on a realistic dynamic model of reactor physics. However, in an earlier study [51], pilot testing revealed that use of the dynamic model did not provide sufficient experimental control over operational scenarios. Thus, the simulator software was modified so that I&C states on the panel had a limited, pre-defined range of behaviors controlled by scripts, so that participants could be presented with consistent, reproducible scenarios [29]. In addition, some of the GPWR I&C were eliminated or modified to simplify the interface for novice participants by reducing visual complexity and short-term memory load [52]. Further details of the simulation methodology are available in previous HPTF reports [29,51,52].
For the present study, participants worked with four panels containing digitized analog I&C. Each panel consisted of four 27-inch touchscreen monitors (2560 × 1440 pixels in resolution per screen) arranged two high by two wide. The CBP was implemented via a desktop computer (6.4 GT/s, Intel XeonTM 5600 series processor). A separate monitor provided participants with instructions and prompts regarding the ongoing tasks, including directions to interact with I&C contained within the panels.

2.3.2. Scenario for the EOP

The experimental scenario was developed from a generic version of a “Loss of All Alternating Current (AC) Power (ECA-0.0)” emergency operating procedure that was modified for experimental use. There was a total of 25 task steps: 17 checking tasks, 4 detection tasks, and 4 response implementation tasks. The checking task type required participants to work with the automation to complete a one-time inspection of an I&C to verify that it was in the state called for by the EOP. The detection task type required participants to continuously monitor a specific instrument parameter for a change in value. The parameter was monitored by the automation, and participants were prompted with the automation’s evaluation when the change was detected. The response implementation task type required participants to locate a control with navigation assistance from the automation and subsequently manipulate the control in the required direction (i.e., open or shut). Alternatively, participants could choose to wait for the automation to complete the action for them.
The CBP was presented in a two-column layout, as shown in Figure 2. Instructions associated with each workstation were shown in its dedicated column. With management-by-consent (lower LOA), explicit endorsement or override of each automation action was required. With management-by-exception (higher LOA), the automation’s action was implemented unless the operator chose to override it within a set 30 s time window. All task steps were performed by the automation. During training, the automation was described to participants as highly reliable but not perfect. Of the twenty-five task steps, two of the checking steps were given incorrect automation outcomes, i.e., a reliability of 92%. Detection and response implementation tasks had too few steps to include any incorrect automation.
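The consent and exception policies described above can be expressed as a minimal control-flow sketch. This is purely illustrative: the function names and return values are hypothetical, and only the 30 s override window and the consent/exception distinction come from the procedure description.

```python
def management_by_consent(step, operator_decision):
    """Lower LOA: every automated step waits for explicit approval.

    `operator_decision(step)` is a hypothetical callback returning
    "confirm" or "override"; the automation cannot proceed without it.
    """
    decision = operator_decision(step)
    return "executed" if decision == "confirm" else "overridden"


def management_by_exception(step, override_requested, window_s=30):
    """Higher LOA: the step proceeds unless overridden within window_s.

    `override_requested(step)` is a hypothetical callback returning True
    if the operator intervenes within the 30 s window; otherwise the
    automation executes the step without explicit approval.
    """
    if override_requested(step):
        return "overridden"
    return "executed"
```

The key design difference is that only the exception policy has a default action: inattention under management-by-consent stalls the procedure, whereas inattention under management-by-exception lets an automation error stand.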

2.3.3. Secondary Task

Participants were also instructed to monitor a group of twelve gauges on the panel throughout the scenario as a secondary task. When any gauge fell below 20%, participants were to tap the label of the gauge to acknowledge it. This happened 10 times in each scenario.

2.4. Subjective Measures

2.4.1. Perfect Automation Schema (PAS)

The PAS [47] consists of 8 items relating to people’s trust in automated systems, with two subscales: high expectations and all-or-none belief. Each item is scored on a 5-point Likert scale from “strongly disagree” to “strongly agree”.

2.4.2. Human Interaction and Trust (HIT)

The HIT trust scale (modified from [46]) comprises 10 items asking about intentions to trust automation. Items are answered on a 7-point scale, with 1 being “not at all true” and 7 being “very true”.

2.4.3. NASA Task Load Index (NASA-TLX)

The NASA-TLX [53] consists of six 0–100 rating scales: mental demand, physical demand, temporal demand, performance, effort, and frustration. Global workload was calculated as the mean rating, with scoring reversed for performance.
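As a worked illustration of this scoring rule (a sketch of the standard raw-TLX calculation, not the authors' analysis code), the global score averages the six ratings after reverse-scoring performance:

```python
def nasa_tlx_global(ratings):
    """Global (raw) NASA-TLX workload score.

    `ratings` maps the six subscales to 0-100 values. Performance is
    reverse-scored (100 - rating) so that higher global scores always
    indicate higher workload.
    """
    adjusted = {scale: (100 - value if scale == "performance" else value)
                for scale, value in ratings.items()}
    return sum(adjusted.values()) / len(adjusted)


example = {"mental": 50, "physical": 10, "temporal": 40,
           "performance": 80, "effort": 60, "frustration": 30}
# performance 80 becomes 20, so the mean of (50, 10, 40, 20, 60, 30) is 35.0
```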

2.4.4. Multiple Resource Questionnaire (MRQ)

The MRQ [54] measures specific sources of demand based on multiple resource theory. The following 14 of 17 scales were included for the present study: auditory emotional process, auditory linguistic process, manual process, short-term memory process, spatial attentive process, spatial categorical process, spatial concentrative process, spatial emergent process, spatial positional process, spatial quantitative process, visual lexical process, visual phonetic process, visual temporal process, and vocal process. Each demand was rated on a scale ranging from 0 (no usage) to 100 (extreme usage).

2.4.5. Instantaneous Self-Assessment (ISA)

The ISA [55] assesses concurrent workload during task performance. Participants were asked to verbally rate their workload during each of the three task types (checking, detection, and response implementation) using a 5-point Likert scale, ranging from “1 = very low” to “5 = very high”.

2.4.6. Checklist of Trust Between People and Automation (CTPA)

The CTPA [48] is composed of 12 items, which are rated on a 7-point Likert scale from “not at all” to “totally agree” (see Appendix A). Participants rated their trust in the automation following performance.

2.5. Physiological Measures

2.5.1. Electroencephalogram (EEG)

The Advanced Brain Monitoring B-Alert X10 system was employed to assess brain activity in three spectral bands, using nine channels of EEG and one channel of ECG. Electrodes were arranged following the international standard 10–20 System [56], with electrodes placed at the following specific sites: Fz, F3, F4, Cz, C3, C4, Pz, P3, and P4, and two reference electrodes placed bilaterally at the mastoid bones. Data were recorded using a sampling rate of 256 Hz. Power spectral density (PSD) analysis techniques were used to analyze activity in three standard bandwidths: theta (4–8 Hz), alpha (9–13 Hz), and beta (14–30 Hz) [57]. Typically, cognitively demanding tasks elicit increases in beta and frontal theta while suppressing alpha [57]. Data were averaged across lateral sites to compare left and right hemispheres.
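A band-power computation of this kind can be sketched as follows. This is an illustrative periodogram in Python; the study's actual PSD pipeline is not specified, so the implementation details are assumptions, with only the band limits taken from the text.

```python
import numpy as np

# Spectral bands used in the study (Hz)
BANDS = {"theta": (4, 8), "alpha": (9, 13), "beta": (14, 30)}


def band_power(signal, fs, f_lo, f_hi):
    """Mean spectral power of `signal` within [f_lo, f_hi] Hz.

    A simple FFT periodogram; published analyses often use Welch's
    method instead, so treat this as an illustrative sketch only.
    """
    n = len(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / (fs * n)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return psd[mask].mean()
```

For example, a 10 Hz oscillation sampled at the study's 256 Hz rate should show far more power in the alpha band than in beta.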

2.5.2. Electrocardiogram (ECG)

The B-Alert X10 system sampled ECG at 256 Hz. Single-lead electrodes were placed on the center of the right clavicle and on the lowest left rib. Heart rate (HR) was computed from the intervals between successive cardiac peaks. So and Chan’s (1997) algorithm [58] was used to identify the QRS complex and to calculate the inter-beat interval (IBI) and heart rate variability (HRV) [59]. Mental workload tends to increase HR and decrease HRV; HRV is typically the preferred metric because HR is also sensitive to physical activity [60].
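The IBI-derived measures can be illustrated as below. The R-peak times are assumed to have been extracted already (the study used So and Chan's QRS detector for this step), and the specific HRV statistics shown (SDNN, RMSSD) are common time-domain choices rather than ones the text confirms.

```python
import numpy as np


def hrv_metrics(r_peak_times_s):
    """Heart rate and time-domain HRV from R-peak times in seconds.

    Illustrative sketch: returns mean heart rate (bpm), SDNN (overall
    IBI variability, ms), and RMSSD (beat-to-beat variability, ms).
    """
    ibi = np.diff(np.asarray(r_peak_times_s, dtype=float)) * 1000.0  # IBIs in ms
    hr = 60000.0 / ibi.mean()                    # mean heart rate, bpm
    sdnn = ibi.std(ddof=1)                       # SD of the IBI series, ms
    rmssd = np.sqrt(np.mean(np.diff(ibi) ** 2))  # RMS of successive differences, ms
    return hr, sdnn, rmssd
```

Under workload, the expectation described in the text is a rise in the first value and a fall in the latter two.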

2.5.3. Transcranial Doppler (TCD)

The Spencer Technologies ST3 Digital Transcranial Doppler, model PMD150, was used to monitor cerebral blood flow velocity (CBFV) of the medial cerebral artery (MCA) in the left and right hemisphere through a high pulse repetition frequency. The Marc 600 head frame set was used to hold the TCD probes in place. CBFV is especially sensitive to the mental demands of vigilance and sustained attention [61].

2.5.4. Functional Near-Infrared Imaging (fNIRS)

The Somantics Invos Cerebral/Somatic Oximeter, Model 5100C, was used to monitor hemodynamic changes in oxygenated hemoglobin (oxy-Hb) and deoxygenated hemoglobin (deoxy-Hb) in the left and right prefrontal cortex [62]. These data were processed to calculate regional oxygen saturation (rSO2), which indexes metabolic activity in frontal areas [62].

2.6. Procedure

The study was approved by the UCF Institutional Review Board. Following consent, participants donned the physiological sensors, including EEG, ECG, fNIRS, and TCD. A 5 min resting baseline was then collected from each participant for each sensor type. Participants then completed a pre-task survey set, which included a demographics questionnaire, the PAS, and the HIT trust scale. Next, participants were given a simulation demonstration and training with a PowerPoint presentation, followed by a hands-on practice session supervised by the researcher. The hands-on practice consisted of a demonstration, followed by instruction and practice in multiple steps on the different task types: checking, detection, and response implementation. Participants were asked to apply the knowledge from the demonstration and training sessions to complete the practice without the researcher’s guidance or assistance. The researcher observed the process to ensure that participants followed the prompted instructions for each task type and responded to the automation attentively. All participants completed the practice tasks successfully and exhibited proficiency with the simulator interface and controls. Participants then performed the two experimental scenarios, each with a different LOA. During the task, participants were asked to rate their workload level on a scale of 1 to 5 (the ISA self-report scale). Psychophysiological measures were recorded throughout performance. After each scenario was complete, participants completed the NASA-TLX, the MRQ, and the CTPA. These measures could thus be used to compare the two LOAs, but not the three task types. Afterward, the sensors were removed, and the participant was debriefed and dismissed.

2.7. Statistical Methods

All data were analyzed with the SPSS Statistics package 29.0.0.0.0 (IBM, Armonk, NY, USA). The effects of the independent variables on workload, trust, and performance measures were tested using paired-samples t-tests and repeated-measures analysis of variance (ANOVA). The probability level for significance testing was set to p < 0.05. The Bonferroni correction for multiple comparisons was applied to the p-value for t-tests. We report effect sizes for findings relevant to the principal hypothesis that workload is lower at the higher LOA, i.e., for comparisons of means for the two LOAs. Cohen’s d for within-subjects designs was used for this purpose. Conventionally, d values of 0.2, 0.5, and 0.8 are considered small, medium, and large effect sizes, although these thresholds are somewhat arbitrary [63]. The averaged SD from the two LOA conditions was used as the standardizer for the calculation of d [64]. Effect sizes were also calculated for selected mean differences on workload measures between the current sample and that of a prior HPTF study representing conventional NPP operation [31]. Because of differences in sample size, Hedges’ g was used as the between-samples effect size index. Pearson correlations were computed to test relationships between trust measures, with a significance level of p < 0.05.
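The two effect size indices described here can be sketched as follows, assuming raw condition scores are available. The within-subjects d uses the averaged-SD standardizer named in the text; the Hedges' g sketch uses the standard pooled-SD formula with the usual small-sample correction, which the text does not spell out.

```python
import numpy as np


def cohens_d_within(x1, x2):
    """Within-subjects Cohen's d, standardized by the average of the
    two condition SDs (the standardizer described in the text)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    sd_avg = (x1.std(ddof=1) + x2.std(ddof=1)) / 2.0
    return (x1.mean() - x2.mean()) / sd_avg


def hedges_g(x1, x2):
    """Between-samples Hedges' g: pooled-SD d with the common
    small-sample bias correction (sketch, not the authors' code)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    pooled_sd = np.sqrt(((n1 - 1) * x1.var(ddof=1) +
                         (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2))
    d = (x1.mean() - x2.mean()) / pooled_sd
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    return d * correction
```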

3. Results

3.1. Subjective Measures

3.1.1. NASA-TLX

Paired-samples t-tests with Bonferroni correction were conducted to determine whether the LOA influenced subjective workload measured by the NASA-TLX. Participants reported higher temporal demand in the management-by-exception condition than in the management-by-consent condition. Cohen’s d for the difference in temporal demand between the two LOAs was 0.38, indicating a small-to-medium effect size. No significant differences were found for the other subscales or the global workload score (Table 2). Thus, workload levels were similar across the LOAs, with the exception of the unexpectedly higher temporal demand at the higher LOA.

3.1.2. MRQ

Table 3 provides MRQ ratings for each LOA. In both conditions, participants reported moderate manual, short-term memory, and spatial demands. Visual lexical and temporal demands were also moderate. Auditory and vocal process demands were low. Paired-samples t-tests with Bonferroni corrections were conducted to test the effects of LOA on the mental demands measured by MRQ. With the Bonferroni correction applied, there were no significant differences between conditions. The two LOAs appeared to make qualitatively similar cognitive demands.

3.1.3. ISA

A 2 (LOA: management-by-consent and management-by-exception) × 3 (task type: checking, detection, and response implementation) repeated-measures ANOVA was run to test the effects of the experimental manipulations on subjective workload measured by the ISA. A significant main effect of LOA was revealed. Overall, participants reported higher workload in the management-by-exception condition. Additionally, the interaction of LOA and task type was significant. In the management-by-consent condition, the detection task was rated as imposing the highest workload. However, in the management-by-exception condition, the detection task received the lowest workload rating, and the checking task the highest (Table 4). Effect sizes (Cohen’s d) for the impact of LOA on ISA ratings were 0.45 for checking, 0.64 for response implementation, and 0.11 for detection. The higher LOA produced a medium effect size increase in ISA workload for two of the tasks but had minimal impact on detection. These findings were inconsistent with the expectation that workload would be lower at the higher LOA.

3.1.4. CTPA

Paired-samples t-tests were conducted to determine whether participants trusted the two LOAs differently. Participants reported slightly higher ratings for the management-by-exception condition (M = 56.33, SD = 13.32) than the management-by-consent condition (M = 54.91, SD = 12.74). However, the difference was not significant (t(43) = −0.90), implying similar levels of situational trust in both conditions.

3.2. Physiological Metrics

All physiological metrics were calculated as percentage differences from the five-minute resting baseline value. This method helped account for individual differences when comparing group means.
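This baseline-referenced scoring can be expressed as a minimal sketch (variable names are illustrative, not taken from the study):

```python
import numpy as np

def percent_change_from_baseline(task_values, baseline_value):
    """Express task-period physiological metrics as percentage changes
    from the participant's resting baseline, normalizing out individual
    differences before group means are compared."""
    task_values = np.asarray(task_values, float)
    return 100.0 * (task_values - baseline_value) / baseline_value
```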

3.2.1. EEG

The EEG spectral frequency bands—theta, alpha, and beta—were analyzed separately. For each, we ran a 2 (LOA: management-by-consent and management-by-exception) × 3 (task type: checking, detection, and response implementation) × 2 (hemisphere: left and right) repeated-measures ANOVA. For the theta band, no significant main effect or interaction was revealed by the 2 × 3 × 2 ANOVA. For alpha power, a significant main effect was found for task type (F(1.68,52.19) = 4.96, p < 0.05, ηp2 = 0.14). The detection task showed a smaller increase from baseline than the other two tasks (see Figure 3). Additionally, the interaction of LOA and hemisphere was very close to significance (F(1,31) = 4.11, p = 0.05, ηp2 = 0.12). In the right hemisphere, management-by-exception elicited a greater alpha increase for all three task types; however, such a trend was not observed in the left hemisphere. Effect sizes for the impact of the LOA on right hemisphere alpha power were of small magnitude, ranging from 0.32 (detection) to 0.38 (response implementation). These findings are tentative because the p-value was just short of significance, but they suggest that cortical arousal and alertness in the right hemisphere were lower at the higher LOA.
For EEG beta power, a significant main effect was found for task type (F(1.59,49.36) = 11.54, p < 0.01, ηp2 = 0.27; see Figure 4). The checking task showed a higher beta response than the detection and response implementation tasks, suggesting that checking induced a higher cognitive load, potentially due to the need for sustained focused attention. No other significant main effects or interactions were found.

3.2.2. TCD

Effects of task factors on CBFV were analyzed using a 2 (LOA: management-by-consent and management-by-exception) × 3 (task type: checking, detection, and response implementation) × 2 (hemisphere: left and right) repeated-measures ANOVA. Results revealed no significant main effect for the LOA or task type, and no significant interaction.

3.2.3. fNIRS

Effects of task factors on regional oxygen saturation (rSO2) were analyzed using a 2 (LOA: management-by-consent and management-by-exception) × 3 (task type: checking, detection, and response implementation) × 2 (hemisphere: left and right) repeated-measures ANOVA. Results revealed no significant main effect for the LOA or task type, and no significant interaction.

3.2.4. ECG

Here, 2 (LOA: management-by-consent and management-by-exception) × 3 (task type: checking, detection, and response implementation) repeated-measures ANOVAs were conducted to test the effects of the LOA and task type manipulations on HR, HRV, and IBI. The ECG metrics—HR, IBI, and HRV—were derived from R-peak detections obtained from the raw ECG signal using the So–Chan QRS algorithm. The only significant finding for the ECG metrics was for HRV (Table 5). The main effect of the LOA was significant (F(1,32) = 4.25, p < 0.05, ηp2 = 0.12). Participants had greater HRV changes from the pre-task baseline in the management-by-consent condition (M = 64.03, SE = 19.51) than in the management-by-exception condition (M = 35.44, SE = 11.08), suggesting higher workload in the management-by-exception condition. Effect sizes tended to be small in magnitude, ranging from 0.20 (response implementation) to 0.36 (checking). These data are inconsistent with the hypothesis that workload should decrease at the higher LOA. No other significant main effect or interaction was found for HRV or for HR and IBI.
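The derivation of the ECG metrics from R-peak detections can be sketched as follows. This is an assumption-laden illustration, not the study’s pipeline: R-peak times are taken as given (e.g., from a QRS detector such as So–Chan), and SDNN (the sample SD of inter-beat intervals) is assumed as the HRV index, since the specific time-domain index is not stated here:

```python
import numpy as np

def ecg_metrics_from_rpeaks(r_peak_times_s):
    """Derive mean IBI (ms), HR (beats/min), and an assumed HRV index
    (SDNN, ms) from a sequence of R-peak times given in seconds."""
    ibi_ms = np.diff(np.asarray(r_peak_times_s, float)) * 1000.0
    return {
        "IBI": ibi_ms.mean(),           # mean inter-beat interval (ms)
        "HR": 60000.0 / ibi_ms.mean(),  # heart rate (beats per minute)
        "HRV": ibi_ms.std(ddof=1),      # SDNN (ms) -- assumed index
    }
```

For example, perfectly regular beats one second apart yield an HR of 60 bpm and zero HRV; irregular spacing raises the SDNN value.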

3.3. Secondary Task Accuracy and RT

Within-groups t-tests on LOA for accuracy (% detections) and mean RT (ms) revealed that mean RT was significantly higher in management-by-consent than in management-by-exception (Table 6). Some data were missing from the RT analysis due to an insufficient frequency of accurate responses. The effect size was medium (d = 0.61), indicating substantial response slowing at the lower LOA. There was a non-significant trend toward lower accuracy in the management-by-consent condition. These data support the hypothesis that higher LOAs reduce workload: a lower primary task workload allows participants to allocate more attention to responding quickly to secondary task stimuli.

3.4. Individual Differences in Trust

The HIT score was significantly positively associated with the situational trust in automation measured by CTPA in both LOA conditions. Neither of the PAS subscales predicted trust in automation (Table 7).

3.5. Cross-Study Comparison

To further understand the impact of automation on workload responses, an additional analysis was performed to compare the current data to workload data from Hughes et al. (2023; Exp 2: N = 69) [31]. The latter study used a similar experimental scenario within a simulation of conventional operations unsupported by automation, performed by a three-person crew. Table 8 summarizes characteristics of the two experiments.
A series of independent-samples t-tests was run to test for differences in subjective workload between the two studies. They were run as planned comparisons on the basis that lower workload was anticipated in the automated scenario. Data from the management-by-consent condition of the present study were utilized, because the previous analyses suggested that this condition was more effective than management-by-exception in reducing subjective operator workload. Whenever the assumption of homogeneity of variances, assessed by Levene’s test for equality of variances, was violated, a Welch t-test was adopted. This was necessary for MRQ scales. Table 9 summarizes the results. In the current study, workload was lower on multiple metrics, including the majority of the NASA-TLX subscales, the ISA, and several MRQ ratings. Hedges’ g for the inter-sample difference in global NASA-TLX workload was 0.54, indicating a medium effect size. Effect sizes for NASA-TLX ratings were highest for frustration (g = 0.60) and temporal demand (g = 0.63). Effect sizes for the three ISA scales were large for checking (g = 1.61) and response implementation (g = 1.70) and medium for detection (g = 0.49). On the MRQ, the current study elicited lower levels of auditory processing, short-term memory, visual phonetic processing, and vocal processing demand. The larger effect sizes for the MRQ likely reflect the requirement for teaming and verbal communication in the Hughes et al. (2023) [31] study, such as the greatly elevated auditory linguistic demand in that study (g = 2.84). It is of interest that there was also a medium–large difference in short-term memory demand (g = 0.65) between the two studies. However, other sources of demand, such as the various spatial demands, were not significantly different between studies.

4. Discussion

The present study aimed to investigate workload and trust during execution of CBPs supported by automation, one of the elements of NPP digitalization. The task type and the LOA of the automation were manipulated experimentally. We anticipated that the higher of the two LOAs, management-by-exception, would produce lower workload. This hypothesis was confirmed for the secondary task measure of workload, on which mean RT was faster for management-by-exception. However, management-by-consent tended to produce lower workload according to subjective and HRV metrics, contrary to expectations. Several of these inter-LOA differences in workload were of medium magnitude according to the effect size calculation, i.e., large enough to have a substantial impact if reproduced in the operational setting. LOA also had no significant effect on trust, although we identified individual differences in trust associated with prior disposition to trust automation. An additional analysis compared workload data from the current study with those from one of our previous HPTF studies [31] based on a simulation of a conventional NPP control room. In the remainder of this discussion, we consider further impacts of task factors on workload during CBP execution, the need to further investigate trust in digitalized plants, and differences in operator response to conventional and automation-supported plants.

4.1. Task Factors and Workload Response During CBP Execution

Previous research on replacement of PBPs with CBPs suggested that CBPs benefit performance and, in general, mitigate workload [8,65]. However, concerns have also been raised about potential tradeoffs between different elements of performance, the demands of automation management, and possible loss of situation awareness and out-of-the-loop issues at higher LOAs [8,19,35,40]. Thus, if automation is to be introduced, it is important to determine the optimal LOA.
Results of this study did not consistently support our hypothesis that the higher LOA—management-by-exception—would reduce workload. Different metrics suggested different LOA impacts. The faster secondary task response at the higher LOA did suggest that it frees up attentional resources for monitoring the additional gauges. However, compared to management-by-consent, management-by-exception actually produced higher workload on the ISA and the temporal demand scale of the NASA-TLX. The psychophysiological data were less clear-cut, but the HRV index from the ECG also indicated higher workload with management-by-exception. There was also a near-significant interactive effect of the LOA and hemisphere: alpha power was higher for management-by-exception than for management-by-consent in the right hemisphere only. Alpha suppression in management-by-consent suggests higher workload in the lower LOA condition, seemingly in contradiction to findings with the other indices. However, the effect on alpha was relatively weak, and in these data, HRV appeared to provide a more sensitive workload index. The hemodynamic fNIRS and TCD measures were insensitive to LOA.
One explanation for these findings is that management-by-exception induces a passive, reactive performance strategy, consistent with previous LOA research [19]. In this case, the focus of attention may be broader, enhancing secondary task response speed at the cost of reduced engagement with the primary, CBP operational task. Management-by-exception also introduces the need to monitor the time window allowed for overriding the automation, which may have contributed to subjective temporal demands and subjective workload, as well as influencing HRV. The right hemisphere alpha response may suggest there is a vigilance element to monitoring the time window, given that vigilance is localized primarily in the right hemisphere [61,66] and alpha power tends to be elevated during vigilance [67]. This analysis implies that there may be hidden costs to the higher LOA, such as reduced situation awareness during unexpected events [19]. We did not test this possibility in the current study, but it could be investigated further.
We also tested for effects of task type on workload metrics and their interaction with the LOA. In previous studies [29,31], the detection task appeared to be the most demanding task according to most (though not all) metrics. This effect has been attributed to the demands of maintaining vigilance during detection [23]. However, in this study, the workload of detection appeared to be lower or equal to that of the other two tasks, contrary to previous findings. On the ISA, the task type × LOA interaction was significant. There was a trend toward detection imposing the highest workload for management-by-consent, but workload was lowest for detection and highest for checking for management-by-exception. For the physiological measures, significant effects of task type were found for alpha and beta. In both frequency bands, power was highest for checking and lowest for detection. Automation, especially at the higher LOA, may be helpful in reducing the workload burden of detection tasks, consistent with the idea that it encourages a passive monitoring strategy. As operators tend to be more prone to errors during extended detection tasks in conventional NPPs, the computerized procedure system and automation functionality may mitigate such vulnerability by alleviating workload.

4.2. The Role of Trust

Operators’ trust in automation and its performance and safety impacts is a major and well-researched issue in human factors [5,16]. The importance of operator trust calibration in digitalized NPPs is recognized [4,7], but empirical research on trust impacts during operations is lacking. The present findings showed that mean levels of trust were similar at both LOAs. Scores on the CTPA state measure [48] can range from 12 to 84. Here, mean CTPA scores were around 55, a little above the midpoint of the scale, which seems low given that the automation was generally reliable. The relative lack of trust may reflect the novice participants’ unfamiliarity with the system. We also observed systematic individual differences in trust in the automation. The current data pointed to the risk that some operators are over-trusting, whereas others are under-trusting, leading to the performance failings associated with over- and under-reliance, respectively [5,16]. The HIT trait measure [46] appeared to be more effective than the PAS [47] in predicting state trust. The current study did not investigate trust and its relationship with performance in depth, but future research should further examine variations in trust and their performance consequences.

4.3. Comparison with Conventional NPP Operation

The configuration of the operator interface was substantially changed in this experiment compared with previous HPTF studies [29] in order to investigate workload during an EOP controlled by a CBP, rather than by the senior reactor operator (SRO) of a three-person crew. Workload data from the management-by-consent (lower LOA) condition of the current study were compared with data from HPTF Experiment 2 [31], in which three-person novice crews executed a simulated EOP using a touchscreen interface. Across the two studies, mean levels of workload tended to be lower in the current study, consistent with the expectation that digitalization of interfaces can be used to mitigate workload. However, inter-study differences were more pronounced on the ISA than on the NASA-TLX, suggesting that the workload benefits of the CBP were modest.
The cross-study comparison of MRQ data provided insights into the qualitative differences between conventional and CBP plant operations. The largest-magnitude differences were found for scales for auditory and vocal processes. Lower demands in the current study likely reflected the shift from three- to one-person operation, such that participants worked with the CBP individually and did not need to rely on three-way communication to complete the procedure.
The lower short-term memory demands in the current study suggested a workload reduction more directly related to assistance from the CBP. Reduced demands on short-term memory are consistent with the idea that CBPs facilitate the correct sequencing of steps in a procedure [20], with evidence for the use of CBPs in mitigating workload [40], and with the performance benefits of automating aspects of CBPs [12]. However, MRQ data also showed that various spatial demands were quite high during operations, and these demands were only modestly reduced in the current study. Only one of the six spatial demands (spatial-categorical) was significantly lower than in the comparison study. Thus, while CBPs supported by automation may be generally effective for workload mitigation, their benefits may be somewhat selective.

4.4. Limitations

One limitation was the use of a novice sample. It is challenging to establish that results obtained with novice populations will generalize to trained operators in the real setting. In the case of workload response, previous HPTF research established that novice and trained operator responses to task factors were qualitatively similar, supporting the validity of the methodology [28,51]. We can be cautiously optimistic that workload results from the current study will similarly generalize, although direct comparison of findings with those from an operator sample would be desirable. It is uncertain whether findings on trust in the automation will generalize to trained operators, given that trust is influenced by experience and expertise [68].
Expertise and training issues require further research. Plant modernization will require an increased focus on advanced training tools [69], given that both current operators and those entering the workforce will need to acquire new skills to work with digitalized systems. These include general competencies in working with software (e.g., navigating menu structures) and in upgrading skills as new technologies are introduced. Novel plant designs also require acquisition of adequate mental models of the automated control loops that support operation, the capacity to manage unexpected events, trust calibration and trust repair skills, and effective communication with remotely located team members. The research literature on training in process control industries provides various strategies relevant to NPP operators, such as using high-fidelity simulations and virtual reality (VR) to promote experiential learning, attention shift training, and situation awareness training [70,71]. Findings from novice samples may identify issues to be addressed in training, such as the apparent under-trust evident in the current sample.
Another limitation is that the study obtained data from only a single EOP. CBPs can be utilized in other contexts, such as supporting operator situation awareness during routine operations [10]. It remains to be determined whether the results would generalize to other operational scenarios. For example, somewhat different impacts of CBP type have been reported for startup and loss-of-feedwater scenarios [12]. It would also be desirable to test for generalization of findings to CBPs utilized within novel reactor designs, such as SMRs.
A third limitation is that the study focused on workload and trust responses. We obtained RT data, but we did not attempt to investigate errors in use of the automation, given that reliability was high and the CBP committed few errors during the procedure. However, it is important to determine relationships between workload and trust responses to the automation and errors, including failures to detect rare and unprecedented automation failures [72]. Evidence on this issue is needed to support HRA for automated systems and to investigate factors influencing operator trust calibration and optimization.

4.5. Conclusions

The principal objective of this study was to evaluate the impact of two different LOAs on participant workload during execution of an EOP supported by a CBP, using multiple workload metrics. The study utilized a simulation of modernized NPP operation that built on prior studies at the NRC HPTF. Study findings broadly endorsed the utility of CBPs supported by automation in support of plant modernization, consistent with previous studies [8,12,65]. Novice participants given limited training were found to be capable of executing a CBP supported by automation quite successfully. Workload levels were generally quite low, implying that participants were not taxed excessively by task demands. We hypothesized that the higher of the two LOAs would mitigate task workload. The hypothesis was confirmed only for a performance-based workload index, speed of response, for which the LOA effect was of medium magnitude (Cohen’s d = 0.61). By contrast, the LOA effects on aspects of subjective workload and HRV unexpectedly showed that workload was higher at the higher LOA. Again, d values in the 0.3–0.6 range demonstrated substantial effects. Thus, workload impacts of the LOA differed across workload metrics. We could not rigorously compare workload levels with those of our previous HPTF studies that simulated conventional operations, but use of the CBP appeared to reduce some aspects of cognitive demand, including short-term memory. The lower of the LOAs utilized, management-by-consent, appeared to be the more effective for workload mitigation on certain metrics. Higher LOAs may lead to out-of-the-loop issues for operators [19], even if they benefit some aspects of performance. Thus, there may be advantages to implementing management-by-consent over higher levels.
However, more work in the operational context is needed to determine how the LOA influences workload, situation awareness, and performance for the various automated systems that support plant modernization. Future work should also investigate further the role of operator trust in the system and methods for trust optimization.

Author Contributions

All authors listed have made a substantial, direct and intellectual contribution to the work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the U.S. Nuclear Regulatory Commission.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of the University of Central Florida on 13 September 2021 (#00003369).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets presented in this article are not readily available because access requires permission from the U.S. Nuclear Regulatory Commission. Requests to access the datasets should be directed initially to Dr. Gerald Matthews.

Acknowledgments

This report was prepared as an account of work sponsored in part by an agency of the U.S. Government. Neither the U.S. Government nor any agency thereof, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for any third party’s use, or the results of such use, of any information, apparatus, product, or process disclosed in this report, or represents that its use by such third party would not infringe privately owned rights. The views expressed in this paper are not necessarily those of the U.S. Nuclear Regulatory Commission.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Items of the CTPA

  • The system is deceptive
  • The system behaves in an underhanded manner
  • I am suspicious of the system’s intent, action, or outputs
  • I am wary of the system
  • The system’s actions will have a harmful or injurious outcome
  • I am confident in the system
  • The system provides security
  • The system has integrity
  • The system is dependable
  • The system is reliable
  • I can trust the system
  • I am familiar with the system

References

  1. Hall, A.; Joe, J.C. Utility and industry perceptions of control room modernization over the last 10 years. Nucl. Technol. 2024, 210, 2290–2298. [Google Scholar] [CrossRef]
  2. Joe, J.C.; Kovesdi, C.R. Developing a Strategy for Full Nuclear Plant Modernization; (No. INL/EXT-18-51366-Rev000); Idaho National Lab. (INL): Idaho Falls, ID, USA, 2018. [Google Scholar]
  3. Boring, R.L.; Ulrich, T.A.; Lew, R. Levels of digitization, digitalization, and automation for advanced reactors. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 2023, 67, 207–213. [Google Scholar] [CrossRef]
  4. Kim, Y.; Park, J. Envisioning human-automation interactions for responding emergency situations of NPPs: A viewpoint from human computer interaction. In Proceedings of the Transactions of the Korean Nuclear Society Autumn Meeting, Yeosu, Republic of Korea, 25–26 October 2018. [Google Scholar]
  5. Lee, J.D.; See, K.A. Trust in automation: Designing for appropriate reliance. Hum. Factors 2004, 46, 50–80. [Google Scholar] [CrossRef]
  6. Parasuraman, R.; Manzey, D.H. Complacency and bias in human use of automation: An attentional integration. Hum. Factors 2010, 52, 381–410. [Google Scholar] [CrossRef]
  7. Porthin, M.; Liinasuo, M.; Kling, T. Effects of digitalization of nuclear power plant control rooms on human reliability analysis—A review. Reliab. Eng. Syst. Saf. 2020, 194, 106415. [Google Scholar] [CrossRef]
  8. O’Hara, J.M.; Higgins, J.C.; Stubler, W.F.; Kramer, J. Computer-Based Procedure Systems: Technical Basis and Human Factors Review Guidance; Division of Systems Analysis and Regulatory Effectiveness, Office of Nuclear Regulatory Research, United States Nuclear Regulatory Commission: Washington, DC, USA, 2000. [Google Scholar]
  9. Gao, Q.; Yu, W.; Jiang, X.; Song, F.; Pan, J.; Li, Z. An integrated computer-based procedure for teamwork in digital nuclear power plants. Ergonomics 2015, 58, 1303–1313. [Google Scholar] [CrossRef]
  10. Le Blanc, K.L.; Oxstrand, J.H. Computer–Based Procedures for nuclear power plant field workers: Preliminary results from two evaluation studies. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 2013, 57, 1722–1726. [Google Scholar] [CrossRef]
  11. Oxstrand, J.; Le Blanc, K.L.; Bly, A. The Next Step in Deployment of Computer Based Procedures for Field Workers: Insights and Results from Field Evaluations at Nuclear Power Plants; (No. INL/CON-14-32990); Idaho National Lab (INL): Idaho Falls, ID, USA, 2015. [Google Scholar]
  12. Hall, A.; Boring, R.L.; Ulrich, T.A.; Lew, R.; Velazquez, M.; Xing, J.; Whiting, T.; Makrakis, G.M. A comparison of three types of computer-based procedures: An experiment using the Rancor Microworld Simulator. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 2023, 67, 2552–2557. [Google Scholar] [CrossRef]
  13. Balfe, N.; Sharples, S.; Wilson, J.R. Impact of automation: Measurement of performance, workload and behaviour in a complex control environment. Appl. Ergon. 2015, 47, 52–64. [Google Scholar] [CrossRef]
  14. Endsley, M.R. Automation and situation awareness. In Automation and Human Performance: Theory and Applications; Parasuraman, R., Mouloua, M., Eds.; Lawrence Erlbaum Associates: Hillsdale, NJ, USA, 1996; pp. 163–181. [Google Scholar]
  15. Lin, C.J.; Yenn, T.C.; Jou, Y.T.; Hsieh, T.L.; Yang, C.W. Analyzing the staffing and workload in the main control room of the advanced nuclear power plant from the human information processing perspective. Saf. Sci. 2013, 57, 161–168. [Google Scholar] [CrossRef]
  16. Parasuraman, R.; Riley, V. Humans and automation: Use, misuse, disuse, abuse. Hum. Factors 1997, 39, 230–253. [Google Scholar] [CrossRef]
  17. Kaber, D.B. Issues in human–automation interaction modeling: Presumptive aspects of frameworks of types and levels of automation. J. Cogn. Eng. Decis. Mak. 2018, 12, 7–24. [Google Scholar] [CrossRef]
  18. Parasuraman, R.; Sheridan, T.B.; Wickens, C.D. A model for types and levels of human interaction with automation. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2000, 30, 286–297. [Google Scholar] [CrossRef] [PubMed]
  19. Kaber, D.B.; Onal, E.; Endsley, M.R. Design of automation for telerobots and the effect on performance, operator situation awareness, and subjective workload. Hum. Factors Ergon. Manuf. Serv. Ind. 2000, 10, 409–430. [Google Scholar] [CrossRef]
  20. O’Hara, J.M.; Fleger, S.A. Human-System Interface Design Review Guidelines; (NUREG-0700, Rev.3); United States Nuclear Regulatory Commission: Washington, DC, USA, 2020. [Google Scholar]
  21. Joe, J.C.; Boring, R.L.; Persensky, J.J. Commercial utility perspectives on nuclear power plant control room modernization. In Proceedings of the 8th International Topical Meeting on Nuclear Power Plant Instrumentation, Control, and Human-Machine Interface Technologies (NPIC&HMIT), San Diego, CA, USA, 22–26 July 2012; pp. 2039–2046. [Google Scholar]
  22. Anokhin, A.; Lourie, V.; Dzhumaev, S.; Golovanev, V.; Kompanietz, N. Upgrade of the Kursk NPP main control room (case study). Int. Control Room Des. Conf. Proc. 2010, 2010, 207–214. [Google Scholar]
  23. Reinerman-Jones, L.; Matthews, G.; Mercado, J.E. Detection tasks in nuclear power plant operation: Vigilance decrement and physiological workload monitoring. Saf. Sci. 2016, 88, 97–107. [Google Scholar] [CrossRef]
  24. Wu, X.; Li, Z. Secondary task method for workload measurement in alarm monitoring and identification tasks. In Cross-Cultural Design. Methods, Practice, and Case Studies: 5th International Conference, CCD 2013; Springer Berlin Heidelberg: Berlin, Germany, 2013; pp. 346–354. [Google Scholar]
  25. Matthews, G.; Reinerman-Jones, L. Workload Assessment: How to Diagnose Workload Issues and Enhance Performance; Human Factors and Ergonomics Society: Santa Monica, CA, USA, 2017. [Google Scholar]
  26. O’Hara, J.M.; Higgins, J.C.; Fleger, S.A.; Pieringer, P.A. Human Factors Engineering Program Review Model; (NUREG-0711, Rev.3); United States Nuclear Regulatory Commission: Washington, DC, USA, 2012. [Google Scholar]
  27. Swain, A.D.; Guttmann, H.E. Handbook of Human-Reliability Analysis with Emphasis on Nuclear Power Plant Applications; Final report (No. NUREG/CR--1278); Sandia National Labs: Albuquerque, NM, USA, 1983. [Google Scholar]
  28. Lin, J.; Matthews, G.; Hughes, N.; Dickerson, K. Novices as models of expert operators: Evidence from the NRC Human Performance Test Facility. Hum. Factors Simul. 2022, 30, 1–10. [Google Scholar]
  29. Hughes, N.; D’Agostino, A.; Dickerson, K.; Matthews, M.; Reinerman-Jones, L.; Barber, D.; Mercado, J.; Harris, J.; Lin, J. Human Performance Test Facility (HPTF). Volume 1—Systematic Human Performance Data Collection Using Nuclear Power Plant Simulator: A Methodology; Research Information Letter RIL 2022-11; United States Nuclear Regulatory Commission: Washington, DC, USA, 2023. [Google Scholar]
  30. Hughes, N.; Lin, J.; Matthews, G.; Barber, D.; Dickerson, K. Human Performance Test Facility (HPTF). Volume 3—Supplemental Exploratory Analyses of Sensitivity of Workload Measures; Research Information Letter RIL 2022-11; United States Nuclear Regulatory Commission: Washington, DC, USA, 2023. [Google Scholar]
  31. Hughes, N.; Reinerman-Jones, L.; Lin, J.; Matthews, G.; Barber, D.; Dickerson, K. Human Performance Test Facility (HPTF). Volume 2—Comparing Operator Workload and Performance Between Digitized and Analog Simulated Environments; Research Information Letter RIL 2022-11; United States Nuclear Regulatory Commission: Washington, DC, USA, 2023. [Google Scholar]
  32. Warm, J.S.; Parasuraman, R.; Matthews, G. Vigilance requires hard mental work and is stressful. Hum. Factors 2008, 50, 433–441. [Google Scholar] [CrossRef]
  33. Tran, T.Q.; Boring, R.L.; Joe, J.C.; Griffith, C.D. Extracting and converting quantitative data into human error probabilities. In Proceedings of the 2007 IEEE 8th Human Factors and Power Plants and HPRCT 13th Annual Meeting, Monterey, CA, USA, 26–31 August 2007; pp. 164–169. [Google Scholar]
  34. Hwang, S.L.; Yau, Y.J.; Lin, Y.T.; Chen, J.H.; Huang, T.H.; Yenn, T.C.; Hsu, C.C. Predicting work performance in nuclear power plants. Saf. Sci. 2008, 46, 1115–1124. [Google Scholar] [CrossRef]
  35. Lin, C.J.; Yenn, T.C.; Yang, C.W. Automation design in advanced control rooms of the modernized nuclear power plants. Saf. Sci. 2010, 48, 63–71. [Google Scholar] [CrossRef]
  36. Skjerve, A.B.M.; Skraaning, G., Jr. The quality of human-automation cooperation in human-system interface for nuclear power plants. Int. J. Hum. Comput. Stud. 2004, 61, 649–677. [Google Scholar] [CrossRef]
  37. Huang, F.H.; Hwang, S.L.; Yenn, T.C.; Yu, Y.C.; Hsu, C.C.; Huang, H.W. Evaluation and comparison of alarm reset modes in advanced control room of nuclear power plants. Saf. Sci. 2006, 44, 935–946. [Google Scholar] [CrossRef]
  38. Huang, F.H.; Lee, Y.L.; Hwang, S.L.; Yenn, T.C.; Yu, Y.C.; Hsu, C.C.; Huang, H.W. Experimental evaluation of human–system interaction on alarm design. Nucl. Eng. Des. 2007, 237, 308–315. [Google Scholar] [CrossRef]
  39. Jou, Y.T.; Yenn, T.C.; Lin, C.J.; Yang, C.W.; Chiang, C.C. Evaluation of operators’ mental workload of human–system interface automation in the advanced nuclear power plants. Nucl. Eng. Des. 2009, 239, 2537–2542. [Google Scholar] [CrossRef]
  40. Qing, T.; Liu, Z.; Tang, Y.; Hu, H.; Zhang, L.; Chen, S. Effects of automation for emergency operating procedures on human performance in a nuclear power plant. Health Phys. 2021, 121, 261. [Google Scholar] [CrossRef]
  41. Janssen, C.P.; Donker, S.F.; Brumby, D.P.; Kun, A.L. History and future of human-automation interaction. Int. J. Hum. Comput. Stud. 2019, 131, 99–107. [Google Scholar] [CrossRef]
  42. Bye, A. Future needs of human reliability analysis: The interaction between new technology, crew roles and performance. Saf. Sci. 2023, 158, 105962. [Google Scholar] [CrossRef]
  43. Tasset, D.; Charron, S.; Miberg, A.B.; Hollnagel, E. The Impact of Automation on Operator Performance. An Explorative Study; (No. HRP-352/V.1); Institutt for Energiteknikk, OECD Halden Reactor Project: Halden, Norway, 1999. [Google Scholar]
  44. Kohn, S.C.; de Visser, E.J.; Wiese, E.; Lee, Y.C.; Shaw, T.H. Measurement of trust in automation: A narrative review and reference guide. Front. Psychol. 2021, 12, 604977. [Google Scholar] [CrossRef]
  45. Matthews, G.; Panganiban, A.R.; Lin, J.; Long, M.; Schwing, M. Super-machines or sub-humans: What are the unique features of trust in Intelligent Autonomous Systems? In Trust in Human-Robot Interaction: Research and Applications; Nam, C.S., Lyons, J., Eds.; Elsevier: Amsterdam, The Netherlands, 2021; pp. 59–82. [Google Scholar]
  46. Lyons, J.B.; Guznov, S.Y. Individual differences in human–machine trust: A multi-study look at the perfect automation schema. Theor. Issues Ergon. Sci. 2019, 20, 440–458. [Google Scholar] [CrossRef]
  47. Merritt, S.M.; Unnerstall, J.L.; Lee, D.; Huber, K. Measuring individual differences in the perfect automation schema. Hum. Factors 2015, 57, 740–753. [Google Scholar] [CrossRef]
  48. Jian, J.Y.; Bisantz, A.M.; Drury, C.G. Foundations for an empirically determined scale of trust in automated systems. Int. J. Cogn. Ergon. 2000, 4, 53–71. [Google Scholar] [CrossRef]
  49. Sauer, J.; Chavaillaz, A.; Wastell, D. Experience of automation failures in training: Effects on trust, automation bias, complacency and performance. Ergonomics 2016, 59, 767–780. [Google Scholar] [CrossRef] [PubMed]
  50. GSE Systems, Inc. GSE’s Generic Nuclear Plant Simulator; GSE Systems, Inc.: Columbia, MD, USA, 2016; Available online: https://www.gses.com/wp-content/uploads/GSE-GPWR-brochure.pdf (accessed on 24 December 2024).
  51. Hughes, N.; D’Agostino, A.; Reinerman, L. The NRC’s Human Performance Test Facility: Methodological considerations for developing a research program for systematic data collection using an NPP simulator. In Proceedings of the Enlarged Halden Programme Group (EHPG) Meeting, Lillehammer, Norway, 18–24 September 2017. [Google Scholar]
  52. Reinerman-Jones, L.E.; Guznov, S.; Mercado, J.; D’Agostino, A. Developing methodology for experimentation using a nuclear power plant simulator. In Foundations of Augmented Cognition; Schmorrow, D.D., Fidopiastis, C.M., Eds.; Springer: Heidelberg, Germany, 2013; pp. 181–188. [Google Scholar]
  53. Hart, S.G.; Staveland, L.E. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in Psychology; Hancock, P.A., Meshkati, N., Eds.; North-Holland: Amsterdam, The Netherlands, 1988; Volume 52, pp. 139–183. [Google Scholar]
  54. Boles, D.B.; Adair, L.P. The Multiple Resources Questionnaire (MRQ). Proc. Hum. Factors Ergon. Soc. Annu. Meet. 2001, 45, 1790–1794. [Google Scholar] [CrossRef]
  55. Tattersall, A.J.; Foord, P.S. An experimental evaluation of instantaneous self-assessment as a measure of workload. Ergonomics 1996, 39, 740–748. [Google Scholar] [CrossRef] [PubMed]
  56. Jasper, H.H. The 10/20 international electrode system. Electroencephalogr. Clin. Neurophysiol. 1958, 10, 370–375. [Google Scholar]
  57. Borghini, G.; Astolfi, L.; Vecchiato, G.; Mattia, D.; Babiloni, F. Measuring neurophysiological signals in aircraft pilots and car drivers for the assessment of mental workload, fatigue and drowsiness. Neurosci. Biobehav. Rev. 2014, 44, 58–75. [Google Scholar] [CrossRef]
  58. So, H.H.; Chan, K.L. Development of QRS detection method for real-time ambulatory cardiac monitor. In Proceedings of the 19th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. ‘Magnificent Milestones and Emerging Opportunities in Medical Engineering’, Chicago, IL, USA, 30 October–2 November 1997; Volume 1, pp. 289–292. [Google Scholar]
  59. Taylor, G.; Reinerman-Jones, L.E.; Cosenzo, K.; Nicholson, D. Comparison of multiple physiological sensors to classify operator state in adaptive automation systems. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 2010, 54, 195–199. [Google Scholar] [CrossRef]
  60. Fairclough, S.H.; Mulder, L.J.M. Psychophysiological processes of mental effort investment. In How Motivation Affects Cardiovascular Response: Mechanisms and Applications; Wright, R.A., Gendolla, G.H.E., Eds.; American Psychological Association: Washington, DC, USA, 2011; pp. 61–76. [Google Scholar]
  61. Warm, J.S.; Tripp, L.D.; Matthews, G.; Helton, W.S. Cerebral hemodynamic indices of operator fatigue in vigilance. In Handbook of Operator Fatigue; Matthews, G., Desmond, P.A., Neubauer, C., Hancock, P.A., Eds.; Ashgate Press: Aldershot, UK, 2012; pp. 197–207. [Google Scholar]
  62. Ayaz, H.; Shewokis, P.A.; Curtin, A.; Izzetoglu, M.; Izzetoglu, K.; Onaral, B. Using MazeSuite and functional near infrared spectroscopy to study learning in spatial navigation. J. Vis. Exp. 2011, 56, 3443. [Google Scholar] [CrossRef]
  63. Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Erlbaum: Hillsdale, NJ, USA, 1988. [Google Scholar]
  64. Lakens, D. Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Front. Psychol. 2013, 4, 863. [Google Scholar] [CrossRef]
  65. Kim, Y.; Jung, W.; Kim, S. Empirical investigation of workloads of operators in advanced control rooms. J. Nucl. Sci. Technol. 2014, 51, 744–751. [Google Scholar] [CrossRef]
  66. Langner, R.; Eickhoff, S.B. Sustaining attention to simple tasks: A meta-analytic review of the neural mechanisms of vigilant attention. Psychol. Bull. 2013, 139, 870. [Google Scholar] [CrossRef] [PubMed]
  67. Kamzanova, A.T.; Kustubayeva, A.M.; Matthews, G. Use of EEG workload indices for diagnostic monitoring of vigilance decrement. Hum. Factors 2014, 56, 1136–1149. [Google Scholar] [CrossRef] [PubMed]
  68. Hoff, K.A.; Bashir, M. Trust in automation: Integrating empirical evidence on factors that influence trust. Hum. Factors 2015, 57, 407–434. [Google Scholar] [CrossRef]
  69. Joe, J.C.; Remer, S.J. Developing a Roadmap for Total Nuclear Plant Transformation; (No. INL/EXT-19-54766-Rev000); Idaho National Laboratory: Idaho Falls, ID, USA, 2019. [Google Scholar]
  70. Burkolter, D.; Kluge, A.; Sauer, J.; Ritzmann, S. Comparative study of three training methods for enhancing process control performance: Emphasis shift training, situation awareness training, and drill and practice. Comput. Hum. Behav. 2010, 26, 976–986. [Google Scholar] [CrossRef]
  71. Kluge, A.; Nazir, S.; Manca, D. Advanced applications in process control and training needs of field and control room operators. IIE Trans. Occup. Ergon. Hum. Factors 2014, 2, 121–136. [Google Scholar] [CrossRef]
  72. Hancock, P.A. Reacting and responding to rare, uncertain and unprecedented events. Ergonomics 2023, 66, 454–478. [Google Scholar] [CrossRef]
Figure 1. HPTF NPP simulator. Physiological recording equipment is visible in the foreground.
Figure 2. Computerized procedure tool, illustrating automation of a checking step.
Figure 3. EEG alpha percentage change from baseline by task type and hemisphere (error bars denote standard errors).
Figure 4. EEG beta percentage change from baseline by task type and hemisphere (error bars denote standard errors).
Table 1. Levels of automation for NPP applications [20].

| Level | Automation Tasks | Human Tasks |
|---|---|---|
| (1) Manual Operation | No automation | Operators manually perform all tasks |
| (2) Shared Operation | Automatic performance of some tasks | Operators perform some tasks manually |
| (3) Operation by Consent | Automatic performance when directed by operators to do so, under close monitoring and supervision | Operators monitor closely, approve actions, and may intervene to provide supervisory commands that automation follows |
| (4) Operation by Exception | Essentially autonomous operation unless specific situations or circumstances are encountered | Operators must approve critical decisions and may intervene |
| (5) Autonomous Operation | Fully autonomous operation; system cannot normally be disabled but may be started manually | Operators monitor performance and provide backup if necessary, feasible, and permitted |
Table 2. Effects of LOA on NASA-TLX ratings.

| Scale | Management-by-Consent M | SD | Management-by-Exception M | SD | t | df |
|---|---|---|---|---|---|---|
| Global Workload | 22.21 | 14.95 | 25.43 | 16.00 | −1.66 | 34 |
| Mental Demand | 37.00 | 28.88 | 38.63 | 32.32 | −0.39 | 34 |
| Physical Demand | 17.29 | 14.52 | 21.89 | 21.01 | −1.33 | 34 |
| Temporal Demand | 19.14 | 21.13 | 28.06 | 25.24 | −3.04 * | 34 |
| Effort | 20.71 | 18.03 | 25.77 | 24.56 | −1.75 | 34 |
| Frustration | 18.49 | 27.23 | 17.97 | 20.39 | 0.13 | 34 |
| Performance | 20.66 | 28.97 | 20.26 | 27.16 | 0.07 | 34 |

* p < 0.05.
Table 3. Effects of LOA on MRQ ratings.

| Scale | Management-by-Consent M | SD | Management-by-Exception M | SD | t | df |
|---|---|---|---|---|---|---|
| Auditory Emotional | 9.30 | 22.41 | 6.91 | 16.89 | 0.71 | 43 |
| Auditory Linguistic | 10.16 | 21.85 | 13.70 | 36.19 | −0.82 | 43 |
| Manual Process | 54.98 | 36.19 | 55.02 | 37.29 | −0.01 | 43 |
| Short-Term Memory | 51.48 | 35.66 | 57.23 | 36.44 | −1.75 | 43 |
| Spatial Attentive | 65.45 | 38.68 | 61.80 | 36.95 | 1.00 | 43 |
| Spatial Concentrative | 50.80 | 35.67 | 50.05 | 36.36 | 0.20 | 43 |
| Spatial Categorical | 44.80 | 35.53 | 40.91 | 36.50 | 0.96 | 43 |
| Spatial Emergent | 57.14 | 37.63 | 56.80 | 39.55 | 0.08 | 43 |
| Spatial Positional | 56.25 | 36.82 | 60.34 | 37.25 | −0.90 | 43 |
| Spatial Quantitative | 49.50 | 38.12 | 48.39 | 39.37 | 0.25 | 43 |
| Visual Lexical | 59.55 | 37.78 | 58.93 | 37.92 | 0.11 | 43 |
| Visual Phonetic | 27.82 | 34.06 | 37.25 | 39.01 | −2.29 | 43 |
| Visual Temporal | 53.55 | 39.66 | 53.45 | 38.31 | 0.02 | 43 |
| Vocal Process | 15.18 | 27.86 | 12.61 | 24.34 | 1.13 | 43 |
Table 4. Two-way ANOVA statistics for ISA ratings.

| Task type | Management-by-Consent M | SD | n | Management-by-Exception M | SD | n | Effect | F | df | ηp² |
|---|---|---|---|---|---|---|---|---|---|---|
| Checking | 1.41 | 0.64 | 37 | 1.76 | 0.90 | 37 | LOA | 5.81 * | 1, 36 | 0.14 |
| Response | 1.24 | 0.44 | 37 | 1.62 | 0.72 | 37 | Task type | 1.59 | 2, 72 | 0.04 |
| Detection | 1.59 | 0.69 | 37 | 1.51 | 0.80 | 37 | L × T | 1.23 ** | 2, 72 | 0.18 |

* p < 0.05 and ** p < 0.01. L × T: LOA × task type.
Table 5. Two-way ANOVA statistics for HRV.

| Task type | Management-by-Consent M | SD | Management-by-Exception M | SD | Effect | F | df | ηp² |
|---|---|---|---|---|---|---|---|---|
| Checking | 76.64 | 143.09 | 36.23 | 71.31 | LOA | 4.28 * | 1, 32 | 0.12 |
| Response Implementation | 49.75 | 116.10 | 29.93 | 77.49 | Task type | 1.16 | 2, 64 | 0.04 |
| Detection | 65.70 | 129.10 | 40.17 | 62.94 | LOA × TT | 0.52 | 2, 64 | 0.02 |

* p < 0.05. LOA × TT: LOA × task type.
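Partial eta squared is a deterministic function of the F ratio and its degrees of freedom, ηp² = F·df1 / (F·df1 + df2), so the effect sizes reported in Tables 4 and 5 can be cross-checked directly from the F values. A minimal sketch (Python is used purely for illustration and is not part of the study materials):

```python
def partial_eta_squared(f, df_effect, df_error):
    """Recover partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return (f * df_effect) / (f * df_effect + df_error)

# LOA effect on HRV (Table 5): F(1, 32) = 4.28
print(round(partial_eta_squared(4.28, 1, 32), 2))  # 0.12, matching the table
```

The same check reproduces the LOA effect for ISA in Table 4 (F(1, 36) = 5.81 gives 0.14) and the interaction term in Table 5 (F(2, 64) = 0.52 gives 0.02).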
Table 6. Effects of LOA on secondary task performance.

| Measure | Management-by-Consent M | SD | Management-by-Exception M | SD | t | df |
|---|---|---|---|---|---|---|
| Accuracy | 58.86 | 32.91 | 65.14 | 33.81 | −1.21 | 34 |
| Reaction Time | 12.49 | 4.08 | 10.06 | 3.80 | 2.38 * | 29 |

* p < 0.05.
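For paired comparisons such as these, one conventional effect-size estimate is Cohen's d_z = t / √n [63,64]. The abstract's observation that several LOA effects fell in the medium 0.4–0.6 range can be illustrated from the reaction-time contrast; the sketch below assumes that df = 29 corresponds to n = 30 pairs, and Python is used only for illustration:

```python
import math

def cohens_dz(t, n):
    """Cohen's d_z for a paired t-test: the t statistic divided by sqrt(n pairs)."""
    return t / math.sqrt(n)

# Secondary-task reaction time (Table 6): t(29) = 2.38, so n = 30 pairs
print(round(cohens_dz(2.38, 30), 2))  # 0.43, a medium-sized effect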
Table 7. Correlations between dispositional and situational trust scales.

| | HIT | PAS High Expectation | PAS All-or-None Thinking |
|---|---|---|---|
| CTPA Management-by-Consent | 0.374 * | 0.295 | −0.003 |
| CTPA Management-by-Exception | 0.429 ** | 0.261 | 0.011 |

* p < 0.05 and ** p < 0.01.
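The significance markers in Table 7 can be sanity-checked using the standard test of a Pearson correlation against zero, t = r·√(n − 2) / √(1 − r²). Table 7 does not report its sample size, so the n = 35 below is an assumption (matching the df = 34 paired tests in Table 2), and the sketch is illustrative only:

```python
import math

def r_to_t(r, n):
    """t statistic for testing a Pearson r against zero: r*sqrt(n-2)/sqrt(1-r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# CTPA (consent) vs. HIT: r = 0.374; n = 35 is an assumed sample size,
# not a value reported with Table 7
print(round(r_to_t(0.374, 35), 2))  # 2.32
```

Under that assumption, t = 2.32 exceeds the two-tailed .05 criterion for df = 33 (about 2.03), consistent with the asterisk in the table.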
Table 8. Summary of experiment characteristics.

| | Current Study | Hughes et al. (2023), Exp 2 [31] |
|---|---|---|
| Simulator | GSE GPWR (touchscreen) | GSE GPWR (touchscreen) |
| Sample | Novice student | Novice student |
| Crew | Individual | Crew of three |
| Scenario | Derived from EOP ECA-0.0 | Derived from EOP ECA-0.0 |
| Panel | Similar but not identical | Similar but not identical |
| Task | Similar but not identical | Similar but not identical |
| Task step | In natural order | Grouped (by task type) |
| Automation | Two LOAs | None |
Table 9. Comparison of workload scales across two studies.

| Scale | Hughes et al. (2023; Exp 2) [31] M | SD | Current Study: Management-by-Consent M | SD | t | df |
|---|---|---|---|---|---|---|
| NASA-TLX | | | | | | |
| Global Workload | 30.93 | 16.19 | 22.48 | 15.17 | 2.62 * | 104 |
| Mental Demand | 39.00 | 24.12 | 38.11 | 29.19 | 1.70 | 104 |
| Physical Demand | 20.72 | 17.19 | 17.03 | 14.16 | 1.12 | 104 |
| Temporal Demand | 32.25 | 20.85 | 19.05 | 20.74 | 3.11 ** | 104 |
| Effort | 29.06 | 18.94 | 21.08 | 18.38 | 2.09 * | 104 |
| Frustration | 33.79 | 20.02 | 18.97 | 27.07 | 3.20 ** | 104 |
| Performance | 30.77 | 20.72 | 20.62 | 28.55 | 2.10 * | 104 |
| MRQ | | | | | | |
| Auditory Emotional | 40.40 | 24.90 | 9.30 | 22.41 | 6.89 ** | 98.69 |
| Auditory Linguistic | 69.89 | 19.41 | 10.16 | 21.84 | 15.18 ** | 111 |
| Manual Process | 51.76 | 22.43 | 54.98 | 36.18 | −0.53 | 64.22 |
| Short-Term Memory | 71.62 | 20.10 | 51.48 | 35.65 | 3.42 ** | 60.62 |
| Spatial Attentive | 66.32 | 19.59 | 65.45 | 38.68 | 0.14 | 57.25 |
| Spatial Concentrative | 59.03 | 18.77 | 50.80 | 35.67 | 1.41 | 58.38 |
| Spatial Categorical | 55.15 | 20.41 | 44.80 | 35.53 | 1.76 | 61.30 |
| Spatial Emergent | 66.71 | 19.94 | 57.14 | 37.63 | 1.55 | 58.59 |
| Spatial Positional | 67.20 | 18.94 | 56.25 | 36.82 | 1.82 | 57.71 |
| Spatial Quantitative | 49.77 | 22.32 | 49.50 | 38.12 | 0.04 | 61.99 |
| Visual Lexical | 69.05 | 20.85 | 59.55 | 37.78 | 1.53 | 59.90 |
| Visual Phonetic | 61.84 | 22.14 | 27.82 | 34.06 | 5.88 ** | 66.26 |
| Visual Temporal | 43.28 | 21.83 | 53.55 | 39.66 | −1.57 | 59.81 |
| Vocal Process | 67.42 | 22.21 | 15.18 | 27.86 | 11.03 ** | 111 |
| ISA | | | | | | |
| Checking | 2.41 | 0.58 | 1.41 | 0.64 | 8.29 ** | 106 |
| Detection | 2.01 | 0.93 | 1.62 | 0.71 | 2.32 * | 106 |
| Response | 2.16 | 0.53 | 1.28 | 0.51 | 8.35 ** | 106 |

* p < 0.05 and ** p < 0.01.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Schreck, J.; Matthews, G.; Lin, J.; Mondesire, S.; Metcalf, D.; Dickerson, K.; Grasso, J. Levels of Automation for a Computer-Based Procedure for Simulated Nuclear Power Plant Operation: Impacts on Workload and Trust. Safety 2025, 11, 22. https://doi.org/10.3390/safety11010022

