A Review on Soft Error Correcting Techniques of Aerospace-Grade Static RAM-Based Field-Programmable Gate Arrays

Wang, Weihang; Li, Xuewu; Chen, Lei; Sun, Huabo; Zhang, Fan

doi:10.3390/s24165356

Open Access Review


by Weihang Wang ¹,*, Xuewu Li ¹, Lei Chen ², Huabo Sun ¹ and Fan Zhang ¹

¹ Beijing Microelectronics Technology Institute, Beijing 100076, China

² China Academy of Aerospace Electronics Technology, Beijing 100094, China

* Author to whom correspondence should be addressed.

Sensors 2024, 24(16), 5356; https://doi.org/10.3390/s24165356

Submission received: 26 June 2024 / Revised: 30 July 2024 / Accepted: 7 August 2024 / Published: 19 August 2024

(This article belongs to the Section Physical Sensors)


Abstract:

Aerospace-grade SRAM-based field-programmable gate arrays (FPGAs) used in space applications are highly susceptible to single event effects, which lead to soft errors in the devices. Additionally, as FPGAs scale up, the difficulty of correcting soft errors also increases. This paper proposes that performing soft error sensitivity analysis on FPGAs can help target the more sensitive areas for detection and correction, thereby improving the efficiency of soft error repair. Firstly, in accordance with the dual-layer architecture of SRAM-based FPGAs, methods for the soft error sensitivity analysis of FPGA application layer resources and configuration bitstreams are reviewed. Subsequently, the corresponding application layer memory scrubbing and configuration scrubbing techniques that build on these analysis results are covered. The review closes with a prospective look at emerging soft error mitigation technologies, supporting the development of highly reliable aerospace-grade SRAM-based FPGAs.

Keywords:

SRAM-based FPGA; sensitivity analysis; configuration scrubbing; memory scrubbing

1. Introduction

Field-programmable gate arrays (FPGAs) are extensively utilized in aerospace systems due to their reconfigurability, capability to perform complex functions, and abundant logic, computation, and I/O resources [1]. These devices can dynamically adjust or upgrade the functions they perform depending on varying tasks and environments without the need to replace the entire chip. Among the three common types of FPGAs available in the market (SRAM-, antifuse-, and Flash-based), SRAM-based FPGAs are particularly favored in aerospace missions for their ability to be reconfigured an almost unlimited number of times. However, a notable drawback of SRAM-based FPGAs is their susceptibility to errors or failures induced by radiation exposure [2,3]. Furthermore, as the scale of FPGA devices grows and the capacity of configuration bitstreams increases, the time required for reconfiguration lengthens, thereby reducing the efficiency of single event upset (SEU) repairs. For example, CREME96 was run on heavy-ion data to determine the SEU rates of Xilinx’s Kintex UltraScale+ FPGA in a geostationary orbit under solar minimum conditions with 100 mils of aluminum shielding [4]. The results show that although the bit-wise event rates are relatively low (9.18 × 10⁻¹², 2.08 × 10⁻⁸, and 5.29 × 10⁻⁹ upsets/bit/day in configuration RAM, Flip-Flops, and BlockRAM, respectively), the device-wise event rates are remarkably high (1.64 × 10⁻³, 1.44 × 10⁻², and 1.78 × 10⁻¹ upsets/device/day in configuration RAM, Flip-Flops, and BlockRAM, respectively). Therefore, more efficient, advanced fault correction techniques are needed.
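The gap between the per-bit and per-device figures follows from simple arithmetic: a device-level rate is the per-bit rate multiplied by the number of bits of that type. The short sketch below uses only the rates quoted above; the bit counts it prints are implied by those rates and serve as a consistency check, not as device specifications taken from [4].

```python
# Per-bit and per-device SEU rates quoted in the text (upsets per day).
RATES = {
    "configuration RAM": (9.18e-12, 1.64e-3),
    "flip-flops": (2.08e-8, 1.44e-2),
    "BlockRAM": (5.29e-9, 1.78e-1),
}


def implied_bit_count(per_bit, per_device):
    """Bit count implied by device_rate = bit_rate * bit_count."""
    return per_device / per_bit


def summarize(rates=RATES):
    """Return implied bit counts, the summed device rate, and mean days per upset."""
    counts = {name: implied_bit_count(b, d) for name, (b, d) in rates.items()}
    total_per_day = sum(d for _, d in rates.values())
    return counts, total_per_day, 1.0 / total_per_day


if __name__ == "__main__":
    counts, total, days_between = summarize()
    for name, n in counts.items():
        print(f"{name}: about {n:.2e} bits")
    print(f"total: {total:.3f} upsets/device/day, "
          f"one upset every {days_between:.1f} days on average")
```

Summed over the three resource types, the quoted device-level rates come to about 0.19 upsets/device/day, that is, roughly one upset every five days under the stated orbit and shielding assumptions.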

To enhance the efficiency of repairing SEUs in FPGAs, it is crucial to specifically protect and repair the sensitive regions within the device. Therefore, soft error sensitivity analysis is often performed on FPGAs to estimate the likelihood of SEUs. The result of analysis can also be used to determine the level of impact of different parts on the circuit functionality. For SRAM-based FPGAs, which feature a dual-layer structure (i.e., application layer and configuration layer) as shown in Figure 1 [5], the soft error sensitivity analysis can be categorized into two types: analysis targeting the sensitivity of application layer resources and analysis targeting the sensitivity of the configuration bitstream.

More efficient FPGA soft error repair methods can be derived from the results of soft error sensitivity analysis. Researchers can prioritize the detection and repair of more sensitive areas in the application/configuration layer, enabling the faster identification and correction of soft errors induced by single event effects. Due to the high reliability demands in aerospace missions, it is particularly crucial to promptly detect, mitigate, and correct SEU-induced errors in SRAM-based FPGAs. Commonly, FPGAs are protected by radiation hardening [6,7] or techniques like dual modular redundancy [8,9] and triple modular redundancy [10,11] to mitigate SEUs. However, these measures do not correct bit flips caused by radiation particles, which can accumulate over time and eventually lead to failures [12,13]. Therefore, it is necessary to promptly repair soft errors in FPGAs by locating and correcting the erroneous data in the configuration random access memory (CRAM) or the storage resources in the FPGA application layer. This repair technique is known as “scrubbing”, which can be categorized into “configuration scrubbing” (also called “CRAM scrubbing”) and “memory scrubbing” depending on the location of the repair. Traditional configuration scrubbing methods used to be sufficient, but as FPGA scales continue to expand, these methods struggle to meet the efficiency requirements for soft error repair. Advanced configuration scrubbing techniques improve upon traditional methods by designing nonlinear, high-level configuration scrubbing strategies based on FPGA configuration layer sensitivity analysis or by designing dynamic refresh scheduling mechanisms in multi-task scenarios. Research on memory scrubbing, based on the sensitivity analysis of application layer resources, targets errors in the resources of the FPGA application layer. By combining both scrubbing techniques, the system’s reliability can be further enhanced.

There have already been several review articles in this field. Nidhin, T. S., et al. introduced error correcting techniques for configuration memory and mitigation techniques for routing and logic resources in the application layer [14]. Similarly, the review by Liu, Z., et al. covers techniques including TMR and scrubbing [15]. However, these reviews give no detailed introduction to more advanced scrubbing methods, such as nonlinear configuration scrubbing and multi-task scrubbing scheduling, which are covered in this paper. Moreover, we point out that conducting soft error sensitivity analysis can help design more effective repair techniques. In this paper, we comprehensively review the soft error sensitivity analysis and repair techniques of aerospace-grade SRAM-based FPGAs.

The structure of this article is as follows. Section 2 of this paper summarizes the current SRAM-based FPGA soft error sensitivity analysis techniques. Section 3 reviews FPGA configuration scrubbing techniques, including basic methods and advanced nonlinear techniques. Section 4 reviews memory scrubbing techniques. Section 5 explores the prospects of more advanced soft error correcting techniques. Section 6 concludes this paper. In addition, for the sake of clarity, the structure of this paper is graphically illustrated in Figure 2.

2. SRAM-Based FPGA Soft Error Sensitivity Analysis Techniques

Soft error sensitivity analysis techniques for SRAM-based FPGAs can be categorized into two types: analysis targeting the resources of the application layer and analysis focusing on the configuration bitstream. The former assesses the SEU sensitivity of various resources within the FPGA application layer and examines whether these resources alter the normal functionality of the circuit when affected by SEUs. The latter investigates which data within the configuration bitstream would alter the circuits defined in the application layer or even lead to erroneous circuit outputs when flipped due to SEUs. This paper provides a comprehensive analysis and summary of the findings from both types of sensitivity analyses.

2.1. Sensitivity Analysis of Application Layer Resources in SRAM-Based FPGAs

The resource-oriented soft error sensitivity analysis for SRAM-based FPGAs focuses on different types of resources within the device’s application layer, aiming to analyze their failure rates and the extent of impact on the circuits when subjected to single event effects. According to FPGA testing guidelines [16], the resources in the FPGA application layer that require testing include Digital Flip-Flops (DFFs), BlockRAM (BRAM), clock networks, interconnect resources, and Input/Output Blocks (IOBs). The specific methods of analysis are primarily divided into two types: fault emulation and analytical modeling.

2.1.1. Fault Emulation-Based Soft Error Sensitivity Analysis

Fault emulation-based sensitivity analysis involves introducing errors into the FPGA application layer to emulate SEUs. This method assesses the sensitivity of resources by counting the number of soft errors in various components after the tests or by evaluating the extent of impact on the circuits when errors occur. Thus, it determines the criticality of the affected resources. The techniques for fault emulation are primarily categorized into two types: radiation experiments and hardware emulation.

Using radiation experiments for fault emulation involves exposing the FPGA under test to high-energy particles, inducing errors in the application layer resources of the FPGA. Fabero, J.C., et al. conducted neutron radiation tests with 14 MeV energy on Artix-7 FPGA devices and compared the number of soft errors in DFFs and BRAMs after the experiments (as illustrated in Figure 3) [17]. The results showed that DFFs exhibited only a few errors across ten trials, indicating low sensitivity to soft errors, and no errors were observed in BRAMs. The researchers speculated that this might be related to the inherent error detection mechanisms in BRAMs. Further studies conducted by Tsiligiannis, G., et al. on a similar FPGA device (XC7A25T) found that disabling error correcting code (ECC) significantly increased the probability of bit flips in BRAMs, supporting the aforementioned hypothesis [18]. This suggests that BRAMs are generally unaffected by SEUs due to their embedded ECC soft error detection mechanisms. For IOB resources, Tazi, F.Z., et al. conducted proton radiation tests on Xilinx Virtex-5 and Virtex-7 series FPGAs [19] and discovered that single event effects (SEEs) induced additional delays in IOBs, with delays up to 6.2 ns and 3.8 ns in the respective devices. They also noted that although IOBs require fewer configuration bits than Configurable Logic Blocks (CLBs) and routing resources, the probability of soft errors occurring was comparable. Regarding interconnect resources, Darvishi, M., et al. focused on routing resources and the switch matrices (SMs) at the root nodes of the interconnect networks [20], describing how exposure to ionizing neutron radiation caused delay variations due to single particle disturbances. For clock networks, Guibbaud, N., et al. described a method for precisely measuring the cross section of various clock buffers in the clock network. Their work uses pulsed laser tests to achieve fully automated, highly reliable, and repeatable precise fault emulation [21].

Using hardware emulation for fault emulation involves emulating errors in the data paths, control unit register states, and internal signal states of the FPGA application layer. This allows for targeted sensitivity analysis of the respective resources to soft errors. Mansour, W., introduced NETFI (NETlist Fault Injection), which allows for the modification of circuits at the netlist level to emulate errors, and these modifications can then be mapped to errors in the FPGA application layer after synthesis and implementation [22]. However, developing fault emulation tools from scratch is time-consuming and prone to bugs. To maximize the use of commercially available EDA tools, Ebrahimi, M., et al. utilized Altera’s debugging tools, the In-System Memory Content Editor (MCE) and In-System Sources and Probes (SAP) (Altera, San Jose, CA, USA), to emulate soft errors in FPGA Flip-Flops and other memory units [23]. This was achieved through highly automated fault emulation methods using TCL scripts. Similarly, Rhod, E., et al. modified extracted circuit netlists, emulating errors within the FPGA after implementation [24]. This method was tested on several benchmark circuits from ITC’99, demonstrating the versatility of the proposed approach.

In the sensitivity analysis of FPGA application layer resources for soft errors, conducting radiation experiments to emulate faults can most accurately simulate the radiation conditions encountered in aerospace applications. However, this method is costly, time-consuming, and often has destructive effects on the devices, and the location of the emulated faults cannot be controlled. Moreover, radiation experiments can affect the application layer and the configuration layer at the same time, and errors induced in the configuration layer may cause further errors in the application layer, which makes precise sensitivity analysis difficult. Hardware emulation offers strong repeatability and less damage to circuits but requires the design of complex dedicated hardware to perform the emulation tasks.

2.1.2. Analytical Model-Based Soft Error Sensitivity Analysis

This type of analysis typically utilizes probabilistic or statistical theoretical models. Initially, models such as the Poisson distribution were employed to model the fault behavior of various circuit nodes (such as lookup tables, multiplexers, programmable interconnect points, etc.) under the influence of SEEs in order to estimate the soft error rate across different design nodes. Subsequently, mathematical tools like Markov chains were used to calculate the probability of error propagation, determining the likelihood that different resources will impact circuit functionality when a soft error occurs. Asadi, G., et al. calculated the error probabilities for all nodes (including MUX, PIP, LUT, clock networks, etc.) in circuits implemented in the FPGA application layer. By integrating the circuit netlist, they derived the error propagation probability in the data paths, thereby identifying the data paths more sensitive to SEUs [25]. Experiments on Xilinx XCV300 FPGA demonstrated that this estimation approach could achieve an average accuracy of 95%. Research conducted by Hogan, J. A., et al. also utilized Markov chains to analyze system reliability [26,27,28]. Mousavi, M., et al. proposed a universal computational model for SEU sensitivity [29]. This model divides the process from an SEU affecting the device to generating erroneous output into two stages: error generation in FPGA resources due to an SEU and the propagation of this error through subsequent circuits (as illustrated in Figure 4).

$\varepsilon_{des}$ represents the probability of an SEU inducing an error in FPGA resources, while the design vulnerability factor (DVF) represents the probability of this error propagating through subsequent circuits and causing output errors. The final probability of an output error induced by an SEU is the product of these two parameters:

$$DS = \varepsilon_{des} \times DVF, \quad 0 \leq DS \leq 1$$
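As a small numerical illustration of this product rule, the sketch below computes DS for a few resource types and ranks them. The function name and the (ε_des, DVF) values are hypothetical, chosen only to show how the two factors interact; they are not results from [29].

```python
def output_error_probability(eps_des, dvf):
    """DS = eps_des * DVF; both factors are probabilities in [0, 1]."""
    for name, p in (("eps_des", eps_des), ("dvf", dvf)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    return eps_des * dvf


# Hypothetical (eps_des, DVF) pairs, for illustration only.
resources = {"LUT": (0.02, 0.5), "routing": (0.05, 0.1), "DFF": (0.01, 0.9)}

# Rank resources from most to least likely to produce an output error.
ranking = sorted(
    resources,
    key=lambda r: output_error_probability(*resources[r]),
    reverse=True,
)
```

With these illustrative numbers, the routing resource is the most likely to be upset but ranks last, because few of its errors propagate to the outputs. This is the kind of distinction the two-stage model is meant to capture.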

For the more advanced Xilinx UltraScale+ series of SRAM-based FPGAs, Sterpone, L., et al. utilized the VERIPlace tool to evaluate the sensitivity of various resources within the UltraScale+ FPGA application layer to SEUs [30]. This tool also simulated the internal propagation effects of SEUs within the FPGA designs to estimate the probability of errors in circuit outputs. Furthermore, the research compared the predictive outcomes of the VERIPlace tool with results from radiation experiments, validating the effectiveness of the proposed soft error sensitivity analysis methodology.

The analytical model-based analysis method reviewed above helps understand the propagation of soft errors in application layer circuits without causing damage to the circuits. However, it often provides the worst-case error probability, leading to inaccurate sensitivity analysis results. Additionally, due to the complexity of the models for soft error propagation in circuits, the analysis usually requires a significant amount of time. Currently, this analysis method is still in development.

2.2. Sensitivity Analysis of Configuration Bitstreams in SRAM-Based FPGAs

The analysis of soft error sensitivity for SRAM-based FPGA configuration bitstreams primarily investigates which bits in the FPGA configuration bitstreams can alter user-defined circuits when flipped and which bits can cause erroneous outputs in user-defined circuits upon flipping. The following sections review the classification methods for FPGA configuration bits from a criticality perspective and the methods for locating sensitive configuration bits within the bitstream.

2.2.1. Configuration Bits Classification

The distinction in criticality of configuration bits in SRAM-based FPGAs stems primarily from three factors. First, the sensitivity to single-event effects varies across different types and locations of resources within the FPGA. Second, the circuits implemented in the FPGA utilize only a subset of the FPGA resources; therefore, only the configuration data corresponding to these resources will impact the circuit upon being flipped. Third, erroneous data in D-flip-flops or BRAMs of FPGA may be flushed out after several clock cycles, which can result in bit flips in their configuration data not leading to incorrect circuit outputs [31,32]. These factors contribute to varying impacts of different configuration data on the functions of circuits defined within the FPGA. Based on the degree of criticality, configuration bits can be categorized into: non-essential bits, which correspond to resources not used in the FPGA; essential bits, which configure resources that are utilized by the user-defined circuits; and critical bits, a subset of essential bits, whose flipping results in erroneous outputs of the circuit. The relationships among configuration data, non-essential bits, essential bits, and critical bits are illustrated in Figure 5.
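The three classes can be made concrete with a toy model. The sketch below is a minimal illustration under stated assumptions: a hypothetical 16-bit "bitstream" holds the truth tables of two 3-input LUTs, only the first LUT is used by the design, and its third input is tied to 0 so that half of its truth table can never be addressed. The layout and all names are invented for this example and do not describe any real device.

```python
from itertools import product

N_BITS = 16                     # toy bitstream: two 8-bit LUT3 truth tables
USED_BITS = set(range(0, 8))    # LUT0 is used by the design; LUT1 (bits 8-15) is not


def golden_bitstream():
    bits = [0] * N_BITS
    bits[3] = 1                 # LUT0 computes a AND b when the third input is 0
    return bits


def run_design(bits, a, b):
    c = 0                       # third LUT input is tied to ground in this design
    return bits[a + 2 * b + 4 * c]


def classify_bits(golden):
    """Label every bit as non-essential, essential (non-critical), or critical."""
    stimuli = list(product((0, 1), repeat=2))
    reference = [run_design(golden, a, b) for a, b in stimuli]
    classes = {}
    for i in range(len(golden)):
        if i not in USED_BITS:
            classes[i] = "non-essential"
            continue
        faulty = list(golden)
        faulty[i] ^= 1          # emulate a single upset in bit i
        outputs = [run_design(faulty, a, b) for a, b in stimuli]
        if outputs != reference:
            classes[i] = "critical"
        else:
            classes[i] = "essential (non-critical)"
    return classes
```

In this toy, bits 0 to 3 are critical, bits 4 to 7 belong to a used resource but never reach the output, and bits 8 to 15 are non-essential. Unreachable truth-table entries are only one possible reason why an essential bit is not critical.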

Gear et al. further subdivided critical bits into primary critical bits and secondary critical bits based on the concept of soft error accumulation [33]. The definition of primary critical bits remains the same as that of the critical bits described previously. Secondary critical bits, when flipped individually, do not cause erroneous outputs in the circuit. However, if another bit also flips due to an SEU, the accumulation of soft errors might lead to a circuit malfunction. This more detailed classification of configuration bits enables researchers to conduct more thorough analyses of soft error sensitivity and to design more precise and efficient soft error correcting strategies.

2.2.2. Methods for Locating Essential and Critical Bits

Locating essential and critical bits within the configuration bitstream is crucial for designing effective configuration scrubbing techniques. The identification of essential bits typically involves studying the mapping relationships between the application layer and the configuration layer of SRAM-based FPGAs, with a knowledge of which resources are being used. For instance, Le Roux et al., after studying the mapping relationship between the LUTs and the bitstream in the Xilinx Virtex-5 series of FPGAs, reported that each LUT is configured by 64 configuration bits, and these bits are evenly distributed across four frames (as depicted in Figure 6) [34].

Xilinx’s “Essential Bit Technology” is capable of identifying essential bits within the configuration bitstream [35] and can work in conjunction with SEM IP to correct single or multiple bit flips in the CRAM. This method significantly reduces the failure rate (FIT, failure in time) of FPGA. Ding, Z., et al., focusing on the Virtex-5 series of SRAM-based FPGAs from Xilinx, established a mapping relationship table between programmable points, configuration options, configuration control bits, and configuration control bit values. They also developed a fully automated, distributed high-performance analysis architecture and offset formula theory, which are utilized to rapidly generate test cases, formulate mapping tables, and verify their correctness. The processing flow of bitstream decoding is shown in Figure 7 [36]. Research on the mapping relationships in Xilinx’s Virtex-7 series of FPGAs [37,38,39] has also been conducted, along with BitFREE, a parallel analysis method for mapping relationships between the application layer and configuration layer across most FPGA models, which greatly enhances the speed of analysis while ensuring universality [40].

To identify critical bits, the common method employed is fault injection. This involves flipping a specific bit in the configuration bitstream to emulate a Single Event Upset in the configuration memory. The bit is deemed critical if the outputs of the reference circuit and the tested circuit differ after injection. Research studies [41,42] targeting the Xilinx Virtex-5 and Virtex-7 series SRAM-based FPGAs used similar approaches for critical bit selection: initially, the mapping relationship is utilized to determine the configuration bits (i.e., essential bits) corresponding to used resources based on their coordinates in the application layer. A precise fault injection system is then developed to further filter out critical bits from these essential bits. Following the identification of critical bits by fault injection method, Gear, K. W., et al. leveraged this information to design more efficient FPGA refresh techniques [33,43,44]. Specifically, ref. [33] subdivides critical bits into primary and secondary critical bits, noting that comprehensive fault injection could be excessively time-consuming. Therefore, ref. [33] employs a bit-by-bit injection approach to select primary critical bits, while a probabilistic dual-bit injection strategy is designed for secondary critical bits, allowing for the calculation of the criticality of each configuration frame at a specified confidence level.
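The injection procedure described above can be sketched on a toy design. The example assumes a hypothetical 12-bit bitstream holding three redundant copies of a 2-input XOR truth table, followed by a majority voter that is treated as immune to upsets. Single-bit injection finds primary critical bits, and dual-bit injection finds secondary ones. The pair search here is exhaustive because the toy is tiny, whereas ref. [33] uses a probabilistic dual-bit strategy. All names and the layout are assumptions made for illustration.

```python
from itertools import combinations, product

COPIES, LUT_SIZE = 3, 4         # three LUT2 copies, 4 truth-table bits each


def golden_bitstream():
    xor_table = [0, 1, 1, 0]    # index = a + 2 * b
    return xor_table * COPIES


def run_design(bits, a, b):
    idx = a + 2 * b
    votes = [bits[c * LUT_SIZE + idx] for c in range(COPIES)]
    return int(sum(votes) >= 2)  # majority voter, assumed upset-free here


def inject(golden, positions):
    """Return a copy of the bitstream with the given bit positions flipped."""
    faulty = list(golden)
    for p in positions:
        faulty[p] ^= 1
    return faulty


def fails(bits, reference):
    """True if any input produces an output that differs from the reference."""
    return any(run_design(bits, a, b) != out for (a, b), out in reference.items())


def find_critical_bits(golden):
    reference = {(a, b): run_design(golden, a, b)
                 for a, b in product((0, 1), repeat=2)}
    n = len(golden)
    # Bit-by-bit injection: a bit is primary critical if its flip alone fails.
    primary = {i for i in range(n) if fails(inject(golden, [i]), reference)}
    # Dual-bit injection over pairs that contain no primary critical bit.
    failing_pairs = [
        pair for pair in combinations(range(n), 2)
        if not primary & set(pair) and fails(inject(golden, pair), reference)
    ]
    secondary = {bit for pair in failing_pairs for bit in pair}
    return primary, secondary, failing_pairs
```

In this toy, the voter masks every single upset, so no bit is primary critical. A pair of upsets causes a failure only when it hits the same truth-table entry in two different copies. This is the accumulation effect that motivates the secondary class.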

Another method for identifying critical bits involves utilizing the previously mentioned sensitivity analysis of application layer resources. This approach identifies the resources whose failure due to soft errors leads to incorrect circuit outputs. The corresponding configuration bits in the CRAM are then determined based on the application layer to configuration layer mapping relationships. However, this method is complex and often fails to yield precise results in the selection of critical bits, making it infrequently used in the localization of critical bits.

In the context of soft error sensitivity analysis targeting configuration bitstreams, there has been significant research into establishing the mapping relationship between the FPGA application layer and the configuration layer. This facilitates the extraction of essential bits information from the circuit’s netlist file. However, most studies focused on earlier FPGA devices, such as Xilinx’s Virtex-5 and Virtex-7 series, with limited attention on later FPGA series like Xilinx’s UltraScale and UltraScale+ FPGAs. Furthermore, while many studies employ fault injection to filter out critical bits from the essential bits, this approach, feasible for pre-Virtex-7 series FPGAs, becomes impractical as FPGA scales further increase due to the extensive time required for comprehensive bit-by-bit fault injection. Hence, exploring a faster and simpler method for identifying critical bits has become a key research focus.

This section provides a comprehensive review of SRAM-based FPGA sensitivity analysis techniques targeting application layer resources and configuration bitstreams. The strengths and weaknesses of specific methods involved are summarized in Table 1.

3. SRAM-Based FPGA Configuration Scrubbing

Configuration scrubbing is an effective error correction technique for the configuration memory within SRAM-based FPGAs. In some instances, the scrubbing process requires the interruption of normal FPGA operation: applications are temporarily suspended, a complete bitstream is written into the configuration layer, and the applications are then reinitialized. In other cases, the FPGA can perform readback operations in the background to read and verify the data in the configuration memory without disrupting the system’s normal operation. Scrubbing techniques can also benefit from the dynamic partial reconfiguration (DPR) feature of FPGAs, which allows erroneous configuration frames to be scrubbed while the circuits are running. DPR allows the rewriting of a subset of configuration frames, either while the user design is suspended or while it is operating. During the FPGA design stage, FPGA design tools (such as Xilinx Vivado) are used to partition the design into multiple reconfigurable modules (RMs). For each module, independent bitstream files are generated, including the bitstream for the static region and the bitstream for the reconfigurable region. At runtime, the partial bitstream is loaded into the target area through the reconfiguration controller, enabling dynamic partial reconfiguration. Current configuration scrubbing technologies can be categorized into basic, linear scrubbing techniques and more complex, advanced scrubbing techniques.

3.1. Basic Configuration Scrubbing Techniques

Basic configuration scrubbing techniques for SRAM-based FPGAs can be classified from various perspectives. Depending on the spatial relationship between the scrubber and the FPGA, scrubbing methods can be divided into external and internal scrubbing. Scrubbing can also be classified by triggering conditions into periodic scrubbing and corrective scrubbing. In terms of granularity, it can be categorized into device-level scrubbing, frame-based scrubbing, or mixed granularity scrubbing. At the end of this section, studies on redundant configuration scrubbing are also reviewed.

3.1.1. External and Internal Scrubbing

External scrubbing mechanisms, typically implemented outside the FPGA that requires protection, are referred to as external scrubbers, as depicted in Figure 8a. These mechanisms communicate with the FPGA through interfaces such as JTAG (Joint Test Action Group) or SelectMap, monitoring and verifying the data within the configuration memory of FPGA.

Xilinx has introduced an external scrubbing mechanism in which the last two bytes of each frame in the configuration memory store a CRC value derived from the frame data. This CRC value is then compared with a newly computed frame CRC to verify whether the frame has been affected by SEU. The scrubber, as proposed in [45], acts as a bridge between the configuration memory and the FPGA being scrubbed. It employs a finite state machine to switch between states such as reading the configuration bitstream, configuring the FPGA, calculating CRC values, and comparing CRC values, thereby controlling the scrubbing process.
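The per-frame CRC check can be sketched as follows. This is a minimal illustration, not the vendor's implementation: the frame length, the use of Python's `binascii.crc_hqx` (a CRC-16/CCITT variant) as the checksum, and the helper names are all assumptions, since the text does not specify the polynomial or frame format.

```python
import binascii


def make_frame(payload: bytes) -> bytes:
    """Append a 2-byte CRC of the payload, mimicking a frame with a stored CRC."""
    crc = binascii.crc_hqx(payload, 0)
    return payload + crc.to_bytes(2, "big")


def frame_is_intact(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the stored value."""
    payload, stored = frame[:-2], int.from_bytes(frame[-2:], "big")
    return binascii.crc_hqx(payload, 0) == stored


def scrub(cram: list, golden: list) -> list:
    """Rewrite every frame whose CRC check fails; return the repaired indices."""
    repaired = []
    for index, frame in enumerate(cram):
        if not frame_is_intact(frame):
            cram[index] = golden[index]   # reconfigure the frame from the golden copy
            repaired.append(index)
    return repaired
```

The loop body mirrors the states named in the text (read frame, compute CRC, compare, reconfigure); a hardware scrubber would step through them with a finite state machine rather than a software loop.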

To address the limitation of only detecting SEUs as noted in study [45], Kumar, M., et al. implemented a 16-state finite state machine scrubber in another FPGA [46]. This scrubber reads the configuration register values and the configuration pins’ electrical signal levels of the target FPGA to detect the types of SEEs and to initiate corresponding scrubbing actions. This has enabled a more comprehensive solution for correcting soft errors in FPGAs.

In internal scrubbing, the scrubber is implemented within the target FPGA and utilizes the internal logic and routing resources of FPGA alongside the user-customized design. The scrubber accesses and manipulates the configuration data directly through the built-in Internal Configuration Access Port (ICAP) of FPGA. As depicted in Figure 8b, a portion of the configuration memory stores the configuration data for the scrubber, while another portion stores the configuration data for the user circuit. The internal scrubber performs scrubbing operations on the CRAM via the ICAP interface.

Heiner, J., et al. proposed a classical internal scrubber architecture [47], utilizing an 8-bit PicoBlaze processor [48] embedded within the configuration logic to control the scrubbing process. This setup leverages the flexibility of the processor to execute complex read-back and scrub operations. In comparison, Xilinx has developed a more generalized internal scrubbing architecture capable of correcting errors caused by single event effects or emulating single event upsets. The scrubber can be implemented using the internal programmable resources or an internal processor of FPGA, utilizing Xilinx’s FRAME_ECC primitive for error detection and correction [49].

Similarly, a scrubbing strategy presented in [50] also uses the FRAME_ECC primitive from Xilinx but consumes significantly fewer resources. Within the same FPGA model, the resources consumed by [50] are approximately 20% of those used in [47] and 39% of those used in [49]. Xilinx’s SEM IP core provides advanced error detection, correction, and injection capabilities [51]. Xilinx also provides an SEU controller, a hardware module integrated in FPGAs, which is capable of error detection, correction, and reconfiguration [52]. To further accelerate the scrub rate while leveraging processor flexibility, Li, X., et al. combined internal scrubbing methods with the high-speed PCAP (Processor Configuration Access Port) interface in the Xilinx Zynq-7000 SoC series, significantly enhancing scrub speeds [53]. Additionally, Lu, Y., et al. tailored their approach for scenarios involving external processors cooperating with FPGAs [54]. Recognizing that processors in high-radiation environments often operate at frequencies significantly lower than RAM [55], they set the RAM in the FPGA to operate at twice the frequency of the processor. The control module acts as a bridge between the processor and the BRAM for half of the time and switches to scrubbing the FPGA during the other half. The control logic adjusts the sequence of external operations and scrubbing activities to prevent conflicts. This method enables both operations to appear simultaneous, ensuring that the scrubbing process introduces no additional delays.

In comparison, external scrubbing methods can protect the logic circuits controlling the scrubbing from SEU but will occupy additional on-board area and introduce extra latency. Internal scrubbing mechanisms have shorter latency and need no extra on-board area since the scrubbing control circuit is also located within the FPGA being scrubbed. However, the control circuit is also susceptible to SEU. In practice, a trade-off between performance and reliability is needed to select the most suitable scrubbing mechanism.

3.1.2. Periodic Scrubbing and Corrective Scrubbing

Periodic scrubbing and corrective scrubbing represent two different methods of initiating scrub operations, differentiated by their triggering conditions. Periodic scrubbing, also known as blind scrubbing, involves rewriting all configuration data stored externally back into the configuration memory of FPGA at fixed time intervals. The advantages of this method include fast recovery, simple control, and minimal additional area usage. However, it requires a highly reliable and radiation-insensitive scrub controller and significant storage space to maintain standard configuration bitstreams. Determining the optimal scrubbing interval to maximize efficiency can also be challenging. To address these issues, Ahmed, A., et al. proposed a blind scrubbing strategy based on Magneto-resistive Random Access Memory (MRAM) [56]. In this approach, configuration data are stored in an MRAM matrix, allowing data selection by switching matrix rows and columns during the scrubbing process, thereby eliminating the need for additional scrub controllers and standard configuration bitstream, also enhancing the scrubbing speed.
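The effect of the scrubbing interval on error accumulation can be illustrated with a toy simulation of blind scrubbing. The sketch below assumes frames modeled as small integers, a fixed upset schedule, and discrete time ticks. These are all hypothetical simplifications and are not drawn from [56].

```python
def blind_scrub(cram, golden):
    """Rewrite every frame from the golden copy without reading anything back."""
    cram[:] = golden


def simulate(golden, upsets, interval, ticks):
    """Run blind scrubbing every `interval` ticks.

    `upsets` maps a tick to the (frame_index, bit_index) flipped at that tick.
    Returns the worst number of corrupted frames seen and whether the memory
    matches the golden copy at the end.
    """
    cram = list(golden)
    worst = 0
    for tick in range(1, ticks + 1):
        if tick in upsets:
            frame, bit = upsets[tick]
            cram[frame] ^= 1 << bit
        corrupted = sum(1 for c, g in zip(cram, golden) if c != g)
        worst = max(worst, corrupted)
        if tick % interval == 0:
            blind_scrub(cram, golden)
    return worst, cram == list(golden)


golden = [0xA5, 0x3C, 0xFF, 0x00]
upsets = {2: (0, 1), 3: (2, 7), 7: (1, 0)}
slow = simulate(golden, upsets, interval=5, ticks=10)   # two errors accumulate
fast = simulate(golden, upsets, interval=2, ticks=10)   # at most one at a time
```

A shorter interval limits how many upsets can coexist, at the cost of more configuration traffic. That trade-off is why choosing the interval is nontrivial.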

Corrective scrubbing is more advanced and flexible than periodic scrubbing, as it is triggered only upon detection of SEEs. The specific detection mechanisms vary with different methods. For single event functional interrupts (SEFI), ref. [46] identifies various types of SEFIs by monitoring the values of internal FPGA configuration registers and the electrical levels of external pins, and subsequently applies repair measures such as power cycling or reconfiguration. For SBU and multiple bit upsets (MBU), high-speed SRAM-type FPGAs typically employ single error correction and double error detection (SEC-DED) codes to verify each frame. The configuration data in each frame are encoded using an H matrix, and the resulting code is written into reserved parity bits within the configuration frame [57,58]. Single bit flips can be directly corrected through decoding based on the H matrix. For multiple bit flips, the H matrix generates syndrome values that indicate the error locations. Corrective scrubbing uses parity codes and these syndrome values to detect bit flips, allowing for either a full device scrub or targeted scrubbing of the affected configuration frames.
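The SEC-DED principle described above can be illustrated with a toy extended Hamming (8,4) code. This is only a minimal sketch: the function names `secded_encode` and `secded_decode` are hypothetical, and the frame-level ECC used in real devices protects far wider words with a vendor-defined H matrix that is not reproduced here.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # data bits sit at the non-power-of-two positions
PARITY_POSITIONS = (1, 2, 4)    # Hamming parity bits; position 0 is overall parity


def secded_encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose index has bit p set.
        for i in range(1, 8):
            if i != p and i & p:
                code[p] ^= code[i]
    for i in range(1, 8):
        code[0] ^= code[i]      # overall parity over positions 1..7
    return code


def secded_decode(code):
    """Return (status, data); data is None when the word is uncorrectable."""
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i       # XOR of the indices of all set bits
    overall = 0
    for bit in code:
        overall ^= bit
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: treated as a single flip located by the syndrome.
        # Three or more flips can land here too and would be miscorrected.
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a non-zero syndrome: double flip, detect only.
        return "uncorrectable", None
    return status, [fixed[pos] for pos in DATA_POSITIONS]
```

A single flipped bit is located by the syndrome and repaired, whereas two flips are only flagged, which is the point at which a corrective scrubber would have to fall back on rewriting the frame from a reference copy.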

The corrective scrubbing method utilizing SEC-DED codes faces three primary challenges. First, SEC-DED codes cannot correct double bit flips, and they may misjudge scenarios involving more than two flips. Second, this approach requires the reservation of parity bits within the configuration frames. Third, the encoding process of SEC-DED codes introduces significant delays, which can impact FPGA performance. To address these shortcomings, research has explored alternative parity codes for corrective scrubbing of FPGAs. For example, Rahul, K., et al. proposed an ECC that does not require additional parity bits and is capable of detecting and correcting adjacent 2-bit errors, as well as detecting adjacent 3-bit errors [59]. Additionally, they introduced an ECC that does require extra parity bits but can detect and correct both adjacent 2-bit and adjacent 3-bit errors. To tackle the issue of encoding delays, Sen, P., et al. improved upon the decimal matrix code (DMC) [60], significantly reducing the time required for encoding while requiring fewer additional bits of information and handling a broader range of MBU errors.

3.1.3. Device-Level Scrubbing, Frame-Based Scrubbing, and Mixed Granularity Scrubbing

From the perspective of scrubbing granularity, refreshing the entire device and refreshing by frame represent two contrasting methods. The former involves rewriting all the golden configuration data, stored in external radiation-hardened memory, back into the configuration memory of the FPGA upon detection of an SEU or MBU. This approach is relatively straightforward but introduces additional and unnecessary delays and power consumption. It also disrupts the normal functions of the FPGA. In contrast, frame-based scrubbing offers more flexibility but has a comparatively complex control mechanism. The scrubber determines the occurrence of bit flips by reading the ECCs of each frame. If a flip is detected, the correct configuration data are rewritten into the affected frame. This process leverages the dynamic reconfiguration capabilities of the FPGA to perform the correction without interrupting the FPGA’s normal operations.
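The contrast between the two granularities can be sketched in a few lines of Python. The sketch assumes a CRAM modeled as a list of byte strings, and it uses a CRC-32 per frame as a stand-in for the per-frame ECC; the function names `device_scrub` and `frame_scrub` are hypothetical and do not correspond to any vendor API.

```python
import zlib


def device_scrub(cram, golden):
    """Device-level scrub: rewrite every frame; returns frames written."""
    cram[:] = golden
    return len(golden)


def frame_scrub(cram, golden, checks):
    """Frame-based scrub: rewrite only frames whose check value mismatches.

    checks[i] is the expected CRC-32 of frame i (assumption: CRC-32 stands
    in for the per-frame ECC read by a real scrubber).
    Returns the list of repaired frame addresses.
    """
    repaired = []
    for addr, frame in enumerate(cram):
        if zlib.crc32(frame) != checks[addr]:
            cram[addr] = golden[addr]
            repaired.append(addr)
    return repaired


if __name__ == "__main__":
    golden = [bytes([i] * 8) for i in range(6)]
    checks = [zlib.crc32(f) for f in golden]
    cram = list(golden)
    # Inject a single bit flip into frame 3.
    cram[3] = bytes([cram[3][0] ^ 0x04]) + cram[3][1:]
    print(frame_scrub(cram, golden, checks))  # addresses of rewritten frames
```

In this model, the device-level scrub always writes all six frames, while the frame-based scrub writes only the frame whose check fails, at the cost of reading and checking every frame first.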

The granularity of scrubbing is not fixed. Research presented in [61] introduced a hybrid granularity scrubbing method. This method initiates a fine-grained scrubbing mechanism when soft errors are detected in the CRAM, specifically refreshing the affected configuration frames. The coarse-grained scrubbing mechanism, on the other hand, refreshes configuration bits corresponding to a particular circuit module or the entire configuration bitstream. This approach is capable of detecting and correcting errors that cannot be identified or repaired through fine-grained repairs alone. The hybrid granularity scrubbing method addresses several issues: it reduces the time-consuming process of locating erroneous frames and fixes certain types of errors that fine-grained scrubbing cannot. Simultaneously, it resolves the low efficiency of coarse-grained scrubbing repairs. This method is particularly advantageous in large-scale FPGAs with millions or even hundreds of millions of gates.

3.1.4. Redundant Configuration Scrubbing

Most configuration scrubbing techniques need a “golden bitstream” to be stored in a radiation-hardened memory. However, even radiation-hardened storage cannot fully guarantee that the golden bitstream is unaffected by SEUs. For techniques that rely on error-correcting codes and thus do not require a golden bitstream, MBUs remain a challenging issue. To address both problems, scrubbing techniques based on “redundant configuration” were developed. Redundant configuration can be further categorized into “bit-level”, “frame-level”, and “device-level” redundancy.

Steiner, G. C., et al. introduced a design implementation flow aimed at generating redundant configuration at the bit-level in a programmable logic device (PLD) including FPGAs [62]. The configuration data are stored in the first portion of the configuration memory cells of the PLD, defining the logic function. Then, the unprogrammed configuration memory cells are identified. By programming the unprogrammed cells with the same configuration data as their corresponding programmed cells, this method implements bit-level redundancy. Homologous bits are the inputs of a majority voter, and the voter’s output determines the actual behavior of the PLD. This technique, disclosed in a patent [62], masks errors but is unable to detect or correct them.

Tonfat, J., et al. introduced a self-correcting method named “frame-level redundancy scrubbing (FLR-scrubbing)” [63]. Based on a coarse TMR design in the FPGA, FLR-scrubbing replicates the target configuration frame cluster three times and stores the copies in three TMR domains in the CRAM of a single FPGA. The scrubber starts from the first frame of each frame cluster and performs a majority vote to detect and correct the faulty configuration bits. Its energy consumption is reported to be six times lower than that of the blind scrubbing technique. Similar frame-level redundancy scrubbing techniques are also proposed in [64,65]. Furthermore, a detailed solution for generating frame-level configuration redundancy is proposed in [66].
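The voting step at the heart of frame-level redundancy can be sketched as a bitwise majority over three frame copies. This is an illustrative model only: the three TMR domains are represented as plain Python lists of byte strings, and `vote_frames` and `flr_scrub` are hypothetical names rather than functions from [63].

```python
def vote_frames(a, b, c):
    """Bitwise 2-out-of-3 majority over three equally long frames."""
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


def flr_scrub(domains):
    """Vote frame by frame across three domains and repair any outlier.

    domains: list of three lists of frames (bytes), one list per TMR domain.
    Returns the number of frame copies that were rewritten.
    """
    rewritten = 0
    for index in range(len(domains[0])):
        voted = vote_frames(*(domain[index] for domain in domains))
        for domain in domains:
            if domain[index] != voted:
                domain[index] = voted   # overwrite the faulty copy
                rewritten += 1
    return rewritten
```

Because the reference is rebuilt from the three copies themselves, no golden bitstream is needed. The vote fails only if the same bit is upset in two domains before the scrubber reaches that frame.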

The device-level redundant configuration scrubbing involves implementing the same design in multiple identical FPGAs. Giordano, R., et al. introduced an error-correcting technique implemented in a system with six identical FPGAs [67]. By reading back and voting frames of the same address in the CRAMs, this method can realize rapid error correction and high reliability. Similarly, Herrera-Alzu, I., et al. and Alfke, P. H., et al. also proposed device-level redundancy methods, utilizing TMR principles [68,69].

3.2. Advanced Configuration Scrubbing Techniques

Basic scrubbing techniques typically employ fixed scrubbing sequences and frequencies, using static scheduling algorithms when faced with multiple scrubbing requests. This approach limits the exploration space for mean time to detect (MTTD) or mean time to repair (MTTR) and makes it challenging to optimally schedule scrubs in response to dynamically changing scrubbing requests, thereby compromising system reliability.

This section introduces advanced scrubbing techniques including nonlinear configuration scrubbing and multitasking scrub scheduling. Compared to basic scrubbing methods, these advanced techniques significantly enhance scrubbing efficiency and system reliability.

3.2.1. Nonlinear Configuration Scrubbing

The analysis of soft error sensitivity with respect to configuration bitstreams shows that different configuration data vary in their impact on circuits implemented in FPGAs. Therefore, during the scrubbing process, it is advisable to prioritize frames containing essential or critical bits. These frames should be inspected and scrubbed first or subjected to higher detection frequencies, leading to the concept of “nonlinear scrubbing”. Nonlinear scrubbing breaks away from the traditional approach of basic scrubbing techniques, which start from the first frame and incrementally check each subsequent frame address. By altering the starting frame address and employing jumping read-back verification, nonlinear scrubbing explores further enhancements in scrubbing efficiency.

Nazar, G. L., et al. proposed the “Shifted Scrubbing” [43] method, which utilizes the non-uniform distribution of critical bits within the configuration bitstream—namely, the differing number of critical bits in each frame. This characteristic allows each frame’s proportion of critical bits in the total number of critical bits within the CRAM to represent the probability of a critical bit flip (a flip affecting circuit output) occurring in that frame. Based on this, the MTTR can be calculated under different starting frame addresses, from which the optimal starting frame can be derived. Figure 9 from [43] illustrates the distribution of critical bits across different frames in the benchmark circuit misex3. However, this study only altered the starting position of the scrubbing, and the read-back verification during the scrubbing process still followed a linear increase in frame addresses.
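The selection of a starting frame can be illustrated with a deliberately simplified model. It assumes that, once an error is flagged, the scrubber starts at frame `start`, proceeds through increasing addresses with wrap-around, and spends a constant time `t_frame` per frame, while the faulty frame is drawn with probability proportional to its number of critical bits. The function names and the sample distribution are hypothetical and are not taken from [43].

```python
def mttr_for_start(critical_bits, start, t_frame=1.0):
    """Expected time until the faulty frame is reached when scrubbing
    starts at `start` and walks the addresses linearly with wrap-around."""
    n = len(critical_bits)
    total = sum(critical_bits)
    expected = 0.0
    for frame, count in enumerate(critical_bits):
        frames_visited = ((frame - start) % n) + 1
        expected += (count / total) * frames_visited * t_frame
    return expected


def best_start(critical_bits, t_frame=1.0):
    """Starting frame address that minimises the modeled MTTR."""
    return min(range(len(critical_bits)),
               key=lambda s: mttr_for_start(critical_bits, s, t_frame))


if __name__ == "__main__":
    critical = [0, 0, 10, 30, 5, 0]      # hypothetical critical bits per frame
    s = best_start(critical)
    print(s, mttr_for_start(critical, s), mttr_for_start(critical, 0))
```

With the critical bits concentrated in frames 2 to 4, starting just before that cluster gives a lower modeled MTTR than starting at frame 0, which is the effect that shifted scrubbing exploits.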

Building on the work of [43], Mousavi, M., et al. from Eindhoven University of Technology proposed the “Scatter Scrubbing” method [44]. This approach involves three steps. First, critical bits are identified through fault injection, and the criticality of each frame is calculated. Second, this criticality data is used to develop a heuristic algorithm for the optimal grouping of configuration frames. Finally, nonlinear discrete optimization techniques are applied to mathematically derive the MTTR formula and determine the optimal scrubbing sequence. Experimental results indicate that scatter scrubbing can reduce MTTR by an average of 40% and 25% compared to traditional read-back scrubbing and shifted scrubbing, respectively. The advantage of this method lies in its substantial exploration of enhanced scrubbing efficiency. Its drawback is the additional delay introduced by each frame address jump. In the case of Xilinx 7 series FPGAs, this delay is approximately 1.5 times the read-write time of a frame, because the FPGA needs extra time to read and process the new data in the frame address register. Also, the extent of MTTR reduction can vary significantly depending on the specific circuit, which somewhat limits the performance of the scrubbing algorithm.
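The trade-off between visiting critical frames early and paying a penalty for each address jump can be made concrete with a small cost model. The sketch below is not the optimization procedure of [44]: it simply evaluates the expected repair time of a given frame order, charging 1.5 frame times for every non-consecutive address as quoted above, and compares a linear order with a naive "most critical first" order. All names and the sample data are hypothetical.

```python
JUMP_PENALTY = 1.5  # extra frame times per address jump (figure quoted above)


def expected_repair_time(order, critical_bits, t_frame=1.0,
                         jump_penalty=JUMP_PENALTY):
    """Expected time to reach the faulty frame for a given visiting order.

    The faulty frame is assumed to be drawn with probability proportional
    to its critical bit count; the first frame incurs no jump penalty.
    """
    total = sum(critical_bits)
    elapsed = 0.0
    expected = 0.0
    previous = None
    for addr in order:
        if previous is not None and addr != previous + 1:
            elapsed += jump_penalty * t_frame   # frame address register reload
        elapsed += t_frame                       # read back and check the frame
        expected += (critical_bits[addr] / total) * elapsed
        previous = addr
    return expected


def criticality_first_order(critical_bits):
    """Naive heuristic: visit frames in decreasing critical bit count."""
    return sorted(range(len(critical_bits)), key=lambda a: -critical_bits[a])


if __name__ == "__main__":
    critical = [1, 0, 0, 20, 0, 0, 0, 15]   # hypothetical distribution
    linear = list(range(len(critical)))
    scattered = criticality_first_order(critical)
    print(expected_repair_time(linear, critical))
    print(expected_repair_time(scattered, critical))
```

When critical frames are far apart, the jumping order wins despite the penalty. When they are adjacent, the penalty can outweigh the gain, which is consistent with the circuit-dependent MTTR reduction noted above.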

Rongsheng Zhang et al. have noted that the accumulation of non-critical bit errors in the configuration memory of complex FPGA circuits may lead to unpredictable issues, prompting them to propose a combination of essential frame scrubbing and full device scrubbing. By scheduling N essential frame scrubs before a full device scrub, they further enhanced the reliability of the FPGA [70].

Building on the research in Refs. [43,44,70], Kyle W. Gear further subdivided critical bits into primary critical bits (PCB) and secondary critical bits (SCB) [33] (as shown in Figure 10). They quantified the “frame criticality” of each configuration frame based on the number of primary and secondary critical bits it contains, to weight its importance during the scrubbing process. The scrubbing sequence is determined based on the calculated frame criticality. The method for calculating frame criticality is illustrated in Equations (1) to (3). Equation (1) represents the count of all SCB pairs in a particular frame. Equation (2) calculates the criticality of the frame, i.e., the proportion of this frame relative to all frames. Equation (3) standardizes the criticality calculated in Equation (2).

FrameSCBs (f) = \sum_{j = 1}^{n} {SCB}_{Pairs} (j)

(1)

Criticality (f) = \frac{FrameSCBs (f)}{\sum_{x = 1}^{m} FrameSCBs (x)}

(2)

{Crit}_{Normalised} (f) = \frac{FrameSCBs (f)}{{FrameSCBs}_{Min}}

(3)
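As a worked illustration of Equations (1) to (3), the following sketch computes the three quantities for a handful of frames and derives a scrubbing order from them. The input layout (a list of SCB pair counts per frame), the choice of the smallest non-zero FrameSCBs value as the normalization base, and all function names are assumptions made for the example and are not taken from [33].

```python
def frame_scbs(scb_pairs):
    """Equation (1): total number of SCB pairs counted in one frame."""
    return sum(scb_pairs)


def frame_criticality(frames):
    """Evaluate Equations (2) and (3) and rank the frames.

    frames: dict mapping frame address -> list of SCB pair counts.
    Returns (criticality, normalised, order), where `order` lists frame
    addresses from most to least critical.
    Raises ValueError if no frame contains any SCB pair.
    """
    scbs = {f: frame_scbs(pairs) for f, pairs in frames.items()}
    total = sum(scbs.values())
    # Assumption: the minimum is taken over frames with a non-zero count,
    # so that the normalised value is well defined.
    smallest = min(v for v in scbs.values() if v > 0)
    criticality = {f: v / total for f, v in scbs.items()}      # Equation (2)
    normalised = {f: v / smallest for f, v in scbs.items()}    # Equation (3)
    order = sorted(frames, key=lambda f: -scbs[f])
    return criticality, normalised, order


if __name__ == "__main__":
    frames = {0: [2, 1], 1: [1], 2: [4, 2, 2]}   # hypothetical counts
    crit, norm, order = frame_criticality(frames)
    print(crit, norm, order)
```

For these sample counts, the FrameSCBs values are 3, 1, and 8, so frame 2 holds two thirds of the total criticality and is scrubbed first, followed by frames 0 and 1.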

Similarly, He, G., from Shanghai Jiao Tong University in China categorized configuration frames into unused frames, essential frames, priority essential frames, and critical frames (as shown in Figure 11) [71]. In this study, configuration frames with an MTTM (mean time to manifest) shorter than the MTTD are designated as priority essential frames, among which those with the smallest MTTM values are marked as critical frames. Frames that configure user-defined circuits and have an MTTM greater than the MTTD are categorized as essential frames. Frames not involved in circuit configuration are designated as unused frames. By implementing jumping scrubbing to achieve varying scrub detection frequencies, the scrubbing efficiency of FPGAs can be enhanced. The specific order of scrubbing can be determined through methods such as brute-force enumeration, dynamic programming algorithms, or sorting optimization algorithms.
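The four frame categories can be expressed as a simple classification rule. In the sketch below, an MTTM of `None` marks a frame that configures no user circuit, and the number of priority essential frames promoted to critical frames is a free parameter (`critical_count`), since the text above does not state a threshold. These conventions and the function name are assumptions made for illustration, not details of [71].

```python
def classify_frames(mttm_by_frame, mttd, critical_count=1):
    """Label frames as unused, essential, priority essential, or critical.

    mttm_by_frame: dict frame address -> MTTM, or None for an unused frame.
    mttd: mean time to detect, in the same time unit as the MTTM values.
    critical_count: how many of the smallest-MTTM priority frames are
        marked critical (hypothetical parameter).
    """
    labels = {}
    priority = []
    for addr, mttm in mttm_by_frame.items():
        if mttm is None:
            labels[addr] = "unused"
        elif mttm < mttd:
            labels[addr] = "priority essential"
            priority.append(addr)
        else:
            # MTTM equal to MTTD is treated as essential (assumption).
            labels[addr] = "essential"
    priority.sort(key=lambda a: mttm_by_frame[a])
    for addr in priority[:critical_count]:
        labels[addr] = "critical"
    return labels


if __name__ == "__main__":
    mttm = {0: None, 1: 5.0, 2: 0.5, 3: 2.0, 4: 12.0}   # hypothetical values
    print(classify_frames(mttm, mttd=4.0))
```

A jumping scrubber could then revisit the critical and priority essential frames more often than the essential ones and skip the unused frames entirely.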

Similar to [71], Mousavi, M., et al. also described the structure of CRAM configuration data in FPGAs from the perspective of SEU sensitivity [72] and further distinguished highly critical bits within the critical bits. They proposed more advanced shifted scrubbing and scatter scrubbing with consideration of precision levels. Experimental results showed that, compared to basic read-back scrubbing methods, the MTTR was reduced by 24% to 46.5%. Additionally, for circuits with non-uniform structures such as FFT circuits (where the addresses of the configuration frames used in the CRAM are relatively concentrated), these advanced scrubbing methods achieved an even smaller MTTR, confirming their effectiveness.

The nonlinear configuration scrubbing methods reviewed above are summarized in Table 2.

3.2.2. Multitasking Scrub Scheduling

Most research on scrubbing techniques has been conducted under the assumption of a single hardware task (i.e., the user-defined circuit in FPGA), with a fixed scrubbing cycle for FPGAs. However, in practice, multiple tasks can be run on a single FPGA, dividing the application layer of the FPGA into several regions, each dedicated to implementing the circuit for a corresponding task. Scrubbing measures for these hardware tasks involve only detecting and repairing the configuration frames specific to each partition (as shown in Figure 12). In systems operating in real-time, multiple tasks may occur periodically or randomly, and they can vary in criticality. Frequently scrubbing a task that is not active results in a waste of resources. Also, applying the same scrubbing frequency to tasks of high and low criticality decreases system reliability. In this scenario, it is necessary to design an optimal scheduling method for multiple scrubbing tasks to achieve the highest system reliability.

Santos, R., from the National University of Singapore pioneered the study of multi-task scrubbing scheduling in FPGAs, linking the scrubbing process with the significance and timing of hardware tasks within the FPGA [73]. He developed a heuristic scrubbing schedule design algorithm based on task criticality and execution time. This method first calculates the minimal scrubbing period using integer linear programming based on the criticality of each task. It then employs the earliest deadline first (EDF) algorithm to schedule the scrubbing tasks, arranging for the end of the scrubbing tasks to be as close as possible to the commencement of the corresponding hardware tasks to minimize the likelihood of SEU impacts on the system. Experiments conducted in a multi-hardware task scenario compared this scheduling algorithm with traditional blind scrubbing and selective scrubbing approaches. The results indicated that compared to the blind scrubbing method, system reliability increased by 70–79%, and relative to selective scrubbing, reliability improved by 25–34%.
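The EDF step of such a schedule can be sketched with a small discrete-time simulation. The sketch assumes that the scrubbing periods have already been chosen (the integer linear programming step is not modeled), that one configuration port serves all scrubbing tasks, and that scrubs can be preempted at tick boundaries. The function name and the task set are hypothetical and do not reproduce the algorithm of [73].

```python
def edf_scrub_schedule(tasks, horizon):
    """Simulate preemptive EDF scheduling of periodic scrubbing tasks.

    tasks: dict name -> (period, duration), both in integer ticks. Each
        scrub job is released at multiples of its period and must finish
        before the next release of the same task.
    Returns (timeline, missed): the task run in each tick (None when the
    port is idle) and a list of (task, deadline) pairs that were missed.
    """
    ready = []      # each job is [absolute_deadline, task_name, remaining]
    timeline = []
    missed = []
    for t in range(horizon):
        for name, (period, duration) in tasks.items():
            if t % period == 0:
                ready.append([t + period, name, duration])
        # Drop jobs whose deadline has passed and record them as missed.
        for job in ready:
            if job[0] <= t:
                missed.append((job[1], job[0]))
        ready = [job for job in ready if job[0] > t]
        if ready:
            job = min(ready)            # earliest absolute deadline first
            job[2] -= 1
            timeline.append(job[1])
            if job[2] == 0:
                ready.remove(job)
        else:
            timeline.append(None)
    return timeline, missed


if __name__ == "__main__":
    # Hypothetical task set: A is scrubbed every 4 ticks, B every 6 ticks.
    timeline, missed = edf_scrub_schedule({"A": (4, 1), "B": (6, 2)}, 12)
    print(timeline, missed)
```

A more faithful model would also shift each scrub so that it ends just before the corresponding hardware task starts, as described above; the sketch only shows how deadlines drive the choice of the next scrub.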

Although [73] effectively linked scrubbing tasks with hardware tasks to enhance system reliability in multi-task scenarios, the scrubbing scheduling algorithm employed was static, meaning it was calculated offline and could not adapt to the addition of new hardware tasks. Moreover, it required substantial space to store the pre-computed scrubbing schedule. To address these issues, Santos, R., further developed a dynamic adaptive scrubbing scheduling algorithm for multi-task scenarios [74], based on a time-window approach. This method divides time into a series of consecutive, fixed-length segments (i.e., “windows”), during which the system plans and executes scrubbing operations based on the dynamic changes of system status and hardware tasks. Experimental results demonstrated that this method could adapt to scenarios where new hardware tasks are added to the task set. System reliability was comparable to that of [73]. However, as it only required storage for the upcoming few windows rather than for the entire duration of FPGA operation, the required storage space (averaging 1.3 KBytes) was significantly reduced compared to [73] (averaging 21.3 KBytes).

Building on [73,74], Santos, R., et al. expanded the scope to include DSP applications and high-data-throughput tasks such as video processing and data acquisition [75]. Additional challenges in [73,74] include the limited dynamic reconfiguration capabilities of FPGAs, which reduce system reliability when hardware tasks frequently execute in parallel. In addition, for tasks of low criticality, the extended interval between two scrubbing operations makes them more susceptible to SEUs. To address these issues, Li, R., et al. from ShanghaiTech University employed dynamic voltage and frequency scaling (DVFS) to dynamically schedule hardware tasks implemented in FPGAs, aiming to reduce conflicts with scrubbing tasks [76]. Compared to [73], this scheduling algorithm improved system reliability by 15.51%.

In subsequent research, Li, R., pointed out that existing multi-task scrubbing scheduling algorithms either struggle to optimally manage conflicts among multiple scrubbing tasks in scenarios with multiple burst tasks or allocate low scrubbing frequencies to low-criticality user tasks, thereby reducing system reliability. To address these challenges, Li, R., proposed a negotiation-driven scrubbing scheduling algorithm [77], which temporarily tolerates conflicts between scrubbing tasks and resolves them iteratively, thereby enhancing the flexibility of the scheduling process and the reliability of the system. A logical probability model was developed to prevent potential scheduling starvation issues. Furthermore, utilizing the hardware characteristics of FPGAs, a DVFS-based multi-ICAP allocation algorithm was designed to maximize system reliability. Experimental results on a Xilinx Virtex-6 FPGA demonstrated that, compared to previous scrubbing scheduling methods, this approach could increase system reliability by up to 31.46%.

Table 3 provides a summary of the research for multi-task scrubbing scheduling techniques.

4. Memory Scrubbing Techniques

When soft errors occur in FPGA application layer storage units such as BRAMs and LUTs due to SEEs, the most direct method for error correction is configuration scrubbing. However, this method leads to interruptions in system functionality and is rarely used for repairing soft errors in the application layer storage. An alternative approach involves performing configuration scrubbing on the FPGA while employing redundancy-based soft error mitigation methods for user data within the application layer [78,79]. However, this can lead to the accumulation of errors and result in unpredictable outcomes. Consequently, there is a need for a method that specifically refreshes the user data stored in the FPGA application layer, termed “memory scrubbing.”

Rollins, N., et al. noted that memory scrubbing and configuration scrubbing are two fundamentally different soft error correcting techniques. Due to the dynamic nature of user data in the application layers of FPGAs, it is impractical to use golden data to detect errors in BRAMs or LUTs [80]. Further, Rollins, N., has proposed specific memory scrubbing techniques for different storage units within the FPGA application layer, including BRAM, LUTRAM, and Shift Register LUT (SRL). For BRAM, which has a larger storage capacity and is susceptible to errors at multiple locations due to SEUs, it is necessary to perform regular scrubbing. The method described in [80] utilizes triple modular redundancy or the BRAM ECC supported by FPGA vendors to detect errors and scrubs the data by writing through one of the BRAM ports. In contrast, smaller units like LUTRAM and SRL are scrubbed only during use. It is worth noting that accessing the BRAM configuration conflicts with the operation of fabric logic that accesses the corresponding memory blocks. Therefore, care should be taken to resolve these conflicts so that both the design and the configuration scrubber operate correctly.

The same author also investigated the reliability of the LEON3 (Atmel, San Jose, CA, USA) soft-core processor implemented in SRAM-based FPGAs, proposing a hybrid soft error mitigation strategy that combines parity, redundancy, checkpointing, and memory scrubbing [81]. Similarly, Wirthlin, M. J., et al. suggest a method of using triple modular redundancy combined with scrubbing to mitigate soft errors in the FPGA-based LEON3 processor [82]. This approach involves alternating read and write operations at each address of the BRAM to detect and repair potential soft errors, as illustrated in Figure 13. Both fault injection tests and irradiation experiments confirm that the proposed memory scrubbing technique enhances system reliability.
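The alternating read and write sweep over a triplicated BRAM can be modeled as a scrubber that handles one address per step, so that it can be interleaved with user accesses. This is only a behavioral sketch: the three BRAM copies are plain Python lists of integer words, and the class `TmrBramScrubber` is a hypothetical name, not the design from [82].

```python
class TmrBramScrubber:
    """Background scrubber for three redundant copies of a BRAM."""

    def __init__(self, copies):
        self.copies = copies      # three lists of integer words
        self.addr = 0             # next address to scrub
        self.repairs = 0          # number of words rewritten so far

    def step(self):
        """Read one address from all copies, vote, and write back."""
        a, b, c = (memory[self.addr] for memory in self.copies)
        voted = (a & b) | (a & c) | (b & c)     # bitwise majority
        for memory in self.copies:
            if memory[self.addr] != voted:
                memory[self.addr] = voted       # write phase: repair outlier
                self.repairs += 1
        self.addr = (self.addr + 1) % len(self.copies[0])
        return voted


if __name__ == "__main__":
    copies = [[0xAB] * 4 for _ in range(3)]
    copies[1][2] ^= 0x10                        # inject an upset
    scrubber = TmrBramScrubber(copies)
    for _ in range(4):                          # one full sweep
        scrubber.step()
    print(scrubber.repairs, copies[1][2] == 0xAB)
```

Since user data change at run time, the correct value is recovered from the redundant copies rather than from a stored golden image, in line with the distinction between memory scrubbing and configuration scrubbing drawn above.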

For BRAM memory scrubbing, studies [80,81,82] all rely on utilizing one of the BRAM’s ports to write correct data back to the corresponding addresses. This method is unfeasible when the BRAM is configured in single-port mode or when both ports are utilized in dual-port mode. Gomez-Cornejo, J., et al. have introduced an innovative approach for modifying BRAM data, which allows the contents of the BRAM to be extracted, loaded, or compared directly from the bitstream without utilizing the BRAM ports [83]. This technique, grounded in a thorough understanding of the configuration bitstream’s organizational structure, analyzes the addresses corresponding to BRAM data within the bitstream. By directly modifying the data at these specific locations within the configuration bitstream, BRAM data can be altered using dynamic reconfiguration techniques.

Based on [83], Gomez-Cornejo, J., targeted the Xilinx Zynq SoC, employing the SoC’s internal hard-core processor to analyze the bitstream and control the BRAM memory scrubbing process [84]. For distributed storage units within the FPGA application layer, the research employs the GSR signal within the STARTUPE2 primitive to reload initial values and repair potential soft errors. Experimental results indicate that this method reduces the BRAM scrubbing time to approximately 22.3% of that required by traditional BRAM scrubbing methods. Similarly, the scrubbing of distributed storage units is also exceptionally fast, approximately 0.4 μs, achieving the rapid repair of soft errors in the FPGA application layer.

5. Challenges and Future Directions

To date, extensive research has been conducted on the analysis and correcting techniques for soft errors in SRAM-based FPGAs. However, due to the complexity of FPGA devices, it is challenging for a sensitivity analysis to cover all resources comprehensively, and the accuracy of such analyses can be uncertain. Furthermore, as FPGA technology continues to evolve and application layers expand to billions of gates, the corresponding configuration bitstream size increases, thereby elevating the difficulty of detecting and correcting soft errors.

In summary, the journey toward fully understanding and mitigating soft errors in FPGAs remains long and complex. Based on the current research foundation, future trends in this field can be summarized as follows:

(1): Develop more detailed and accurate models for soft error analysis in FPGA application layers. Current technologies struggle to accurately emulate the operational processes and states of circuits during functioning. Consequently, it is difficult to precisely determine which resources have a more significant impact on circuit performance when soft errors occur. Moreover, conducting more refined emulations requires substantial computational resources and time. Therefore, developing more accurate models for soft error analysis and enhancing the speed of such analyses represent one of the future directions for research in this field.
(2): For memory scrubbing techniques, expand coverage to a broader range of storage units. Compared to configuration scrubbing, research on memory scrubbing is relatively scarce and has mostly focused on major storage units in the application layer, such as BRAM and LUTs. However, there has been limited research on the correction of soft errors in crucial but less data-intensive storage resources, such as control registers, which are critical for operation. Developing methods to detect and promptly correct soft errors in these key registers may become a focal point in future research on FPGA soft error mitigation techniques.
(3): Incorporate artificial intelligence into the mitigation, detection, and repair of soft errors in FPGAs. The task of optimizing the layout of user circuits within FPGAs to minimize sensitivity to SEE is a heuristic problem with a vast variable space, making it challenging to find optimal solutions. Artificial intelligence is well-suited for solving such high-dimensional data space issues [85]. AI can be utilized to analyze designs within FPGAs, identify vulnerabilities in the circuits, and suggest appropriate improvements.

6. Conclusions

As the design technology of SRAM-based FPGAs continues to mature, the increasing scale and complexity of FPGAs have raised the demands on soft error correction techniques in aerospace applications. This paper focuses on the soft error correction techniques for FPGAs, analyzing and summarizing existing research from two perspectives: FPGA soft error sensitivity analysis and correction techniques. The review shows that the results of FPGA soft error sensitivity analysis contribute to the design of more efficient correction methods. Finally, based on the current technical challenges, this paper forecasts the development trends of more efficient and intelligent soft error correction techniques, aiming to provide valuable insights for future research in this field.

Author Contributions

Conceptualization, W.W., X.L., and L.C.; methodology, F.Z.; investigation, W.W. and H.S.; resources, X.L.; writing—original draft preparation, W.W.; writing—review and editing, W.W.; supervision, W.W.; project administration, W.W. and L.C.; funding acquisition, X.L. and F.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No data were used for the research described in the article.

Acknowledgments

The authors are grateful to Jing Wu for his valuable advice and help in the paper preparation and writing.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Zhang, R.S. Research on Fault Injection and Scrubbing Technique for SRAM-Based FPGA. Ph.D. Dissertation, Harbin Institute of Technology, Harbin, China, 2019.
Nicolaidis, M. (Ed.) Soft Errors in Modern Electronic Systems; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2010.
Petersen, E. Single Event Effects in Aerospace; John Wiley & Sons: Hoboken, NJ, USA, 2011.
Lee, D.S.; King, M.; Evans, W.; Cannon, M.; Pérez-Celis, A.; Anderson, J.; Wirthlin, M.; Rice, W. Single-event characterization of 16 nm FinFET Xilinx UltraScale+ devices with heavy ion and neutron irradiation. In Proceedings of the 2018 IEEE Radiation Effects Data Workshop (REDW), San Antonio, TX, USA, 16 July 2018; pp. 1–8.
Siegle, F.; Vladimirova, T.; Ilstad, J.; Emam, O. Mitigation of radiation effects in SRAM-based FPGAs for space applications. ACM Comput. Surv. (CSUR) 2015, 47, 1–34.
Rockett, L.; Patel, D.; Danziger, S.; Cronquist, B.; Wang, J. Radiation hardened FPGA technology for space applications. In Proceedings of the 2007 IEEE Aerospace Conference, Big Sky, MT, USA, 3 March 2007; pp. 1–7.
McCollum, J. ASIC versus antifuse FPGA reliability. In Proceedings of the 2009 IEEE Aerospace Conference, Big Sky, MT, USA, 7–14 March 2009; pp. 1–11.
Trefzer, M.A.; Tyrrell, A.M. Improved fault-tolerance through dynamic modular redundancy (DMR) on the RISA FPGA platform. In Proceedings of the 2014 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), Leicester, UK, 14–17 July 2014; pp. 39–46.
Huang, Z.J. Fault-tolerant design of SRAM FPGA register based on dual-mode redundancy structure. In Proceedings of the International Conference on Artificial Intelligence and Communication Technologies (AICT 2020), Tashkent, Uzbekistan, 7–9 October 2020; pp. 252–259.
Wirthlin, M.J. Improving the reliability of FPGA circuits using triple-modular redundancy (TMR) & efficient voter placement. In Proceedings of the 2004 ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, Monterey, CA, USA, 22–24 February 2004; p. 252.
Keller, A.M. Partial Circuit Replication for Masking and Detecting Soft Errors in SRAM-Based FPGAs. Ph.D. Thesis, Brigham Young University, Provo, UT, USA, 2021.
Zhang, R.S.; Xiao, L.Y.; Cao, X.B.; Li, J.; Li, J.Q.; Li, L.Z. A fast scrubbing method based on triple modular redundancy for SRAM-Based FPGAs. In Proceedings of the 2018 14th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT), Qingdao, China, 31 October–3 November 2018; pp. 1–3.
Adell, P.; Allen, G.; Swift, G.; McClure, S. Assessing and mitigating radiation effects in Xilinx SRAM FPGAs. In Proceedings of the 2008 European Conference on Radiation and Its Effects on Components and Systems, Jyvaskyla, Finland, 10–12 September 2008; pp. 418–424.
Nidhin, T.S.; Bhattacharyya, A.; Behera, R.P.; Jayanthi, T. A review on SEU mitigation techniques for FPGA configuration memory. IETE Tech. Rev. 2018, 35, 157–168.
Liu, Z.; Lu, Z.; Huang, L.; Yao, Z.; Lu, Z.; Zhang, J. Recent advances on reliability of FPGAs in a radiation environment. Microelectron. J. 2024, 148, 106176.
Berg, M. Field Programmable Gate Array (FPGA) Single Event Effect (SEE) Radiation Testing; NASA: Washington, DC, USA, 2012.
Fabero, J.C.; Mecha, H.; Franco, F.J.; Clemente, J.A.; Korkian, G.; Rey, S.; Cheymol, B.; Baylac, M.; Hubert, G.; Velazco, R. Single event upsets under 14 MeV neutrons in a 28 nm SRAM-based FPGA in static mode. IEEE Trans. Nucl. Sci. 2020, 67, 1461–1469.
Tsiligiannis, G.; Danzeca, S.; Alía, R.G.; Infantino, A.; Lesea, A.; Brugger, M.; Masi, A.; Gilardoni, S.; Saigné, F. Radiation effects on deep submicrometer SRAM-based FPGAs under the CERN mixed-field radiation environment. IEEE Trans. Nucl. Sci. 2018, 65, 1511–1518.
Tazi, F.Z.; Thibeault, C.; Savaria, Y.; Pichette, S.; Audet, Y. On extra delays affecting I/O blocks of an SRAM-based FPGA due to ionizing radiation. IEEE Trans. Nucl. Sci. 2014, 61, 3138–3145.
Darvishi, M.; Audet, Y.; Blaquière, Y.; Thibeault, C.; Pichette, S. On the susceptibility of SRAM-based FPGA routing network to delay changes induced by ionizing radiation. IEEE Trans. Nucl. Sci. 2019, 66, 643–654.
Guibbaud, N.; Miller, F.; Colladant, T. Single-Event Transient (SET) sensitivity into the Clock Networks of FPGAs. In Proceedings of the 2021 21th European Conference on Radiation and Its Effects on Components and Systems (RADECS), Vienna, Austria, 13–17 September 2021; pp. 1–7.
Mansour, W.; Velazco, R. An automated SEU fault-injection method and tool for HDL-based designs. IEEE Trans. Nucl. Sci. 2013, 60, 2728–2733.
Ebrahimi, M.; Mohammadi, A.; Ejlali, A.; Miremadi, S.G. A fast, flexible, and easy-to-develop FPGA-based fault injection technique. Microelectron. Reliab. 2014, 54, 1000–1008.
Rhod, E.; Ghavami, B.; Fang, Z.; Shannon, L. A Cycle-Accurate Soft Error Vulnerability Analysis Framework for FPGA-Based Designs. arXiv 2023, arXiv:2303.12269.
Asadi, G.; Tahoori, M.B. An analytical approach for soft error rate estimation of SRAM-based FPGAs. In Proceedings of the Military and Aerospace Applications of Programmable Logic Devices (MAPLD), Washington, DC, USA, 8–10 September 2004; pp. 2991–2994.
Hogan, J.A.; Weber, R.J.; LaMeres, B.J. Reliability analysis of field-programmable gate-array-based space computer architectures. J. Aerosp. Inf. Syst. 2017, 14, 247–258.
Hoque, K.A.; Mohamed, O.A.; Savaria, Y. Dependability modeling and optimization of triple modular redundancy partitioning for SRAM-based FPGAs. Reliab. Eng. Syst. Saf. 2019, 182, 107–119.
Rajkumar, T.; Öberg, J. A Markovian Approach for Detecting Failures in the Xilinx SEM core. In Proceedings of the 2022 International Conference on Field-Programmable Technology (ICFPT), Hong Kong, China, 5–9 December 2022; pp. 1–4.
Mousavi, M.; Pourshaghaghi, H.R.; Tahghighi, M.; Jordans, R.; Corporaal, H. A generic methodology to compute design sensitivity to SEU in SRAM-based FPGA. In Proceedings of the 2018 21st Euromicro Conference on Digital System Design (DSD), Prague, Czech Republic, 29–31 August 2018; pp. 221–228.
Sterpone, L.; Azimi, S.; Bozzoli, L.; Du, B.; Lange, T.; Glorieux, M.; Alexandrescu, D.; Polo, C.B.; Codinachs, D.M. A Novel Error Rate Estimation Approach for UltraScale+ SRAM-based FPGAs. In Proceedings of the 2018 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), Edinburgh, UK, 6–9 August 2018; pp. 120–126. [Google Scholar]
Luckenbill, S.; Lee, J.Y.; Hu, Y.; Majumdar, R.; He, L. RALF: Reliability analysis for logic faults—An exact algorithm and its applications. In Proceedings of the 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), Dresden, Germany, 8–12 March 2010; pp. 783–788. [Google Scholar]
Cong, J.; Minkovich, K. LUT-based FPGA technology mapping for reliability. In Proceedings of the 47th Design Automation Conference, Anaheim, CA, USA, 13–18 June 2010; pp. 517–522. [Google Scholar]
Gear, K.W.; Sánchez-Macián, A.; Maestro, J.A. An analysis of FPGA configuration memory SEU accumulation and a preventative scrubbing technique. Microprocess. Microsyst. 2022, 90, 104467. [Google Scholar] [CrossRef]
Le Roux, R.; Van Schoor, G.; Van Vuuren, P. Parsing and analysis of a Xilinx FPGA bitstream for generating new hardware by direct bit manipulation in real-time. S. Afr. Comput. J. 2019, 31, 80–102. [Google Scholar] [CrossRef]
Le, R. Soft Error Mitigation Using Prioritized Essential Bits. Available online: https://www.eeweb.com/wp-content/uploads/articles-app-notes-files-soft-error-mitigation-using-prioritized-essential-bits-1339781673.pdf (accessed on 25 June 2024).
Ding, Z.; Wu, Q.; Zhang, Y.; Zhu, L. Deriving an NCD file from an FPGA bitstream: Methodology, architecture and evaluation. Microprocess. Microsyst. 2013, 37, 299–312. [Google Scholar] [CrossRef]
Pham, K.D.; Horta, E.; Koch, D. BITMAN: A tool and API for FPGA bitstream manipulations. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Lausanne, Switzerland, 27–31 March 2017; pp. 894–897.
Manev, K.; Powell, J.; Matas, K.; Koch, D. Byteman: A Bitstream Manipulation Framework. In Proceedings of the 2022 International Conference on Field-Programmable Technology (ICFPT), Hong Kong, China, 5–9 December 2022; pp. 1–9.
Bozzoli, L.; Sterpone, L. COMET: A configuration memory tool to analyze, visualize and manipulate FPGAs bitstream. In Proceedings of the ARCS Workshop 2018, 31st International Conference on Architecture of Computing Systems, Braunschweig, Germany, 9–12 April 2018; pp. 1–4.
Zhang, T.; Tehranipoor, M.; Farahmandi, F. BitFREE: On significant speedup and security applications of FPGA bitstream format reverse engineering. In Proceedings of the 2023 IEEE European Test Symposium (ETS), Venezia, Italy, 22–26 May 2023; pp. 1–6.
Yu, T. A Software Controlled FPGA Fault Injection System Based on Bitstream. Microelectronics 2017, 47, 553–556.
Feng, H.; Li, W.; Chen, L.; Wang, S.; Zhou, J.; Tian, C.; Zhang, Y. Precise Fault Injection and Fault Location System for SRAM-based FPGAs. In Proceedings of the 2022 IEEE 10th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 17–19 June 2022; pp. 2371–2376.
Nazar, G.L.; Santos, L.P.; Carro, L. Accelerated FPGA repair through shifted scrubbing. In Proceedings of the 2013 23rd International Conference on Field programmable Logic and Applications, Porto, Portugal, 2–4 September 2013; pp. 1–6.
Mousavi, M.; Pourshaghaghi, H.R.; Corporaal, H.; Kumar, A. Scatter scrubbing: A method to reduce SEU repair time in FPGA configuration memory. In Proceedings of the 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), Noordwijk, The Netherlands, 2–4 October 2019; pp. 1–6.
Carmichael, C.; Caffrey, M.; Salazar, A. Correcting Single-Event Upsets through Virtex Partial Configuration. Available online: https://docs.amd.com/v/u/en-US/xapp216 (accessed on 25 June 2024).
Kumar, M.; Digdarsini, D.; Misra, N.; Ram, T.V. SEU mitigation of Rad-Tolerant Xilinx FPGA using external scrubbing for geostationary mission. In Proceedings of the 2016 IEEE Annual India Conference (INDICON), Bangalore, India, 16–18 December 2016; pp. 1–6.
Heiner, J.; Collins, N.; Wirthlin, M. Fault tolerant ICAP controller for high-reliable internal scrubbing. In Proceedings of the 2008 IEEE Aerospace Conference, Big Sky, MT, USA, 1–8 March 2008; pp. 1–10.
Ali, E.; Pora, W. A Deterministic Branch Prediction Technique for a Real-Time Embedded Processor Based on PicoBlaze Architecture. Electronics 2022, 11, 3438.
Chapman, K. SEU Strategies for Virtex-5 Devices. Available online: https://www.eeweb.com/wp-content/uploads/articles-app-notes-files-seu-strategies-for-virtex-5-devices-1340818744.pdf (accessed on 25 June 2024).
Legat, U.; Biasizzo, A.; Novak, F. SEU recovery mechanism for SRAM-based FPGAs. IEEE Trans. Nucl. Sci. 2012, 59, 2562–2571.
LogiCORE IP Soft Error Mitigation Controller v3.1 User Guide. Available online: https://docs.amd.com/v/u/en-US/ug764_sem (accessed on 25 June 2024).
Nunes, J.L.; Cunha, J.C.; Barbosa, R.; Zenha-Rela, M. Evaluating Xilinx SEU controller macro for fault injection. In Proceedings of the 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2013), Budapest, Hungary, 24–27 June 2013; pp. 1–2.
Li, X.; Lou, H.; Jin, Z. A fault-tolerant method of SRAM FPGA based on processor scrubbing. In Proceedings of the 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 12–14 March 2021; pp. 1024–1028.
Lu, Y.; Zhai, X.; Saha, S.; Ehsan, S.; McDonald-Maier, K.D. A self-adaptive SEU mitigation scheme for embedded systems in extreme radiation environments. IEEE Syst. J. 2022, 16, 1436–1447.
Microprocessors in ESA Projects. Available online: https://www.esa.int/Enabling_Support/Space_Engineering_Technology/Onboard_Computers_and_Data_Handling/Microprocessors (accessed on 25 June 2024).
Ahmed, A. New FPGA blind scrubbing technique. In Proceedings of the 2016 IEEE Aerospace Conference, Big Sky, MT, USA, 5–12 March 2016; pp. 1–9.
Samanta, J.; Bhaumik, J.; Barman, S. Compact and power efficient SEC-DED codec for computer memory. Microsyst. Technol. 2021, 27, 359–368.
Bhargavi, C.; Nishanth, D.V.R.; Nikhita, P.; Vinodhini, M. H-matrix based error correction codes for memory applications. In Proceedings of the 2021 International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), Bhilai, India, 19–20 February 2021; pp. 1–5.
Rahul, K.; Yachareni, S. Area and power efficient ECC for multiple adjacent bit errors in SRAMs. In Proceedings of the 2020 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 4–6 January 2020; pp. 1–4.
Sen, P.; Sadi, M.S.; Ashab, N.; Rossi, D. A New Error Correcting Coding Technique to Tolerate Soft Errors. In Proceedings of the 2021 International Conference on Electronics, Communications and Information Technology (ICECIT), Khulna, Bangladesh, 14–16 September 2021; pp. 1–4.
Xie, Y.; Qiao, T.; Xie, Y.; Chen, H. Soft error mitigation and recovery of SRAM-based FPGAs using brain-inspired hybrid-grained scrubbing mechanism. Front. Comput. Neurosci. 2023, 17, 1268374.
Steiner, G.C. Method and Apparatus for Error Mitigation of Programmable Logic Device Configuration Memory. U.S. Patent 7,236,000, 26 June 2007.
Tonfat, J.; Kastensmidt, F.; Reis, R. Energy efficient frame-level redundancy scrubbing technique for SRAM-based FPGAs. In Proceedings of the 2015 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), Montreal, QC, Canada, 15–18 June 2015; pp. 1–8.
Giordano, R.; Perrella, S.; Izzo, V.; Milluzzo, G.; Aloisio, A. Redundant-configuration scrubbing of SRAM-based FPGAs. IEEE Trans. Nucl. Sci. 2017, 64, 2497–2504.
Tonfat, J.; Kastensmidt, F.L.; Rech, P.; Reis, R.; Quinn, H.M. Analyzing the effectiveness of a frame-level redundancy scrubbing technique for SRAM-based FPGAs. IEEE Trans. Nucl. Sci. 2015, 62, 3080–3087.
Giordano, R. Method for Generating Redundant Configuration in FPGAs. U.S. Patent 11,106,849, 31 August 2021.
Giordano, R.; Lai, Y.; Korpar, S.; Pestotnik, R.; Lozar, A.; Šantelj, L.; Shoji, M.; Nishida, S. Frame-level intermodular configuration scrubbing of on-detector FPGAs for the ARICH at Belle II. IEEE Trans. Nucl. Sci. 2021, 68, 2810–2817.
Herrera-Alzu, I.; López-Vallejo, M. Self-reference scrubber for TMR systems based on Xilinx Virtex FPGAs. In Integrated Circuit and System Design. Power and Timing Modeling, Optimization, and Simulation, Proceedings of the 21st International Workshop, PATMOS 2011, Madrid, Spain, 26–29 September 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 133–142.
Alfke, P.H. System for Preventing Radiation Failures in Programmable Logic Devices. U.S. Patent 6,104,211, 15 August 2000.
Zhang, R.; Xiao, L.; Li, J.; Cao, X.; Li, L. An adjustable and fast error repair scrubbing method based on Xilinx essential bits technology for SRAM-Based FPGA. IEEE Trans. Reliab. 2019, 69, 430–439.
He, G.; Zheng, S.; Jing, N. A hierarchical scrubbing technique for SEU mitigation on SRAM-based FPGAs. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2020, 28, 2134–2145.
Mousavi, M.; Pourshaghaghi, H.R.; Kumar, A.; Corporaal, H. MTTR reduction of FPGA scrubbing: Exploring SEU sensitivity. Microprocess. Microsyst. 2023, 101, 104841.
Santos, R.; Venkataraman, S.; Das, A.; Kumar, A. Criticality-aware scrubbing mechanism for SRAM-based FPGAs. In Proceedings of the 2014 24th International Conference on Field Programmable Logic and Applications (FPL), Munich, Germany, 2–4 September 2014; pp. 1–8.
Santos, R.; Venkataraman, S.; Kumar, A. Dynamically adaptive scrubbing mechanism for improved reliability in reconfigurable embedded systems. In Proceedings of the 52nd Annual Design Automation Conference, San Francisco, CA, USA, 7–11 June 2015; pp. 1–6.
Santos, R.; Venkataraman, S.; Kumar, A. Scrubbing mechanism for heterogeneous applications in reconfigurable devices. ACM Trans. Des. Autom. Electron. Syst. (TODAES) 2017, 22, 1–26.
Li, R.; Yu, H.; Jiang, W.; Ha, Y. DVFS-based scrubbing scheduling for reliability maximization on parallel tasks in SRAM-based FPGAs. In Proceedings of the 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 20–24 July 2020; pp. 1–6.
Li, R.; Yu, H.; Li, L.; Ha, Y. Criticality-aware negotiation-driven scrubbing scheduling for reliability maximization in SRAM-based FPGAs. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2023, 42, 3881–3894.
Shashidhara, B.; Jadhav, S.; Kim, Y.S. Reconfigurable fault tolerant processor on a SRAM based FPGA. In Proceedings of the 2020 IEEE International Conference on Electro Information Technology (EIT), Chicago, IL, USA, 31 July–1 August 2020; pp. 151–154.
Benites, L.A.C.; Benevenuti, F.; de Oliveira, Á.B.; Kastensmidt, F.L.; Added, N.; Aguiar, V.A.; Medina, N.H.; Guazzelli, M.A. Reliability calculation with respect to functional failures induced by radiation in TMR ARM Cortex-M0 soft-core embedded into SRAM-based FPGA. IEEE Trans. Nucl. Sci. 2019, 66, 1433–1440.
Rollins, N.; Fuller, M.; Wirthlin, M.J. A comparison of fault-tolerant memories in SRAM-based FPGAs. In Proceedings of the 2010 IEEE Aerospace Conference, Big Sky, MT, USA, 6–13 March 2010; pp. 1–12.
Rollins, N.H. Hardware and Software Fault-Tolerance of Softcore Processors Implemented in SRAM-Based FPGAs. Ph.D. Thesis, Brigham Young University, Provo, UT, USA, 2012.
Wirthlin, M.J.; Keller, A.M.; McCloskey, C.; Ridd, P.; Lee, D.; Draper, J. SEU mitigation and validation of the LEON3 soft processor using triple modular redundancy for space processing. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, New York, NY, USA, 21–23 February 2016; pp. 205–214.
Gomez-Cornejo, J.; Zuloaga, A.; Villalta, I.; Del Ser, J.; Kretzschmar, U.; Lázaro, J. A novel BRAM content accessing and processing method based on FPGA configuration bitstream. Microprocess. Microsyst. 2017, 49, 64–76.
Gomez-Cornejo, J.; Villalta, I.; Aranzabal, I.; Lopez, I.; Zuloaga, A. Data content scrubbing approach for SRAM based FPGA designs. In Proceedings of the 2022 IEEE 31st International Symposium on Industrial Electronics (ISIE), Anchorage, AK, USA, 1–3 June 2022; pp. 649–654.
Lei, C.; Yuan, W. Review on technology of placement and routing for the FPGA. Acta Electron. Sin. 2022, 50, 1243–1254.

Figure 1. Dual-layer structure of SRAM-based FPGA.

Figure 2. Structure of the paper.

Figure 3. Experimental setup in [17].

Figure 4. SEU sensitivity calculation model.

Figure 5. Relationships among configuration data, non-essential bits, essential bits, and critical bits.

Figure 6. LUT—configuration bits mapping.

Figure 7. Bitstream decoding process.

Figure 8. External scrubber and internal scrubber.

Figure 9. Number of critical bits in each frame of the misex3 test circuit [43].

Figure 10. Distribution of non-critical bits, critical bits, primary critical bits, and secondary critical bits across different configuration frames.

Figure 11. Classification of configuration frames in [71].

Figure 12. FPGA application-layer division [73].

Figure 13. Memory scrubbing of BRAM [82].

Table 1. Summary of SRAM-based FPGA sensitivity analysis techniques.

Analysis Target	Technique	Variant	Advantages	Disadvantages
Analysis of application layer resources	Fault emulation-based	Using radiation tests	Emulates the space radiation environment most faithfully.	High cost, time-consuming, potentially destructive to devices, and error locations cannot be controlled.
Analysis of application layer resources	Fault emulation-based	Using hardware emulation	High repeatability and minimal destructive impact on circuits.	Requires complex, specialized hardware to perform error emulation tasks.
Analysis of application layer resources	Analytical model-based		Does not damage the circuit under analysis.	Long analysis time and low accuracy.
Analysis of configuration bitstream	Based on dual-layer mapping		Builds on an existing research foundation.	Limited research on newer FPGA models.
Analysis of configuration bitstream	Using fault injection		Simple in principle and easy to operate.	Comprehensive error injection on a large-scale FPGA takes too much time.

Table 2. Summary of nonlinear configuration scrubbing methods.

Research	Description
[43]	Shifted scrubbing. Derive the optimal scrubbing starting frame address based on the number of critical bits in each frame.
[44]	Scatter scrubbing. Derive the optimal starting frame address and adopt a jumping read-back verification method.
[70]	Combine essential-frame scrubbing and full-device scrubbing.
[33]	Further subdivide critical bits into primary critical bits (PCB) and secondary critical bits (SCB).
[71]	Categorize configuration frames into unused frames, basic frames, priority basic frames, and critical frames.
[72]	Distinguish highly critical bits within the critical bits.
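
The starting-frame idea behind the shifted scrubbing entry [43] in Table 2 can be illustrated with a small sketch. It is not the algorithm from [43]: it assumes a simplified cost model in which the scrubber visits frames cyclically, one frame per time step, and an upset is equally likely to hit any critical bit. The function names and the sample per-frame counts are hypothetical.

```python
def expected_repair_frames(critical_bits, start):
    """Mean number of frame visits before a critical-bit upset is repaired.

    critical_bits[f] is the number of critical bits in frame f (assumed input).
    The scrubber is assumed to begin at frame `start` and wrap around cyclically.
    """
    n = len(critical_bits)
    total = sum(critical_bits)
    if total == 0:
        return 0.0  # no critical bits, so nothing to repair
    weighted = sum(
        count * (((frame - start) % n) + 1)  # visits needed to reach this frame
        for frame, count in enumerate(critical_bits)
    )
    return weighted / total


def best_start_frame(critical_bits):
    """Pick the start frame that minimizes the expected repair time."""
    if not critical_bits:
        raise ValueError("critical_bits must contain at least one frame")
    return min(
        range(len(critical_bits)),
        key=lambda start: expected_repair_frames(critical_bits, start),
    )


if __name__ == "__main__":
    counts = [0, 0, 5, 1]  # hypothetical critical-bit counts per frame
    print(best_start_frame(counts))
```

Under this toy model, starting at the frame that holds most of the critical bits shortens the average repair time compared with always starting at frame 0.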

Table 3. Summary of multi-task scrubbing scheduling techniques.

Research	Scheduling Methods	Advantages	Disadvantages
[73]	Static scheduling method based on task criticality and execution time.	Implemented scrubbing scheduling in multi-circuit task scenarios.	Adopts a static scheduling algorithm, unable to adapt to dynamic hardware task lists.
[74]	Dynamic scrubbing scheduling method based on time windows.	Requires less storage space and can adapt to dynamic hardware task lists.	Reliability decreases in scenarios with frequent parallel circuit tasks. Discriminates against low-criticality tasks.
[75]	Heterogeneous application scrubbing scheduling method based on time windows.	Considers DSP-related and high-data-throughput application tasks.	Allocates very low scrubbing frequencies for tasks of low criticality.
[76]	Dynamic scheduling based on DVFS.	Reduces conflicts among scrubbing tasks, further enhancing reliability.	Allocates very low scrubbing frequencies for tasks of low criticality.
[77]	Negotiation-driven scrubbing scheduling.	Better manages conflicts between scrubbing tasks.	Requires substantial storage space when there are a large number of tasks.
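
The recurring disadvantage in Table 3, very low scrubbing frequencies for low-criticality tasks, can be made concrete with a toy allocation. The sketch below is not taken from [73] to [77]: it assumes a fixed total scrubbing rate that is split among tasks in proportion to a criticality weight, with an optional guaranteed floor. The function name, the `min_share` parameter, and the task names are all hypothetical.

```python
def allocate_scrub_rates(criticality, total_rate, min_share=0.0):
    """Split a scrubbing budget among tasks by criticality weight.

    criticality: dict mapping task name to a positive weight (assumed input).
    total_rate:  total scrub operations per second available (assumed unit).
    min_share:   fraction of the budget divided equally among all tasks
                 before the proportional split (hypothetical fairness knob).
    """
    if not criticality:
        raise ValueError("at least one task is required")
    if not 0.0 <= min_share <= 1.0:
        raise ValueError("min_share must be between 0 and 1")
    weight_sum = sum(criticality.values())
    if weight_sum <= 0:
        raise ValueError("criticality weights must sum to a positive value")

    floor = min_share * total_rate / len(criticality)  # equal guaranteed part
    proportional_pool = total_rate * (1.0 - min_share)  # criticality-driven part
    return {
        task: floor + proportional_pool * weight / weight_sum
        for task, weight in criticality.items()
    }


if __name__ == "__main__":
    tasks = {"navigation": 9, "telemetry": 1}  # hypothetical tasks and weights
    print(allocate_scrub_rates(tasks, total_rate=100.0))
    print(allocate_scrub_rates(tasks, total_rate=100.0, min_share=0.2))
```

With a purely proportional split the low-criticality task receives only a small fraction of the budget, which mirrors the drawback listed in the table. A floor trades some protection of the critical task for a bounded scrub interval on the others.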

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

