2. The Integrity of the Flash Memory Block

Pages in the same flash memory block also behave as a unit in important respects, the "cliff" phenomenon being one example. As shown in Figure 2b, even though the wear degree and RBE numbers differ from page to page, the RBE numbers of all pages in the block jump at the same time at the end of life and surge to more than 60,000. The selected flash has a page capacity of 16 KB and the test pattern is pseudo-random, so a page RBE number close to half of the page capacity (in bits) means that the block has lost its storage function. The structural traps generated by cell wear form fine "cracks" [21], which are very common in many types of flash memory particles. When the "cracks" accumulate to a certain extent, the insulating layer breaks down and a penetration path forms, causing a large area of memory cells to fail. Because pages with large differences in physical characteristics reach the end of life together, it is difficult to predict the endurance of a flash memory page as an independent object.
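The "half of the page capacity" criterion can be checked with a short calculation. The sketch below assumes that 16 KB means 16 × 1024 bytes of user data (spare area ignored) and that a page whose contents are lost reads back uncorrelated with the pseudo-random pattern, so each bit is wrong with probability 0.5. Both are simplifying assumptions rather than figures from the device datasheet.

```python
# Rough check of the "half of the page capacity" criterion.
# Assumption: 16 KB means 16 * 1024 bytes of user data; spare bytes are ignored.
PAGE_BYTES = 16 * 1024
PAGE_BITS = PAGE_BYTES * 8  # bits per page

# Assumption: with a pseudo-random pattern, a page whose contents are
# effectively lost disagrees with the written data on about half of its bits.
EXPECTED_RBE_IF_DATA_LOST = PAGE_BITS // 2


def fraction_of_page(rbe_count: int) -> float:
    """Return the share of page bits that were read back incorrectly."""
    return rbe_count / PAGE_BITS


if __name__ == "__main__":
    print("bits per page:", PAGE_BITS)
    print("expected RBE when data is lost:", EXPECTED_RBE_IF_DATA_LOST)
    print("share of bits wrong at 60,000 RBE:", round(fraction_of_page(60_000), 3))
```

Under these assumptions, an RBE number above 60,000 already corresponds to more than 45% of the bits in a page, which is close to what reading unrelated data would produce.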

3. Array Coupling and Bad Block Management

On the one hand, word line wear in a multi-bit memory cell is reflected in the RBE numbers of multiple pages, which means that pages sharing a word line are coupled and affect each other. On the other hand, differences and randomness in the written data change how strongly array interference affects each page. These differences can be very significant locally, which seriously affects prediction accuracy.

#### 3.2.2. Input Features

The selection and processing of input features determine whether the prediction algorithm can achieve good results in practical applications.

1. Number of P-E Cycles

From the perspective of application scenarios, the number of remaining P-E cycles is a direct indicator of the endurance of the flash memory. Therefore, the number of P-E cycles is the most direct feature for endurance level prediction.

The total endurance of an actual flash memory block is affected by the programming stress. Factors such as temperature, programming pattern, and the idle time between operations cause differences in programming stress, which result in differences in the total endurance of the flash memory block. Process variation can also cause a large difference in endurance between flash memory particles. Even within the same batch of flash memory particles made with the same process, it is impossible to guarantee that all particles have identical characteristics, so the total P-E cycle number is widely dispersed between flash memory particles. Thus, the P-E cycle number cannot be used as the only feature for endurance prediction.

2. Raw Bit Errors

RBE measures the degree of cell wear from the perspective of the bad block criterion. As the number of P-E cycles increases, the RBE numbers of each page of the flash memory particle increase to varying degrees. The dominant cause of the change in RBE numbers is that interface traps created by cell wear let charge escape or recombine, which shifts the threshold voltage.

3. Erasing Duration

Both programming and erasing operations involve charge tunneling. The interface traps created by tunneling change the electrical parameters of the memory cell and affect the tunneling efficiency, which indirectly changes the operating time. The flash programming strategy causes the programming duration to change as endurance decreases, but the programming duration is not an ideal input feature in actual application scenarios because it differs greatly between the page types of a multi-bit memory cell structure.

The erase operation applies a positive pulse on the substrate to initialize all data in the block to an erased state, and also uses threshold voltage verification to determine whether to apply an additional pulse. However, contrary to the programming operation, the interface traps hinder the tunneling of the charge from the storage layer to the substrate and cause the number of erase pulses to increase, which in turn increases the erase duration.

#### 3.2.3. Label

The endurance judgment depends on the ECC error correction algorithm. When the RBE number exceeds the upper limit that the algorithm can correct, the remaining endurance is taken to be zero. In addition, the garbage collection and out-of-place update mechanisms in the SSD controller [22] lead to write amplification. Therefore, the number of P-E cycles available at the SSD level is much lower than the number of P-E cycles available at the flash block level.

According to the endurance criterion described above, the number of P-E cycles is the metric, and the RBE number is the criterion. The endurance level prediction model provides a basis for endurance level evaluation in the SSD wear leveling algorithm. In addition, the model can be used to warn about and mark flash memory blocks that are about to become bad blocks. Both the number of remaining P-E cycles and the RBE numbers are suitable for endurance level prediction.
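The relationship between the metric (P-E cycles) and the criterion (RBE against the ECC limit) can be sketched as a labeling step. The snippet below is a minimal illustration, not the procedure used in this work: the function names, the sample history, and the ECC limit of 72 are all hypothetical, and each sample is assumed to hold the worst page RBE number of the block at a given P-E cycle count.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, int]  # (P-E cycle count, worst page RBE number in the block)


def end_of_life_cycle(samples: List[Sample], ecc_limit: int) -> Optional[int]:
    """Return the first recorded P-E cycle count whose RBE exceeds ecc_limit."""
    for cycle, max_rbe in samples:  # samples are assumed sorted by cycle count
        if max_rbe > ecc_limit:
            return cycle
    return None  # the block never failed within the recorded data


def remaining_cycle_labels(samples: List[Sample], ecc_limit: int) -> List[Tuple[int, int]]:
    """Pair every sample's cycle count with its remaining P-E cycles (0 after failure)."""
    eol = end_of_life_cycle(samples, ecc_limit)
    if eol is None:
        return []  # no end of life observed, so no label can be derived
    return [(cycle, max(eol - cycle, 0)) for cycle, _ in samples]


if __name__ == "__main__":
    # Made-up measurements and a made-up ECC limit, for illustration only.
    history = [(0, 3), (1000, 10), (2000, 40), (3000, 90), (4000, 70000)]
    print(remaining_cycle_labels(history, ecc_limit=72))
```

A block that never exceeds the limit in the recorded data yields no labels here, since its end of life is unknown.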

#### *3.3. Optimization Strategy*

#### 3.3.1. RBE Preprocessing

Certain inducing factors increase the RBE numbers of only some pages in the block. The arithmetic mean weakens the endurance differences caused by such partial increases and reflects the overall endurance level of the flash memory block. However, the erase operation acts on all pages in the flash memory block, and if the RBE number of any page exceeds the upper limit of ECC error correction, the whole block is marked as a bad block. Therefore, the maximum page RBE is an effective input feature for endurance level prediction.

The standard deviation of the RBE numbers between pages effectively reflects the difference in endurance between pages. This difference increases as endurance decreases, so the standard deviation of the RBE numbers also tracks changes in endurance. At the same time, when a non-local disturbance changes the RBE numbers of all pages within a block, the arithmetic mean is strongly affected. The standard deviation, which describes only the degree of difference between pages, largely shields the prediction from these block-wide disturbances.
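The three block-level statistics discussed above can be computed as in the following minimal sketch. The function name and the sample values are illustrative, and the use of the population standard deviation is an assumption, since the text does not say which form is used.

```python
import statistics
from typing import Dict, Sequence


def block_rbe_features(page_rbe: Sequence[int]) -> Dict[str, float]:
    """Summarize the page RBE numbers of one block as max, mean and standard deviation."""
    return {
        "max": float(max(page_rbe)),
        "mean": statistics.fmean(page_rbe),
        # Assumption: population standard deviation, treating the pages as the whole block.
        "std": statistics.pstdev(page_rbe),
    }


if __name__ == "__main__":
    baseline = [10, 12, 14, 20]              # made-up page RBE numbers
    shifted = [rbe + 5 for rbe in baseline]  # same block after a uniform disturbance
    print(block_rbe_features(baseline))
    print(block_rbe_features(shifted))
```

In this toy case a uniform offset of 5 errors per page moves both the maximum and the mean by 5 but leaves the standard deviation unchanged, which is the shielding property described above.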

Figure 3 shows how the maximum and the standard deviation of the page RBE numbers of a block change as the number of P-E cycles increases. Ignoring the jitter, both curves rise monotonically with the number of P-E cycles, which gives machine learning algorithms a clear pattern to learn.

**Figure 3.** (**a**) The relationship between the maximum RBE numbers of flash memory page and P-E cycle number; (**b**) The relationship between page standard deviation and P-E cycle number.

#### 3.3.2. Transient Error

The transient error caused by the RTN phenomenon is inherently uncertain: the fluctuating drain current shifts the threshold voltage randomly, which changes the page RBE numbers unpredictably. As a result, the measured page RBE numbers jitter violently as the number of P-E cycles *Nc* increases, producing significant noise. Since the page RBE numbers are small at the initial stage of wear, and the change in page RBE caused by an increase in *Nc* is not significant, this jitter noise has a large negative impact on the accuracy of endurance level prediction.

Repeating the P-R-E cycle for a predefined duration and averaging the page RBE numbers effectively reduces the noise caused by the RTN phenomenon. Continuous read operations cannot replace continuous P-R-E operations, however. Performing continuous reads after a single programming operation does ensure that every page RBE number is obtained under the same *Nc*. But read disturbance makes each read affect the results of subsequent reads, so the consecutive reads are not independent. After an erase operation, the memory cell is restored to approximately the same state, so each P-R-E cycle can be considered independent of the others. In addition, the RTN phenomenon mainly occurs during the programming operation and for a period of time after it, and it does not significantly disturb the threshold voltage during multiple continuous read operations. Therefore, continuous reads after programming do not mitigate the negative impact of the RTN phenomenon.

Because taking the maximum is a nonlinear transformation, the order in which page RBE preprocessing and the transient error optimization strategy are applied can affect prediction accuracy.
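The effect of ordering can be seen in a small sketch. Here each inner list holds the page RBE numbers of one block measured in one repeated P-R-E cycle; the function names and values are hypothetical and only illustrate that averaging and taking the maximum do not commute.

```python
import statistics
from typing import Sequence


def max_of_averaged_pages(repeats: Sequence[Sequence[int]]) -> float:
    """Average each page over the repeated P-R-E cycles, then take the block maximum."""
    per_page_mean = [statistics.fmean(page_values) for page_values in zip(*repeats)]
    return max(per_page_mean)


def average_of_block_maxima(repeats: Sequence[Sequence[int]]) -> float:
    """Take the block maximum in each repeated P-R-E cycle, then average the maxima."""
    return statistics.fmean([max(cycle_values) for cycle_values in repeats])


if __name__ == "__main__":
    # Two repeats of a two-page block; the noisy page differs between repeats.
    repeats = [[10, 2], [2, 10]]
    print(max_of_averaged_pages(repeats))
    print(average_of_block_maxima(repeats))
```

Averaging first suppresses a spike that lands on a different page in each repeat, while taking the maximum first keeps every spike, so the second ordering can never give a smaller value than the first.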

#### **4. Experiments and Analysis**

#### *4.1. Test Platform*

#### 4.1.1. Test Platform Architecture

The NAND flash test platform is built around the Xilinx ZYNQ-7000 series xc7z030ffg676-2 SoC (hereafter referred to as ZYNQ-7030). The platform consists of a host computer and multiple test boards, as shown in Figure 4. A graphical user interface (GUI) test program runs on the host computer and controls the test boards over USB. Each test board has eight test sockets, so eight BGA132/152-packaged flash memory particles can be tested in parallel.

The test machine is responsible for flash memory specification setting, test process setting, and data storage. Each test board has a ZYNQ-7030 chip, which exchanges data with the eight flash memory particles under test through GPIO. The ZYNQ-7030 is divided into the Processing System (PS), which runs the firmware, and the Programmable Logic (PL), which is based on a Kintex-7 FPGA. The core of the PS is a dual-core Cortex-A9, which is mainly responsible for flash memory particle initialization, test flow control, programming pattern generation, etc. The firmware automatically generates a test process loop according to the test process parameters sent by the host computer, and controls the PL-side Flash interface protocol controller module through AXI bus registers to complete the corresponding command operations.

The PL is mainly responsible for flash interface control, test data processing, and endurance class prediction. It contains the Flash interface protocol controller module, the input acquisition and preprocessing module, the machine learning prediction algorithm module, the SPI protocol control module, etc. There are eight independent Flash interface protocol controller modules, each of which controls the flash memory particles of its own channel. The controller module interacts with the PS firmware through AXI bus registers, and the information obtained during the test passes through the input acquisition and preprocessing module before being transmitted to the PS. The SPI protocol control module controls the external 16-bit ADC chip, which can measure the current drawn by the flash memory particles at any time so that the instantaneous power consumption can be calculated.

#### 4.1.2. Test Platform Cost

The test platform can test 64 × 8 flash memory particles at the same time. For a single flash memory particle with a block size of 1024, 50,000 P-E cycles take only 1213 min, and 5 repeated P-R-E cycles for 1000 blocks take only 261 min. The former corresponds to data acquisition during model building, and the latter corresponds to data acquisition in actual application.

In terms of the prediction circuit module, using the PL-side FPGA greatly shortens the time needed to predict the endurance level. In testing, a single prediction by the prediction module under a 100 MHz clock requires only about 37 µs, while a single prediction implemented in PS-side embedded software requires 108 µs.
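The two latencies can be put in perspective with a short calculation. This is only arithmetic on the figures quoted above; the constant and function names are illustrative, and the 37 µs value is approximate in the source.

```python
CLOCK_MHZ = 100      # PL clock frequency quoted in the text
PL_LATENCY_US = 37   # single prediction on the PL side (approximate)
PS_LATENCY_US = 108  # single prediction on the PS side


def clock_cycles(latency_us: int, clock_mhz: int) -> int:
    """Convert a latency in microseconds to clock cycles (1 MHz = 1 cycle per microsecond)."""
    return latency_us * clock_mhz


def speedup(slow_us: float, fast_us: float) -> float:
    """Return how many times faster the fast path is than the slow path."""
    return slow_us / fast_us


if __name__ == "__main__":
    print("PL cycles per prediction:", clock_cycles(PL_LATENCY_US, CLOCK_MHZ))
    print("PL speedup over PS:", round(speedup(PS_LATENCY_US, PL_LATENCY_US), 2))
```

At 100 MHz, 37 µs corresponds to roughly 3700 clock cycles per prediction, and the PL implementation is a little under three times faster than the PS implementation.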

In terms of resource consumption, the PL-side FPGA hardware resource occupancy is shown in Table 1. The resource usage and its proportion are calculated for a 32-bit-wide arithmetic unit. To achieve highly parallel testing, a single ZYNQ-7030 chip instantiates eight channels internally, and each channel is divided into four parallel modules to realize multi-CE embedded testing. Each parallel module supports three kinds of interface protocols. In the prediction circuit module, the CORDIC calculation module occupies a relatively large number of LUT and register resources, 3925 and 3904, respectively. When the bit width of the calculation unit is reduced to 16 bits, the resource consumption of the CORDIC calculation module drops significantly, to 1159 and 1148.


**Table 1.** PL Hardware Resource Cost.
