**1. Introduction**

With the development of smart devices and cloud computing, flash memory has become widely used in many fields [1]. Because its storage cells are connected in series, NAND flash memory achieves larger storage capacity and higher storage speed than NOR flash memory, and it has become an important medium for large-scale data storage. A variety of technologies have been developed to increase the storage density of NAND flash memory. Three-dimensional structure technology [2] replaces the planar structure with a three-dimensional one, which increases the storage capacity within the same area. Multi-bit memory cell technology [3] increases the number of bits stored in each cell, multiplying the storage capacity. As research on these two technologies has deepened, researchers have found that while the storage density of NAND flash memory has doubled, the data reliability problem has worsened.

**Citation:** Zhang, H.; Wang, J.; Chen, Z.; Pan, Y.; Lu, Z.; Liu, Z. An SVM-Based NAND Flash Endurance Prediction Method. *Micromachines* **2021**, *12*, 746. https://doi.org/10.3390/mi12070746

Academic Editors: Cristian Zambelli and Rino Micheloni

Received: 21 May 2021 Accepted: 21 June 2021 Published: 25 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Data reliability reflects the accuracy of stored data. If data errors occur during use, the consequences can be severe. In NAND flash memory, data reliability problems mainly concern retention [4] and endurance. Retention reflects how long data are retained without re-erasing, while endurance concerns the reliability degradation caused by structural damage to the memory cell during use. Compared with retention, endurance has a greater impact on actual products. As the number of programming operations increases, the endurance of the flash memory decreases and the number of Raw Bit Errors (RBE) gradually rises. The RBE number is the number of bits that differ between the data actually read and the data actually programmed, before error correction, and it is an important parameter for characterizing the change in endurance. When the RBE number exceeds a certain limit, the flash memory can no longer be used normally.
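The RBE definition above amounts to a bitwise comparison of two buffers. The following is a minimal sketch of that count; the function name `raw_bit_errors` and the sample byte values are illustrative assumptions, not part of the original work.

```python
def raw_bit_errors(programmed: bytes, read_back: bytes) -> int:
    """Count the bits that differ between programmed and read-back data."""
    if len(programmed) != len(read_back):
        raise ValueError("buffers must have the same length")
    # XOR leaves a 1 in every bit position where the two bytes disagree.
    return sum(bin(p ^ r).count("1") for p, r in zip(programmed, read_back))


# Illustrative data: the read-back copy has two flipped bits.
programmed = bytes([0b10101010, 0b11110000, 0b00001111])
read_back = bytes([0b10101000, 0b11110001, 0b00001111])
print(raw_bit_errors(programmed, read_back))  # 2
```

The count is taken on the raw read data, before any error correction is applied, which matches the definition given in the text.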

Suppressing the data errors caused by reduced endurance has become an important research direction in academia. Error Correcting Code (ECC) algorithms use special coding rules to check and correct the original data [5], so the endurance of the flash memory can be indirectly improved by optimizing the error correction algorithm. However, the optimization effect is limited by the available storage space. Wear-Leveling technology [6] indirectly delays user data failure by balancing the number of program and erase operations across blocks. However, because the regions of a flash memory do not have equal total endurance, the technology leaves room for further optimization. Read-Retry technology reduces data errors by adjusting the hard-decision reference voltage of the read operation, but it increases operating time and reduces performance.
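The balancing idea behind wear leveling can be sketched as an allocator that always selects the least-erased block. This is a toy illustration only: the class `WearLeveler` and its methods are hypothetical names, and real controllers use considerably more elaborate policies.

```python
import heapq


class WearLeveler:
    """Toy allocator that always hands out the least-erased block."""

    def __init__(self, num_blocks: int) -> None:
        # Heap entries are (erase_count, block_id); every block starts at 0.
        self._heap = [(0, block) for block in range(num_blocks)]
        heapq.heapify(self._heap)

    def erase_and_allocate(self) -> int:
        """Erase the least-worn block, record the erase, and return its id."""
        count, block = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (count + 1, block))
        return block

    def erase_counts(self) -> dict:
        """Return a mapping from block id to its erase count."""
        return {block: count for count, block in self._heap}


leveler = WearLeveler(num_blocks=4)
for _ in range(10):
    leveler.erase_and_allocate()
counts = leveler.erase_counts()
print(max(counts.values()) - min(counts.values()))  # 1
```

Because the sketch assumes every block tolerates the same number of erases, it also shows the limitation noted above: equalizing erase counts does not equalize remaining life when the blocks differ in endurance.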

However, the real dilemma of flash memory reliability research lies in the uncertainty of endurance, which leads to huge waste. The minimum actual endurance of flash memory is often dozens or even a hundred times the manufacturer's nominal value [7]. Manufacturers specify the nominal value so conservatively because endurance differs greatly between flash memory chips of the same type [8]. Even when the manufacturer inspects and screens the wafers before they leave the factory, endurance still differs by several times, or even ten times, between chips of the same model and batch. Moreover, chip-level inspection is destructive and extremely time-consuming. If endurance can be accurately predicted and an early warning issued, the user can adjust the critical value of the data transfer process and greatly extend the service life of the flash memory.

Using machine learning algorithms to accurately predict changes in endurance-related parameters has become an important means of resolving the flash memory reliability dilemma. Such prediction can greatly improve existing flash memory management strategies and enable accurate endurance warnings. Damien Hogan applied a supervised Genetic Programming (GP) algorithm to the endurance prediction of 2D Multi-level Cell (MLC) flash memory, to determine whether sample flash memory at different numbers of Program-Erase (P-E) cycles would generate uncorrectable data errors [9]. However, the resulting GP two-class prediction model achieved a prediction accuracy of only 83.5% on the test set when the decision boundary was 35,000. Barry Fitzgerald observed from a large amount of experimental data that the word line (WL) number, page type, and page parity in MLC flash memory affect the code word error rate (CWER) and the programming and erasing durations [10]. Using this observation, the study proposed a sampling method based on the error probability density function [11] and constructed eight different two-class machine learning models. However, the study neglected the class balance of the data set. Negative samples, those with more than 100 codeword errors, accounted for only 0.03% of all samples, which significantly reduces the reliability of the reported model accuracy. Ruixiang Ma considered that a predictive model may lose its validity as the flash memory usage environment changes, and therefore used the incremental changes in endurance parameters to update the model so that it adapts to parameter changes at different endurance stages [12]. However, this solution did not take into account the hardware complexity and application limitations of using early data from the same flash memory to predict its later endurance.
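The class-balance concern raised above can be made concrete with a small worked example. The sketch below uses a made-up data set of 10,000 samples in which 0.03% are negative, and a degenerate "model" that always predicts the positive class; the sample counts and helper names are assumptions chosen only to mirror the proportion quoted in the text.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of samples with the given true label that were predicted as such."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in relevant) / len(relevant)


# Illustrative data set: 10,000 samples, 0.03% of them negative (label 0).
y_true = [0] * 3 + [1] * 9997
# A degenerate "model" that ignores its input and always predicts positive.
y_pred = [1] * len(y_true)

print(f"accuracy        = {accuracy(y_true, y_pred):.4f}")   # 0.9997
print(f"negative recall = {recall(y_true, y_pred, 0):.4f}")  # 0.0000
```

An accuracy of 99.97% here says nothing about the model's ability to detect the rare class, which is why accuracy reported on such an imbalanced set is hard to interpret.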

On the one hand, the endurance prediction models established by existing studies perform two-class prediction of the RBE number, and the RBE number corresponding to the classification boundary is close to the upper limit of the ECC algorithm, which limits the application scenarios of the prediction model. On the other hand, existing research does not consider how electrical effects in NAND flash memory, such as transient errors, disturb the prediction results, which greatly reduces prediction accuracy. To solve the endurance prediction problem, this paper designs an SVM-based method for predicting the endurance class of NAND flash memory, built on a large amount of experimental test data and combined with micro-mechanism analysis. The main contents of the paper are as follows:


#### **2. Flash Error Mechanism and Prediction Algorithm**

#### *2.1. Flash Error Mechanism*

The NAND flash memory endurance problem originates in its unique memory cell and array structure. The basic memory cell of flash memory evolved from the Floating Gate (FG) Field Effect Transistor (FET), and the internal electrical disturbance phenomena of NAND flash memory are closely related to its array structure. In practice, many physical effects reduce the reliability of NAND flash memory. The main mechanisms are described below.

#### 2.1.1. Unit Wear-Out

Wear-out of the unit cell is the direct cause of reduced flash memory endurance. Wear-out and aging occur mainly during the charge tunneling and transfer of programming and erasing operations. Wear-out breaks the atomic bonds at the interface between the charge trap layer and the insulating layer, creating interface traps that interfere with charge transport and cause the threshold voltage to deviate from its ideal value. Interface traps are therefore the main reason that the endurance of the flash memory cell decreases and bit-flip errors occur. The wear-out and aging caused by each P-E cycle is small but irreversible, so the number of P-E cycles is correlated with endurance.

#### 2.1.2. Disturbance

Certain electrical effects caused by the special array structure of flash memory can shift the threshold voltage. The most common of these is disturbance [13]. Disturbance does not cause permanent structural damage: erasing restores the memory cell to its original state. The severity of disturbance is closely related to the programming pattern, and specific programming patterns significantly aggravate some disturbances. Multiple read operations on the same memory cell before an erase operation cause read disturbance, which shifts the threshold voltage of the affected memory cell in the positive direction. When the shifted threshold voltage exceeds the hard-decision reference voltage between two programming states, data errors occur. Program disturbance and pass disturbance occur during programming operations, as does edge word line disturbance. The edge word line unit generates a large number of electron-hole pairs because of its large gate-induced drain leakage [14]. The electrons are accelerated toward the channel and injected into the storage layer of the edge word line unit, leading to a surge in threshold voltage and data errors.
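The read-disturbance mechanism described above, in which a positive threshold-voltage shift eventually crosses a hard-decision reference voltage, can be illustrated with a small sketch. All numbers here are hypothetical: the reference voltages, the programmed threshold voltage, and the linear shift per read are assumptions chosen for illustration, not measured device values.

```python
from bisect import bisect_right

# Hypothetical hard-decision reference voltages (in volts) separating the four
# states of an MLC cell; real values are device-specific.
REFERENCE_VOLTAGES = [1.0, 2.0, 3.0]


def read_state(threshold_voltage: float) -> int:
    """Return the state index (0..3) whose voltage window contains the cell."""
    return bisect_right(REFERENCE_VOLTAGES, threshold_voltage)


programmed_vth = 1.8    # cell programmed into state 1 (between 1.0 V and 2.0 V)
shift_per_read = 0.001  # assumed positive shift per read, illustrative only

for reads in (0, 100, 300):
    vth = programmed_vth + reads * shift_per_read
    print(reads, read_state(vth))
```

With these assumed values the cell still reads as state 1 after 100 reads, but after 300 reads its threshold voltage has passed the 2.0 V reference and it is misread as state 2. Erasing and reprogramming the cell would reset the shift, consistent with disturbance being non-destructive.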

#### 2.1.3. Transient Error

A transient error is a bit flip caused by uncertain transient factors during the operation of the flash memory. Because the inducing factors are uncertain, transient errors are difficult to suppress by conventional means. The most typical transient error is the one caused by the Random Telegraph Noise (RTN) phenomenon. RTN causes uncertain fluctuations in the drain current [15], which in turn make the threshold voltage fluctuate.
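To show why such fluctuations produce errors that come and go between reads, the sketch below models RTN as a simple two-level process in which a single trap toggles between empty and occupied. This is a simplified model adopted here for illustration; the function name, the fluctuation amplitude, the switching probability, and the reference voltage are all assumptions.

```python
import random


def simulate_rtn(base_vth, amplitude, switch_probability, steps, seed=0):
    """Return threshold-voltage samples with a two-level telegraph fluctuation."""
    rng = random.Random(seed)
    trap_occupied = False
    samples = []
    for _ in range(steps):
        # At each step the trap toggles between empty and occupied with a
        # fixed probability, which produces the telegraph-like waveform.
        if rng.random() < switch_probability:
            trap_occupied = not trap_occupied
        samples.append(base_vth + (amplitude if trap_occupied else 0.0))
    return samples


samples = simulate_rtn(base_vth=1.95, amplitude=0.1, switch_probability=0.2, steps=20)
reference_voltage = 2.0  # hypothetical hard-decision reference voltage
misreads = sum(v > reference_voltage for v in samples)
print(f"{misreads} of {len(samples)} reads cross the reference voltage")
```

A cell whose threshold voltage sits close to a reference voltage is read correctly on some reads and incorrectly on others, depending only on the trap state at the moment of the read. This is the kind of non-repeatable error that makes transient effects a source of noise for endurance prediction.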
