*2.2. Prediction Algorithm*

The SVM algorithm has developed rapidly through the efforts of many scholars since it was proposed in 1963. It is based on slack variables [16] and the VC dimension [17], and has strong sparsity and generalization capabilities. The ultimate goal of linear SVM is to find the optimal hyperplane *P<sub>k</sub>* : *ω<sup>T</sup>X* + *b* = 0 that divides the sample space correctly. However, practical classification problems are often non-linear, which requires a non-linear transformation that maps the samples into a higher-dimensional space, so that a hyperplane that correctly separates the classes exists in the high-dimensional feature space. The kernel function reduces the complexity of the high-dimensional inner product operations. In practical applications, the SVM algorithm often uses the Radial Basis Function (RBF) as its kernel.
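As a concrete illustration (a minimal sketch, not the paper's implementation), the RBF kernel computes the implicit high-dimensional inner product directly from the original feature vectors:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Radial Basis Function kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

# The kernel replaces an explicit mapping to high-dimensional space:
# identical points give K = 1, distant points give K -> 0.
k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # -> 1.0
k_far  = rbf_kernel([1.0, 2.0], [5.0, 9.0])   # close to 0
```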

The SVM decision model only supports binary classification. The One-against-One (OAO) method [18] groups the training data by output category and builds a separate two-class SVM model between each pair of categories, yielding *K*(*K* − 1)/2 decision boundaries in total, where *K* is the number of categories. When new data are to be classified, the OAO method feeds them into all *K*(*K* − 1)/2 models to obtain the corresponding pairwise classification results. Finally, following a voting strategy, the category that receives the most votes among all the results is taken as the final classification result.
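The OAO voting scheme described above can be sketched as follows; the pairwise decision functions here are hypothetical stand-ins for trained two-class SVM models:

```python
from collections import Counter
from itertools import combinations

def oao_predict(models, classes, x):
    """One-against-One voting: each of the K*(K-1)/2 pairwise models votes
    for one of its two classes; the class with the most votes wins.

    `models` maps a class pair (i, j) to a binary decider returning i or j."""
    votes = Counter(models[(i, j)](x) for i, j in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Toy example with K = 3 classes (hand-written threshold deciders):
classes = [0, 1, 2]
models = {
    (0, 1): lambda x: 0 if x < 5 else 1,
    (0, 2): lambda x: 0 if x < 8 else 2,
    (1, 2): lambda x: 1 if x < 8 else 2,
}
print(oao_predict(models, classes, 3))   # votes 0, 0, 1 -> class 0
```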

The Decision Tree (DT) classifier can perform fast and effective classification on large amounts of input data. However, its weak generalization causes serious overfitting when the numbers of samples per category are unbalanced. At the same time, the instability of the DT classifier makes it very sensitive to high-frequency jitter in the flash memory training data, producing very different DT models.

The K-Nearest Neighbor (KNN) classifier is simple to implement; in its simplest form, it does not even need to be trained. However, unless the training set is pruned, the method must store all sample points and compute the distance from the point to be classified to every stored sample point. The required storage space and computing resources are very large, which makes the method unsuitable for circuit implementation.
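A brute-force sketch makes this cost concrete: every training sample is kept in memory, and one distance is computed per stored sample for each query (all data here are illustrative):

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    """Brute-force KNN: no training step, but every stored sample must be
    kept and a distance to each one computed per query -- the storage and
    compute cost noted in the text."""
    d = np.linalg.norm(train_X - x, axis=1)   # distance to ALL samples
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]          # majority vote among k nearest

train_X = np.array([[0.0], [1.0], [10.0], [11.0]])
train_y = np.array([0, 0, 1, 1])
print(knn_predict(train_X, train_y, np.array([0.5])))   # -> 0
```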

#### **3. Endurance Prediction Method**

#### *3.1. The Process of Endurance Prediction Method*

Figure 1 shows the process of the SVM-based endurance prediction method. The SVM-based NAND Flash endurance prediction method includes two phases: a training phase and a testing phase. The purpose of the training phase is to obtain sample data sets and use machine learning algorithms to establish a decision model suitable for Flash endurance prediction. The testing phase uses the established decision model to predict endurance.

**Figure 1.** The process of SVM-based endurance prediction method.

#### 3.1.1. Training Phase

The training phase includes data set extraction and model training. Data set extraction performs repeated P-E cycles on memory blocks of different flash memory particles of the same model to obtain flash memory endurance-related data according to a fixed rule. Model training uses the acquired endurance-related data set to train the machine learning model and obtain the decision function.

(a) Sample Selection

Select a certain number of suitably located flash memory blocks of the same model as samples. To avoid over-fitting, the number of blocks selected from each flash memory particle must be consistent.

(b) Parameter Setting

Set the flash memory specification information, including the interface protocol type, storage unit type, block size, page size, and the total number of blocks in a single logical unit. At the same time, set the test information, such as the programming pattern to be used, the test mode, and the bad-block characteristic error rate. The random programming pattern can simulate various programming levels and combinations, fitting the programming pressure of practical applications as closely as possible, so it is set as the default programming pattern in the proposed prediction method.
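The parameter set described above might be captured as follows; all field names and values are assumptions for illustration, not a vendor-defined format:

```python
# Illustrative parameter set for the training phase. Every field name and
# value here is an assumption made for the sketch.
test_config = {
    # flash specification information
    "interface": "ONFI",              # interface protocol type
    "cell_type": "TLC",               # storage unit type
    "block_size_pages": 256,          # pages per block
    "page_size_bytes": 16384,         # bytes per page
    "blocks_per_lun": 1024,           # total blocks in a single logical unit
    # test information
    "programming_pattern": "random",  # default pattern in the method
    "test_mode": "accelerated_wear",
    "bad_block_error_rate": 1e-3,     # bad-block characteristic error rate
}
```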

(c) P-E Cycle

*T<sup>α</sup>* P-E cycles are performed on the flash memory particles under test to accelerate the wear of the memory cells. The number of P-E cycles *Nc* is increased by one each time a cycle completes. When performing P-E cycles to accelerate memory cell wear, the idle interval between the programming and erasing operations in each cycle must be kept fixed, to eliminate differences in actual endurance caused by varying idle intervals.

(d) P-R-E Cycle and Data Sampling

*T<sup>ε</sup>* Program-Read-Erase (P-R-E) cycles are performed on the flash memory particles under test to obtain the data required for modeling. The current cycle number is updated as *Nc* = *Nc* + *T<sup>ε</sup>*, where the *Nc* on the right-hand side is the cycle number before the update. Performing multiple P-R-E cycle operations within a short period helps reduce the negative effects of transient errors. Each P-R-E cycle compares the read data with the originally written data to obtain the RBE count of each flash page. At the same time, the current cycle number *Nc* and the duration of the flash operations are recorded during each P-R-E cycle as the raw model training data set.

Repeat steps (c) and (d) until the RBE count of any page in the block is detected to exceed the ECC error correction capability. To ensure that there are enough samples in each endurance stage, continue to perform *T<sup>e</sup>* P-E cycles and then stop sampling.
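Steps (c) and (d) together can be sketched as the following loop; `do_pe_cycle` and `do_pre_cycle` are assumed hardware interfaces (not part of the original method's description), and the stopping rule follows the text above:

```python
def run_training_phase(T_alpha, T_eps, T_e, ecc_limit, do_pe_cycle, do_pre_cycle):
    """Alternate T_alpha plain P-E cycles (wear acceleration, step c) with
    T_eps sampled P-R-E cycles (data sampling, step d) until a page's RBE
    count exceeds the ECC limit, then run T_e extra P-E cycles and stop.
    `do_pre_cycle` returns the per-page RBE counts of one P-R-E cycle."""
    Nc, samples, limit_exceeded = 0, [], False
    while not limit_exceeded:
        for _ in range(T_alpha):            # step (c): wear acceleration
            do_pe_cycle()
            Nc += 1
        for _ in range(T_eps):              # step (d): data sampling
            rbe_per_page = do_pre_cycle()
            Nc += 1
            samples.append((Nc, rbe_per_page))
            if max(rbe_per_page) > ecc_limit:
                limit_exceeded = True
    for _ in range(T_e):                    # extra cycles, sampling stopped
        do_pe_cycle()
        Nc += 1
    return samples

# Mock run: RBE grows with accumulated wear (purely illustrative model).
state = {"wear": 0}
def do_pe(): state["wear"] += 1
def do_pre():
    state["wear"] += 1
    return [state["wear"]]                  # one page; RBE = wear level
samples = run_training_phase(T_alpha=2, T_eps=1, T_e=3, ecc_limit=10,
                             do_pe_cycle=do_pe, do_pre_cycle=do_pre)
```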

#### 3.1.2. Testing Phase

In the actual application of the model, the main task is to extract the parameters of the trained model and implement the decision function in a dedicated circuit. When new data arrive in the actual use scenario, the prediction model circuit is invoked to obtain the prediction result.
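As an illustration of what such a prediction circuit must compute, a binary RBF-SVM decision function can be evaluated directly from the extracted model parameters (support vectors, dual coefficients, and bias); all numbers here are hypothetical:

```python
import numpy as np

def svm_decide(x, support_vectors, dual_coefs, b, gamma):
    """Evaluate a trained binary RBF-SVM decision function from its
    extracted parameters, the computation a dedicated circuit implements:
        f(x) = sum_i alpha_i * K(sv_i, x) + b,  class = sign(f(x))."""
    k = np.exp(-gamma * np.sum((support_vectors - x) ** 2, axis=1))
    return np.sign(np.dot(dual_coefs, k) + b)

# Hypothetical extracted parameters (two support vectors):
svs   = np.array([[0.0, 0.0], [2.0, 2.0]])
alpha = np.array([-1.0, 1.0])
print(svm_decide(np.array([1.9, 1.9]), svs, alpha, b=0.0, gamma=1.0))  # -> 1.0
```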
