4.2.1. Experiment Method

The flash memory chips selected for the experiment are from the same batch of Micron MT29F256G08EBHAFJ4 (NW911) devices. This model is a 3D TLC flash memory with a block size of 2304 pages and a page capacity of 18,588 bytes. In the experiment, the number of consecutive P-R-E cycles is *T<sub>ε</sub>* = 5 and *T<sub>α</sub>* + *T<sub>ε</sub>* = 50, so one set of sample data is obtained every 50 programming operations, and a total of 96 flash memory blocks are tested at each endurance stage. The programming pattern is a PS/PL mixed pseudorandom pattern.

The experiment uses a multi-classification model. The output of this experiment has four labels: the maximum page RBE level after 100, 200, and 500 further P-E cycles, and the level of the number of remaining P-E cycles, marked as labels 1–4 in that order. As shown in Table 2, each label contains four categories:




The experiment uses the SVM algorithm for model training by default, and the verification method is five-fold cross-validation. We conducted four independent model trainings; each training uses the training data set of one label, and the training data of the remaining three labels is discarded. Since the total number of P-E cycles of the sample flash memory blocks spans a wide range, and the maximum RBE value in the initial stage of life is generally higher than 300, it is very close to the boundary of 400 between category 1 and category 2. This leads to an imbalance in the number of samples across categories for each label, and the degree of imbalance differs between labels. Therefore, before the four independent model trainings, we balance the number of samples across categories for each label by randomly reducing the three larger categories to the size of the smallest category. After balancing, the total number of samples is 6576 for label 1, 8272 for label 2, 8952 for label 3, and 15,208 for label 4.
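The balancing step described above, random undersampling of the larger categories down to the size of the smallest, can be sketched in plain Python. The function name and interface are illustrative, not the authors' implementation:

```python
import random

def balance_by_undersampling(samples, labels, seed=0):
    """Randomly undersample every category down to the size of the
    smallest category, as done before each per-label training run."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(v) for v in by_class.values())
    balanced = []
    for y, xs in sorted(by_class.items()):
        # keep a random subset of size n_min from each category
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))
    return balanced
```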


#### 1. Confusion Matrix

The confusion matrix is a numerical matrix that visualizes the classification results of a supervised machine learning model, and the various indicators of a classifier are calculated from it. The confusion matrix of an *L*-class classifier model is an *L* × *L* square matrix that intuitively reflects the distribution of actual and output categories: each row corresponds to one actual category, and each column to one output category.
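A minimal construction of such a confusion matrix, following the row = actual category, column = output category convention above (illustrative code, not part of the original scheme):

```python
def confusion_matrix(actual, predicted, n_classes):
    """Rows are actual categories, columns are model outputs;
    categories are numbered 1..n_classes as in the text."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        m[a - 1][p - 1] += 1  # shift 1-based categories to 0-based indices
    return m
```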

#### 2. Numerical Indicators

Two-class models commonly use Accuracy (*A*), Precision (*P*), Recall (*R*), and F1-score (*F*1) to evaluate performance. In a multi-class model, however, the larger confusion matrix makes these indicator definitions ambiguous, so corresponding changes are needed. The expressions and meanings of the numerical indicators of the classification model are shown in Table 3. Among them, precision and recall in the multi-classification model are each divided into three variants: macro, micro, and weighted; in addition, the Kappa coefficient is introduced.


**Table 3.** The expression and meaning of the numerical indicators of the classification model.
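Assuming the standard definitions, the indicators used below (accuracy, macro precision, macro recall, macro-F1, and Cohen's Kappa) can all be computed directly from the confusion matrix. Note that macro-F1 is taken here as the harmonic mean of macro-P and macro-R, which is one of several conventions; the paper's exact formulas are in Table 3:

```python
def metrics_from_confusion(m):
    """Accuracy, macro-P, macro-R, macro-F1, and Cohen's Kappa
    from an L x L confusion matrix (rows = actual categories)."""
    n = len(m)
    total = sum(sum(row) for row in m)
    diag = sum(m[i][i] for i in range(n))
    accuracy = diag / total
    precisions, recalls = [], []
    for i in range(n):
        col = sum(m[r][i] for r in range(n))  # predicted as category i
        row = sum(m[i])                       # actually category i
        precisions.append(m[i][i] / col if col else 0.0)
        recalls.append(m[i][i] / row if row else 0.0)
    macro_p = sum(precisions) / n
    macro_r = sum(recalls) / n
    macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)
    # Kappa: agreement beyond chance; p_e from row/column marginals
    p_e = sum(sum(m[i]) * sum(m[r][i] for r in range(n))
              for i in range(n)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return accuracy, macro_p, macro_r, macro_f1, kappa
```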

#### 3. ROC Curve

The ROC curve is often used for model comparison and threshold selection in classification tasks. The area under the curve (AUC) directly reflects performance: the larger the AUC, the better the model. In a multi-class model, each category has its own ROC curve, so when comparing models it is necessary to compare curves for the same category.
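For one category treated as positive (one-vs-rest), the AUC can be computed without plotting the curve, via the rank-sum (Mann-Whitney) equivalence. This is a generic sketch, not the paper's evaluation code:

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC for one category vs. the rest, via rank sums.
    Tied scores receive the average rank, which matches the
    trapezoidal area under the ROC curve."""
    pairs = sorted(zip(scores, labels))
    rank_of = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_of[k] = avg
        i = j
    n_pos = sum(1 for _, y in pairs if y == positive)
    n_neg = len(pairs) - n_pos
    rank_sum = sum(r for r, (_, y) in zip(rank_of, pairs) if y == positive)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```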

In summary, when comparing prediction results, we will compare the accuracy *A*, macro precision *macro-P*, macro recall *macro-R*, *macro-F*1 score, Kappa coefficient *K*, and ROC curve.

#### *4.3. Analysis*

#### 4.3.1. Comparison of the Results of Different Labels

This experiment uses the Binary Relevance technique to transform the multi-label multi-classification model into *L* single-label multi-classification models. Each label is trained separately to obtain its prediction results, and the results of different labels are compared. The numerical indicator results are shown in Table 4.
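The Binary Relevance decomposition amounts to fitting one independent single-label classifier per output label. A schematic sketch with a pluggable `fit_one` routine; the interface is hypothetical, standing in for the SVM training step:

```python
def binary_relevance_fit(X, Y_multilabel, fit_one):
    """Binary Relevance: train one independent single-label model per
    label column. `fit_one(X, y)` returns a fitted predictor callable;
    here each column of Y_multilabel is one of the 4 labels above."""
    n_labels = len(Y_multilabel[0])
    models = []
    for j in range(n_labels):
        y_j = [row[j] for row in Y_multilabel]  # keep only label j
        models.append(fit_one(X, y_j))          # other labels discarded
    return models

def binary_relevance_predict(models, X):
    """Predict every label independently for each sample."""
    return [[m(x) for m in models] for x in X]
```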


**Table 4.** Statistical table of model numerical indicators of different labels.

The numerical indicators of label 1 are the best: the first four indicators all reach 95.8%, and the Kappa coefficient is about 94.4%; a *K* value greater than 90% means the model has extremely high consistency. Label 2 follows closely, with its first four indicators at about 95.2–95.3%, a small gap. The model of label 4 has the worst prediction performance: its first four indicators are about 89.7%, and its *K* value is only 86.4%.

Figure 5a compares the accuracy *A* and Kappa coefficient *K* of the models of different labels through a bar graph. The differences among labels 1 to 3 are small, label 1 is the best, and label 4 is clearly worse. The indicator gap between the four labels is related to the classification basis. Labels 1 to 3 divide categories based on the RBE numbers, which essentially predicts the change of certain parameters after a certain number of future cycles; they differ only in the value of *Ni*, which reflects how far ahead of the current state the prediction target lies. Label 1 has the smallest gap and label 3 the largest. A smaller gap means a smaller change from the current state, so the prediction accuracy is naturally higher. Label 4 divides categories based on the number of P-E cycles: it essentially judges the current endurance stage from the characterization of the current endurance parameters, and it must also estimate the total endurance range. The large differences in endurance and characterization between flash memory chips greatly weaken the model's judging ability.

**Figure 5.** (**a**) Comparison of model accuracy and Kappa coefficient of different labels; (**b**) Comparison of the accuracy of models in other studies.

The comparison between the models corresponding to the four labels and the endurance prediction models of other researchers is shown in Figure 5b. Barry's scheme performs best, achieving 99.4% accuracy; however, negative samples accounted for only 0.03% of that study's data, which greatly reduces its reliability. The accuracies of labels 1–3 in this scheme and of Lin's models are about 94–96%, followed by label 4; all of these are ahead of the 83.5% accuracy of Damien's scheme. Excluding the unreliable Barry scheme with its extremely unbalanced samples, the model accuracy of this scheme is in the first echelon of the field. Compared with the two-class judgment of other schemes, this scheme adopts a multi-classification model, and the increased number of categories enables richer application scenarios. In addition to basic bad block warning, the model of this scheme can also be used for wear-leveling strategies, factory screening and rating, etc.

Since the AUC value of category 4 of each model is about 0.99, the upper-left corner of the plot is enlarged and displayed in the lower-right corner. As shown in Figure 6a, the AUC relationship among labels 1–3 is consistent with the accuracy relationship, that is, *AUC*<sub>1</sub> > *AUC*<sub>2</sub> > *AUC*<sub>3</sub>.

**Figure 6.** (**a**) ROC curves of models with different labels; (**b**) ROC curves of different algorithms; (**c**) ROC curves of models with or without transient error optimization; (**d**) ROC curves of models with different verification method.

The ROC curve of category 4 of labels 1 to 3 reflects the model's prediction of bad blocks, because the boundary of category 4 is close to the critical value of bad block judgment. It shows that the relative quality of the bad block predictions of labels 1–3 is consistent with the overall relative quality of the models. A special case is that the ROC curves of label 4 and label 3 are very close, yet there is a gap between them in the numerical indicators. There are two main reasons for this phenomenon. First, the classification dimensions of labels 3 and 4 are inconsistent, and the meaning of their category 4 differs, so directly comparing the two ROC curves of category 4 is not meaningful. Second, the selected numerical indicators reflect the overall situation of all four categories, and the local behavior can differ from the whole. In fact, when the accuracy *A* of each category is calculated separately, the accuracies of label 3 and label 4 for category 4 are 96.27% and 96.26%, respectively, which are very close; however, the accuracies of label 3 for categories 2 and 3 are 97.64% and 95.26%, while those of label 4 are only 93.55% and 92.18%, a large gap.

Considering the evaluation indicators and actual application scenarios, this paper regards the prediction model of label 3 as the best: the numerical indicators of label 4 are low, and, while the differences among the indicators of labels 1 to 3 are small, label 3 has the largest value of *Ni*, which means it can support earlier warning and decision-making in actual application scenarios. Therefore, the subsequent comparisons of other variables are all discussed for label 3.

#### 4.3.2. Comparison of Results of Different Algorithms

In addition to the default SVM algorithm, we also used the DT algorithm and the KNN algorithm to train models on the same training set. The results are shown in Table 5, and Figure 7 compares the accuracy and Kappa coefficient of the different algorithms. The SVM algorithm achieves the best results in all numerical indicators in the table, about 3–4% higher than the DT algorithm and about 5–8% higher than the KNN algorithm. Moreover, this experiment balanced the samples between categories; considering the KNN algorithm's known disadvantage on unbalanced data sets, KNN may be at an even greater accuracy disadvantage when the endurance level prediction scheme is applied in practice.


**Table 5.** Statistical table of model numerical indicators of different algorithms.

**Figure 7.** Comparison of model accuracy and Kappa coefficient of different algorithms.

The ROC curves in Figure 6b show that the SVM algorithm also performs best in terms of AUC, while the DT algorithm is the worst, with an obvious gap. Because the ROC curve of category 4 reflects the model's classification performance near the critical value of bad block judgment, which is an important indicator for the bad block early warning function, the DT algorithm is at a great disadvantage in this important function.

#### 4.3.3. Analysis of Transient Error Optimization Effect

#### 1. Comparison of Optimized and No Optimization

Comparing the models with and without transient error optimization inevitably leads to an imbalance in the number of category samples in one of the two cases. Therefore, the weighted-*P* indicator is added to the statistical table of numerical indicators of the models with and without optimization, as shown in Table 6.

**Table 6.** Statistical table of numerical indicators of optimization and no optimization.


The input of the non-optimized model takes the last reading of the *T<sub>α</sub>* cycles, while the input of the optimized model uses the first-average processing method. From the comparison in Table 6, there is a large gap in the numerical indicators between the models with and without transient error optimization. The accuracy *A* of the optimized model is 7.9% higher than that of the non-optimized model, and in the other five indicators the optimized model is about 7–10% higher, a significant improvement. This is because the transient error optimization strategy can significantly reduce the jitter noise in the endurance data, so that the machine learning algorithm can better learn the intrinsic relationship between the endurance level and the input vector.
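The first-average idea, as we read it from the text, is to average each group of *T<sub>ε</sub>* consecutive P-R-E readings before feature extraction so that transient (non-reproducible) bit errors are damped. A minimal sketch under that assumption:

```python
def smooth_transient_errors(rbe_reads, t_eps=5):
    """Average each group of t_eps consecutive P-R-E readings, damping
    transient bit errors before feature extraction. The unoptimized
    variant would instead keep only the last reading of each group."""
    assert len(rbe_reads) % t_eps == 0, "expect whole groups of readings"
    return [sum(rbe_reads[i:i + t_eps]) / t_eps
            for i in range(0, len(rbe_reads), t_eps)]
```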

The ROC curve in Figure 6c shows that the AUC value of the optimized model is higher than that of the non-optimized model, so the optimized model judges bad blocks more accurately. The comparison result fully illustrates the necessity and correctness of the transient error optimization strategy.

#### 2. Comparison of Optimization Order

The order of the maximum/standard deviation operation and the transient error optimization operation changes the input vector obtained after transient error optimization, which is essentially a consequence of the non-commutativity of nonlinear transformations. The prediction models above all apply transient error optimization first. Table 7 shows the numerical indicator results for early (pre-) and late (post-) transient error optimization.

**Table 7.** Statistical table of numerical indicators of optimization order.


It can be seen from the table that pre-optimization gives better prediction results: the accuracy *A* leads by 2.4%, and the other indicators lead by 2–3%. The result is related to the theoretical basis of the optimization strategy. The transient error optimization strategy rests on the condition that the probability *p* = *ψ*(*Nc*) can be regarded as a fixed value when *Nc* is approximately constant. However, the function *ψ*(*Nc*) differs between storage units, so this condition holds only within the same storage unit or page, and post-optimization causes the strategy to deviate from it. At the same time, in the early endurance stage, when the page RBE numbers change little, the gap in RBE numbers between pages is small and errors caused by various disturbance factors account for a relatively large share, so post-optimization greatly weakens the effect of transient error optimization. Pre-optimization ensures that the input vector of *f*(*Sk*) comes from the same flash page, so that the theoretical condition applies and no negative effects occur. Therefore, the pre-optimization method achieves better results.
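Because the maximum is nonlinear, averaging before or after it generally gives different values, which is the crux of the ordering effect. A toy numeric illustration with invented readings for two pages over *T<sub>ε</sub>* = 3 reads:

```python
# Invented RBE readings for two pages over 3 consecutive P-R-E reads.
page_a = [10, 40, 10]   # a page with a transient spike
page_b = [30, 20, 30]

# Pre-optimization: average each page's reads first, then take the
# maximum across pages -- every average stays within one page, matching
# the theoretical per-unit condition on p = psi(Nc).
pre_max = max(sum(page_a) / 3, sum(page_b) / 3)

# Post-optimization: take the per-read maximum across pages first,
# then average -- readings from different pages get mixed together.
post_max = sum(max(a, b) for a, b in zip(page_a, page_b)) / 3

# The two orders disagree: 80/3 vs. 100/3.
```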

#### 4.3.4. Analysis of Validation Method and Feature Correlation

#### 1. Comparison of Different Validation Methods

The classification results are evaluated and compared with the five-fold cross-validation method. The advantage of this method is that it reduces the statistical uncertainty of the average test error estimate, facilitating model comparison and result analysis. To avoid misjudgments caused by differences in validation methods, we also compared the prediction results of five-fold cross-validation with Hold-Out validation at different ratios. The Hold-Out ratios are 20%, 25%, and 30%.

According to the comparison of the indicators in Table 8, the best results come from the Hold-Out method with a 20% ratio and the worst from the Hold-Out method with a 30% ratio; the accuracy difference between the two is about 1.7%, and the Kappa coefficient gap is about 2.3%. In fact, the prediction results of the Hold-Out method vary considerably with the choice of test set. Taking the 20% Hold-Out method as an example, the accuracies *A* of five repeated trainings on the same data set are 94.52%, 92.99%, 93.83%, 93.60%, and 94.66%, a spread of about 1.67% between the maximum and minimum. Taking this test-set-induced fluctuation into account, the numerical indicators of the models in the four cases can be considered very close.


**Table 8.** Statistical table of numerical indicators of different validation methods.

The ROC curves in Figure 6d confirm this conclusion: the curves of the four models are very close. Therefore, the Hold-Out method with a separate test set still yields almost the same evaluation indicators, indicating that the model obtained by the endurance level prediction scheme can achieve excellent prediction results on an additional test set.

#### 2. Feature Correlation

At present, the features of the experiment are the arithmetic mean, maximum, and standard deviation of the page RBE numbers, together with the number of P-E cycles and the erase duration. In feature analysis, the Pearson correlation coefficient *r* measures the linear correlation between dimensions of the input vector; a value close to 1 means the input vector space is highly redundant and its dimensionality can be reduced. Through calculation, the Pearson correlation coefficient *r*(*RBEa*, *RBEs*) between the arithmetic mean and the standard deviation of the page RBE numbers is:

$$r(RBE_a, RBE_s) = \frac{\sum_{i} \left(X_i - \overline{X}\right) \left(Y_i - \overline{Y}\right)}{\sqrt{\sum_{i} \left(X_i - \overline{X}\right)^2} \sqrt{\sum_{i} \left(Y_i - \overline{Y}\right)^2}} = 0.9790\tag{1}$$

This value is extremely close to 1, meaning there is a strong linear correlation between the two features. When both are used as input vector dimensions at the same time, they contribute little additional information for learning the relationship between input and output. Moreover, the prediction circuit must perform parallel calculations on each input dimension, so reducing the input dimensionality would greatly reduce hardware resource consumption.
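Equation (1) is the standard Pearson coefficient; for reference, a direct implementation of the formula:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r, as in Equation (1):
    covariance of the two feature vectors divided by the product
    of their (uncentered-by-n) deviation norms."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```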

As shown in Table 9, the PCA dimensionality reduction method reduces the input from five dimensions to four. Comparison shows that the complete input still achieves the best prediction results, but its lead is extremely small; for the two reduced inputs that drop the RBE arithmetic mean or the standard deviation, the difference from the complete input is small enough to be ignored. Considering that the arithmetic mean can shield local disturbances while the standard deviation can shield overall disturbances, the extremely strong linear correlation between the two indicates that the effects of endurance change on the overall disturbance and the local disturbance are positively correlated.

**Table 9.** Statistical table of numerical indicators of input dimension reduction.


Figure 8 shows that the arithmetic mean and standard deviation follow a roughly linear distribution, consistent with these results. The indicators of the model using PCA dimensionality reduction are very close to those of the complete-input model, but PCA requires an additional function transformation for dimensionality reduction, which consumes extra hardware resources. Removing the arithmetic mean or the standard deviation of the page RBE reduces hardware resource consumption while its difference from the complete input is negligible. Therefore, the standard deviation input is removed in the implementation of the specific scheme.

**Figure 8.** The distribution of the arithmetic mean and standard deviation of the sample points of the data set.

#### *4.4. Application*

The endurance level prediction model has many practical application scenarios. This paper designs a simple bad block warning strategy based on the prediction scheme introduced above. In the actual bad block warning scenario, the prediction model faces a recall problem: assuming that a block becomes a bad block after a certain number of programming operations, the recall rate determines the probability that the prediction model successfully judges this and gives an early warning. The recall rate is therefore the most important evaluation indicator for bad block warning. In data-sensitive fields, users stop using the flash memory when its usage reaches half of the nominal value, because this method makes the recall rate reach 100%; even though it can leave the real usage rate of the flash memory much lower than 10%, it ensures that no bad block is missed. Therefore, a prediction model applied to a bad block early warning strategy needs to achieve the following goals:


Based on the above objectives, this paper designs a comprehensive bad block early warning strategy. Take Figure 9 as an example: the curve represents a schematic of the flash block error rate as a function of the number of P-E cycles. Assume that the flash memory reaches the bad block critical value at point N, where an uncorrectable data error occurs, and that point M lies 500 P-E cycles before point N. Point P is the first moment at which the prediction model judges the block to be of the positive type (i.e., a prospective bad block). Before point P, the bad block warning strategy wakes up the endurance level prediction circuit once every A programming operations; after point P, it wakes up once every B programming operations, with B < A. In this way, the prediction circuit is called at a lower wake-up frequency during the low-risk endurance stage and predicts frequently when approaching the end of endurance, balancing resource consumption against the accuracy of the warning strategy. In the bad block judgment, category 4 of prediction labels 1 to 3 or category 1 of label 4 is regarded as the positive type, because both mean that the predicted position of the flash memory block lies at or to the right of point M. After point P the prediction circuit is woken frequently, and if C consecutive prediction results are positive, an early warning is sent to the controller.
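The wake-up policy just described (poll every A operations, switch to every B operations after the first positive at point P, warn after C consecutive positives) can be sketched as a simple loop. The `predict_positive` callable is a hypothetical stand-in for the SVM model's positive-type judgment at a given programming operation count:

```python
def bad_block_warning(predict_positive, total_ops, a=200, b=50, c=3):
    """Sketch of the strategy in Figure 9: poll the predictor every
    `a` programming operations until the first positive result
    (point P), then every `b` operations; raise a warning after `c`
    consecutive positive predictions. Returns the operation count at
    which the warning fires, or None if none fires."""
    interval, consecutive, op = a, 0, 0
    while op < total_ops:
        op += interval
        if predict_positive(op):
            consecutive += 1
            interval = b            # frequent polling after point P
            if consecutive >= c:
                return op           # warn the controller here
        else:
            consecutive = 0         # require c *consecutive* positives
    return None
```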

**Figure 9.** Schematic diagram of the flash block error rate and the number of P-E cycles.

Let A be 200 and B be 50, and test with the prediction model of label 3. Of 96 sample blocks tested, the program successfully gave early warning for 93 blocks, a success rate of about 96.9%. Since the recall rate of category 4 of the model is only 89.90%, this shows that the bad block early warning strategy successfully improves the practical accuracy of the endurance level prediction model while keeping a low wake-up rate.

#### **5. Conclusions**

In order to effectively prolong the service life of flash memory and avoid losses caused by sudden failure, this paper conducts research on flash memory endurance, proposes a flash memory endurance level prediction scheme based on the SVM algorithm, and designs a highly parallel test platform and a low-latency endurance prediction module based on FPGA. We analyze the feature quantities closely related to endurance changes in flash memory and take the block as the prediction object: the page RBE numbers, the number of P-E cycles, and the erase duration within the block are used as input features, and the output is the remaining lifetime level or the RBE level after 100/200/500 further P-E cycles. The scheme adopts a variety of strategies to reduce negative interference in the prediction process in a targeted manner. The prediction module is realized on the ZYNQ-7030 chip; the SVM decision model is deconstructed and a parallel multiplication structure is designed to realize highly multiplexed pipelined calculation, so that each prediction takes only 37 µs, greatly reducing prediction latency.

The method uses multi-category evaluation indicators to analyze five aspects. The four labels achieved 89.77–95.82% accuracy, with every evaluation indicator in the leading echelon, and the increased number of categories expands the scope of application. Compared with DT and KNN, the SVM model with the RBF kernel function achieved a lead of 3–8%. The model using the transient error optimization strategy achieved an indicator increase of 7–10%, and pre-optimization leads by 2–3%. Cross-validation and Hold-Out validation results show that the model achieves the same prediction performance on an additional test set. Pearson correlation coefficient analysis shows that the effects of endurance change on the overall disturbance and the local disturbance are positively correlated. Finally, the bad block early warning strategy designed on the proposed model successfully achieves early warning for 96.9% of the blocks.

**Author Contributions:** The work presented here was completed in collaboration between all authors. H.Z. prepared the manuscript, designed the test platform, and performed some of the experiments. J.W. developed the test platform and also performed some of the experiments. Z.C. developed the test platform and also performed some of the experiments. Y.P. performed some of the experiments as well. Z.L. (Zhaojun Lu) performed some of the experiments and revised the manuscript. Z.L. (Zhenglin Liu) proposed the ideas and revised the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by the National Natural Science Foundation of China (Grant No. 61874047) and a grant of the key technologies R&D general program of Shenzhen, No. 202011023000308.

**Conflicts of Interest:** The authors declare that there is no conflict of interest.
