1. Introduction
The rapid pace of technological change has pushed the market to adopt new materials and parts quickly. As a result, product obsolescence occurs in every production line in the industry, owing to the availability of products that achieve better performance, are more cost-effective, or both. Strategies for addressing obsolescence affect both the expenses of firms and customer satisfaction. In obsolescence management, reactive strategies such as lifetime buy, last-time buy, or identification of alternative parts are only temporary and may cause additional delays compared with proactive strategies. If the probability of obsolescence and the associated cost are high, proactive management strategies are recommended to minimize the risk of obsolescence and the associated costs. Forecasting the occurrence of obsolescence is the key factor in proactive management, and many researchers have focused on developing methods based on the prediction of obsolescence. Proactive strategies allow firms to prepare for the event of obsolescence; manufacturing losses can be reduced by predicting the life cycle of various components, including electronic components [1, 2, 3].
In this study, we aim to predict the cycle of diminishing manufacturing sources and material shortages (DMSMS) obsolescence, which is defined as the loss of the ability to procure a technology or part from its original manufacturer. Accurate prediction of the obsolescence cycle is necessary to reduce the risk that manufacturers and other companies face from rapid technological change and short technology life cycles. Various statistical models for the accurate prediction of the obsolescence risk and date have been studied [4, 5, 6, 7]. A Weibull-based conditional probability method, a risk-based approach to predicting microelectronic component obsolescence, is described in [6]. References on the problem of component obsolescence are summarized in [8]. However, it is difficult to implement a statistical model that adapts rapidly enough to predict the obsolescence cycle of thousands of different types of components. Moreover, it is difficult to gather the input parameters required by the different models.
With recent improvements in computer performance, many methods that predict future trends by learning from large volumes of data are being studied. These learning methods, particularly machine-learning and deep-learning methods, are demonstrating outstanding results in various fields [9, 10, 11, 12]. Depending on the data type or application, various machine-learning methods can be used. To the best of the authors’ knowledge, few studies have applied machine-learning or deep-learning methods to predict the cycle of DMSMS obsolescence. Jennings et al. (2016) [13] proposed two machine-learning-based methods for predicting the obsolescence risk and life cycle, and reported good prediction results using random forests, artificial neural networks, and support vector machines on cell phone market data. Grichi et al. (2017, 2018) [14, 15] proposed, respectively, a random forest and a random forest combined with a genetic algorithm that searches for optimal parameters and features, both applied to cell phone data. Trabelsi et al. (2021) [16] combined feature selection and machine learning for obsolescence prediction. As described above, previous learning-based approaches attempted to increase the prediction accuracy by combining existing machine-learning methods and applying them to component obsolescence data. Although presenting efficient methods and hybridizing them is necessary, the prediction accuracy is expected to improve further if the characteristics of each part's data are used for learning. Therefore, in this study, a clustering method, which first groups the data according to their characteristics and then learns from each group, is newly applied to predict the obsolescence of components.
This paper addresses the following questions: Does machine learning improve the proactive strategy and the prediction of obsolescence? Can it be effective and reliable? In this study, the obsolescence of diode parts is predicted when a sufficient amount of data is not available; the lack of available data for obsolescence problems is a crucial weakness of ordinary machine-learning and deep-learning methods. We propose an accurate, fast, and reliable machine-learning method that overcomes this weakness by using an unsupervised clustering algorithm and an ensemble of supervised regression techniques. Supervised regression identifies the parameters of a model from labelled data, whereas unsupervised clustering partitions the entire dataset into a few groups of similar data based on their observed characteristics. The parameters obtained from a cluster of similar data are expected to fit machine-learning models better than the parameters obtained from the entire set, because the entire set has more variation and randomness. Thus, instead of constructing a single model for the entire set, several models are constructed, each trained independently with the data in one cluster only, and this conjecture is validated experimentally using several real datasets. The first novelty of the study is the application of an unsupervised clustering algorithm to supervised regression to improve model training. The second is the use of a hybrid ensemble method comprising several reliable regression techniques, which further improves the prediction accuracy. Various measures confirm that the proposed clustering-based hybrid method improves the prediction accuracy of the obsolescence date for diode data from three categories: Zener diodes, varactors, and bridge rectifier diodes. The proposed clustering-based hybrid method can be extended easily not only to other electrical component data but also to other types of obsolescence cycle prediction problems.
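The cluster-then-regress idea described above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the implementation used in this study: it assumes a single scalar feature, a plain k-means with a deterministic start, and a per-cluster least-squares line standing in for the regression models. All function names (`nearest`, `kmeans_1d`, `fit_line`, `fit_clustered`, `predict_clustered`) and the toy data are hypothetical.

```python
from statistics import mean

def nearest(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))

def kmeans_1d(xs, k, iters=50):
    """Plain k-means on scalars with a deterministic, evenly spaced start."""
    s = sorted(xs)
    centroids = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(x, centroids)].append(x)
        # An empty group keeps its previous centroid.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; a constant if all x are equal."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx

def fit_clustered(xs, ys, k):
    """Partition by feature, then train one model per cluster."""
    centroids = kmeans_1d(xs, k)
    fallback = fit_line(xs, ys)  # used only if a cluster ends up empty
    models = []
    for j in range(k):
        idx = [i for i, x in enumerate(xs) if nearest(x, centroids) == j]
        if idx:
            models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
        else:
            models.append(fallback)
    return centroids, models

def predict_clustered(x, centroids, models):
    """Route x to its nearest cluster and use that cluster's model."""
    a, b = models[nearest(x, centroids)]
    return a * x + b

# Hypothetical toy data with two regimes (not data from this study).
xs = [1, 2, 3, 4, 11, 12, 13, 14]
ys = [12, 14, 16, 18, 39, 38, 37, 36]
centroids, models = fit_clustered(xs, ys, k=2)
a, b = fit_line(xs, ys)  # single model trained on the entire set
sse_single = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))
sse_clustered = sum((predict_clustered(x, centroids, models) - y) ** 2
                    for x, y in zip(xs, ys))
```

On this toy set, the two per-cluster lines fit their regimes exactly, whereas the single line cannot, which is the intuition behind training one model per cluster. Real component data are multi-dimensional and noisy, so the gain there has to be checked empirically.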
The rest of the paper is organized as follows. Section 2 describes the machine-learning and deep-learning algorithms used in the experiments. The proposed hybrid method based on k-means clustering is explained in Section 3. The statistics of the data and the descriptions of the hyperparameters are presented in Section 4. The accuracy measures and experimental results are presented and discussed in Section 5. The conclusions are drawn in Section 6.
5. Results and Discussion
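To compare the performance of different methods, the accuracy is measured by the mean relative error (MRE):
$$\mathrm{MRE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
and the root mean squared relative error (RMSRE):
$$\mathrm{RMSRE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{y_i - \hat{y}_i}{y_i}\right)^{2}}$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $N$ is the number of predictions.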
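As a concrete illustration of the two measures, the sketch below computes them for a small set of made-up values. It assumes the standard definitions (the mean of the absolute relative errors, and the square root of the mean squared relative error) and nonzero actual values; the function names and the sample numbers are hypothetical.

```python
from math import sqrt

def relative_errors(actual, predicted):
    """Signed relative error (y - y_hat) / y for each pair; y must be nonzero."""
    return [(y - p) / y for y, p in zip(actual, predicted)]

def mre(actual, predicted):
    """Mean relative error: average of the absolute relative errors."""
    errs = relative_errors(actual, predicted)
    return sum(abs(e) for e in errs) / len(errs)

def rmsre(actual, predicted):
    """Root mean squared relative error."""
    errs = relative_errors(actual, predicted)
    return sqrt(sum(e * e for e in errs) / len(errs))

# Hypothetical actual and predicted values (not data from this study).
actual = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 40.0]
print(f"MRE = {mre(actual, predicted):.4f}, RMSRE = {rmsre(actual, predicted):.4f}")
```

Both measures are scale-free, so errors on components with early and late obsolescence dates contribute on a comparable footing.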
If a machine-learning method is not applied, statistical methods can be used to predict the obsolescence date. As the expected value of the obsolescence date, the sample mean of the observed (i.e., known) obsolescence dates in the training data can be used as a prediction value, which will be referred to as “Statistic” below. That is, Statistic is defined by
$$\mathrm{Statistic} = \frac{1}{N_{\mathrm{train}}}\sum_{i=1}^{N_{\mathrm{train}}} y_i \tag{8}$$
where $y_i$ is the $i$-th known obsolescence date in the training data and $N_{\mathrm{train}}$ is the number of training data; this value serves as a naive prediction for the test data.
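The naive baseline amounts to predicting the same training-set mean for every test sample. A minimal sketch follows; the function name `statistic` and the date values (expressed here as fractional years purely for illustration) are assumptions, not data from this study.

```python
def statistic(train_targets):
    """Sample mean of the known obsolescence dates in the training data."""
    return sum(train_targets) / len(train_targets)

# Hypothetical obsolescence dates.
train_dates = [2008.0, 2010.0, 2012.0, 2014.0]
test_dates = [2009.0, 2013.0]

baseline = statistic(train_dates)
# Every test sample receives the same constant prediction.
naive_predictions = [baseline] * len(test_dates)
```

Because the prediction ignores all component features, any learned model that uses the features sensibly should be able to match or beat it, which is why it serves as the reference point in the comparisons.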
We first determine whether learning with clustering produces any improvement over learning without clustering. Figure 9 shows the distribution of the relative error of the prediction
$$\frac{y_i - \hat{y}_i}{y_i} \tag{9}$$
for the Zener diode data when DT and the naive statistic are applied. Figure 9a shows the distribution without clustering and Figure 9b the distribution with clustering. The deviations from DT are smaller, and the corresponding predictions are closer to the actual values, than those of the naive approach. It should be noted that the predicted values from DT with clustering are closer to the actual values than those without clustering. Clustering is observed to reduce the variation and improve the prediction accuracy.
Figure 10 shows the distributions of the relative error in (9) obtained by using the hybrid method (a) without clustering and (b) with clustering. As in Figure 9, the range of the distributed values from the hybrid method is narrower than that from the naive statistic, and the result from the hybrid method with clustering is superior to the result from the hybrid method without clustering. Similar trends are observed for the other machine-learning methods and the other datasets (not shown), which empirically supports the claim that clustering leads to improvement.
Next, we determine the machine-learning method that produces the best prediction result. Figure 11 shows the distributions of the deviation of the prediction from various methods (DT, RF, GB, DNN, RNN, and hybrid) when clustering is applied to Zener diodes. The four machine-learning methods, DT, RF, GB, and hybrid, result in similar prediction distributions, whereas the results from the two deep-learning methods, DNN and RNN, are slightly worse. Figure 12 and Figure 13 show the results for the varactors and bridge rectifier diodes, respectively, and similar trends are observed. One reason for the poorer results from the deep-learning methods may be insufficient data. Deep learning often outperforms ordinary shallow machine learning when the amount of data is large enough. However, the data for the current case study are insufficient, and shallow machine learning produces better results than deep learning in this study.
Subsequently, we compare the prediction accuracy with respect to two measures, MRE and RMSRE. Table 8 presents the MRE of the training data with and without clustering. It shows that the errors from the naive statistic prediction and the two deep-learning methods, DNN and RNN, are larger than those of the shallow machine-learning methods, and that training with GB overfits the given training data. Table 9 lists the MRE of the test data with and without clustering. The predictions from all the machine-learning and deep-learning methods, with or without clustering, are better than the naive statistic prediction, and the four shallow machine-learning methods (DT, RF, GB, and hybrid) produce better results than DNN and RNN for all three categories. Deep-learning methods produce good regression accuracies in many applications, but they have difficulty finding the right parameters in this study owing to the lack of data.
Although the prediction of Statistic with clustering is improved over the prediction without clustering, the results from the machine-learning methods are still better. When clustering is applied, the errors from the four shallow learning methods are smaller than those from the deep-learning methods. Among the shallow machine-learning methods, the DT, GB, and hybrid methods give good predictions for the Zener diodes and bridge rectifier diodes, whereas the DT, RF, and hybrid methods give good predictions for the varactors. Because the data in each cluster from the k-means algorithm have less variation than the entire dataset, the machine-learning models trained on the clusters represent the data better than a single model trained on the entire dataset; thus, the accuracies of the models with clustering are better than those without clustering even when the same model is applied. It should be noted that the hybrid method produces good accuracy regardless of the category or the training method, which implies that the hybrid method is reliable.
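The variance argument can be made concrete with a small numerical sketch. It assumes that cluster labels are already available (for example, from k-means on the component features) and that similar features go with similar target values; the numbers and labels are hypothetical, and the errors are computed in-sample only to keep the example short.

```python
from statistics import mean, pvariance

# Hypothetical target values and cluster labels (not data from this study).
dates = [5.0, 6.0, 7.0, 15.0, 16.0, 17.0]
labels = [0, 0, 0, 1, 1, 1]

clusters = {}
for d, c in zip(dates, labels):
    clusters.setdefault(c, []).append(d)

# Variance of the whole set versus the size-weighted variance inside clusters.
total_var = pvariance(dates)
within_var = sum(len(v) * pvariance(v) for v in clusters.values()) / len(dates)

def mre(actual, predicted):
    """Mean of the absolute relative errors."""
    return sum(abs(y - p) / abs(y) for y, p in zip(actual, predicted)) / len(actual)

# Naive mean prediction: one global mean versus one mean per cluster.
global_pred = [mean(dates)] * len(dates)
cluster_pred = [mean(clusters[c]) for c in labels]
mre_global = mre(dates, global_pred)
mre_cluster = mre(dates, cluster_pred)
```

Here the within-cluster variance is a small fraction of the total variance, and even the naive per-cluster mean has a much lower MRE than the global mean. The same mechanism is what a per-cluster regression model benefits from, provided the clusters found on the features actually separate the target values.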
Figure 14a presents the MRE of the test data with and without clustering for Zener diodes, which shows that model training with the unsupervised clustering algorithm improves the prediction accuracy and reduces the errors. A similar reduction in MRE is observed for the varactors in Figure 14b and the bridge rectifier diodes in Figure 14c.
Table 10 lists the RMSRE of the training data with and without clustering. Similarly to Table 8, the errors from the naive statistic, DNN, and RNN methods are larger than the others, and training with GB seems to overfit. Table 11 lists the RMSRE of the test data with and without clustering. The predictions from all the machine-learning methods without clustering are better than the naive statistic prediction for the Zener diodes and varactors. In the case of the bridge rectifier diodes, the Statistic and RNN methods without clustering result in large errors. In fact, the RMSRE from the RNN method is large for all three categories. The RMSRE values from the models with clustering are smaller than those without clustering, as in Table 12. The RMSRE values from the deep-learning methods, DNN and RNN, with clustering are as small as those from the other methods for the varactors. Although the trends of the RMSRE results are quite similar to those of the MRE, the RMSRE values are larger than the MRE values because some errors are large owing to an insufficient amount of data, and the RMSRE depends more strongly on such large values than the MRE does.
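The stronger dependence of the RMSRE on a few large errors can be seen in a short sketch. Two hypothetical sets of relative errors with the same MRE are compared, one spread evenly and one concentrated in a single sample; the function names and values are illustrative assumptions.

```python
from math import sqrt

def mre_from_errors(errs):
    """Mean of the absolute relative errors."""
    return sum(abs(e) for e in errs) / len(errs)

def rmsre_from_errors(errs):
    """Root of the mean squared relative error."""
    return sqrt(sum(e * e for e in errs) / len(errs))

# Hypothetical relative errors with the same mean absolute size.
even = [0.1, 0.1, 0.1, 0.1]    # error spread evenly over the samples
spiky = [0.0, 0.0, 0.0, 0.4]   # one large error, the rest exact

for name, errs in (("even", even), ("spiky", spiky)):
    print(name, round(mre_from_errors(errs), 3), round(rmsre_from_errors(errs), 3))
```

The RMSRE equals the MRE only when all absolute relative errors are identical and exceeds it otherwise, so a single poorly predicted component raises the RMSRE far more than the MRE.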
Figure 15 presents the RMSRE of the test data with and without clustering for the Zener diodes, varactors, and bridge rectifier diodes, respectively. The figure shows again that the unsupervised clustering algorithm improves the prediction accuracy of the supervised regression models, as observed in Figure 14.
Table 13 lists the widths of the 95% confidence intervals of the predicted values using various methods. As shown in Table 13, the confidence interval of the hybrid method with clustering is much narrower than that of the method without clustering. Therefore, it can be inferred that the estimate using the proposed method with clustering is more stable and accurate. As an example, for the bridge rectifier diode data, the confidence interval of the predicted value using an RNN is 24 times wider, and in the case of using an RF, the width is
times wider than that obtained by using the proposed hybrid method.
Figure 16 presents the widths of the 95% confidence interval using a bar graph, which shows the variation of the prediction accuracy of various machine-learning methods. The bar corresponding to the proposed hybrid method with clustering (red) is shorter than the others for all the three categories, which confirms the superiority of the proposed method.
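The text does not state how the confidence intervals were constructed. As one possible reading, the sketch below assumes a normal-approximation 95% interval for a mean, whose width is 2 × 1.96 × s / √n with s the sample standard deviation; the function name `ci95_width` and the sample values are hypothetical, and the study may have used a different construction.

```python
from math import sqrt
from statistics import stdev

def ci95_width(values):
    """Width of a normal-approximation 95% CI for the mean (assumed construction)."""
    return 2 * 1.96 * stdev(values) / sqrt(len(values))

# Hypothetical relative errors from a stable and a less stable method.
stable = [0.02, -0.01, 0.03, -0.02]
unstable = [0.2, -0.1, 0.3, -0.2]

ratio = ci95_width(unstable) / ci95_width(stable)
print(f"The less stable method's interval is about {ratio:.1f} times wider")
```

Under this construction the width scales linearly with the spread of the values, so a ratio of widths between two methods directly compares how much their predictions vary.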
6. Conclusions
This paper proposed an accurate and reliable method for predicting the obsolescence date of diode components based on the k-means method and a hybrid ensemble method. The novelty of the study is the application of an unsupervised clustering method to a supervised regression problem to improve the prediction. The k-means unsupervised clustering algorithm partitioned the entire set into clusters of similar data. The proposed method, trained with similar data in each cluster, demonstrated better predictions than a single model trained with the entire set, regardless of the category of the diodes, even when a sufficient amount of data was not provided, a situation in which ordinary shallow or deep-learning methods would have difficulty producing accurate forecasts. The hybrid method comprising several regression techniques further improved the prediction accuracy.
Two research directions follow from the proposed model. One is the combination of unsupervised clustering with deep-learning models that have many hidden layers and are trained on sufficiently many data samples, which was not possible in the current study. The accuracy of the deep-learning method is expected to improve when training is performed with similar data samples. The other direction is to improve the clustering method. Although the k-means algorithm is a good clustering method, there remain areas for continued development, such as its sensitivity to initial values and hyperparameter tuning. Moreover, because an unsupervised clustering method partitions the entire dataset into disjoint clusters, some samples near a boundary are assigned to clusters that are not intuitively appropriate. If those samples can be handled properly and assigned to appropriate clusters, the prediction will improve even further.
In this study, the proposed method is applied to the obsolescence of electric diodes, but it can be applied to various fields, from the obsolescence of other components to general regression problems such as financial market prediction.