Handling Imbalanced Datasets for Robust Deep Neural Network-Based Fault Detection in Manufacturing Systems
Abstract
1. Introduction
- We propose a classifier-level method called DLWP loss, which can be applied directly to DNNs, preserving the original dataset distribution during training while improving overall classifier performance on the imbalanced dataset (a minimal sketch follows this list).
- We introduce logit weight vectors, a set of tunable network hyperparameters, adjustable on a case-by-case basis, that regulate the level of focus given to each of the distinct target classes during training.
- We show that information from the imbalanced target class distribution can be strategically used to generate a suitable logit weight vector that predisposes a DNN to focus on minority samples during critical periods of training.
- We introduce a training regime that switches between predefined logit weight vectors, one for each training phase of a DNN, achieving improved classifier performance on the minority samples while still generalizing effectively over the entire dataset.
- We propose a data-informed strategy for safety-related FD systems: first, models are generated that prioritize recall; then human experts are enlisted to provide feedback on a subset of the results, namely the model predictions with high uncertainty that require further action.
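The DLWP loss and the switching regime are defined in Sections 3.4 and 3.5. As a rough sketch of how they could be wired together, assuming purely for illustration that the per-class weight probabilities enter the logits additively via their log before a temperature-scaled softmax cross-entropy (the function names and the additive form are ours, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def dlwp_style_loss(logits: torch.Tensor, targets: torch.Tensor,
                    logit_weights: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Cross-entropy over temperature-scaled logits perturbed by a per-class
    logit weight probability vector (shape [num_classes]). The additive
    log-weight perturbation is an assumption, not the authors' exact formula."""
    perturbed = (logits + torch.log(logit_weights)) / temperature
    return F.cross_entropy(perturbed, targets)

def select_logit_weights(epoch: int, schedule: list) -> torch.Tensor:
    """Pick the predefined logit weight vector for the current training phase.
    schedule: [(start_epoch, weight_vector), ...] sorted by start_epoch."""
    current = schedule[0][1]
    for start, weights in schedule:
        if epoch >= start:
            current = weights
    return current
```

A training loop would then call select_logit_weights(epoch, schedule) once per epoch and pass the returned vector to dlwp_style_loss.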
2. Review
3. Materials and Methods
3.1. Preliminaries
3.2. Temperature-Scaled Softmax for DNNs
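As background, a minimal sketch of the standard temperature-scaled softmax, where a scalar temperature T rescales the logits before normalization (T = 1 recovers the ordinary softmax, T > 1 softens the output distribution, T < 1 sharpens it):

```python
import torch

def temperature_softmax(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    # p_i = exp(z_i / T) / sum_j exp(z_j / T)
    return torch.softmax(logits / T, dim=-1)
```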
3.3. Logit Perturbation
3.4. Class Rebalanced Noise Logit Perturbation for DNNs
3.5. Switching Logit Weights for Improved Classifier Generalization
3.6. Weight Selection Strategies
- Empirical Risk Minimization (ERM): minimizes the empirical expectation of losses obtained by applying a prescribed loss function over some labeled training set. All training samples from all the classes are weighted equally.
- Inverse Class Reweighting [23]: class balanced-loss for each training sample is obtained by reweighting the prescribed loss function by the inverse class frequency for its class.
- Effective Number Reweighting [23]: class balanced-loss for each training sample is obtained by reweighting the prescribed loss function by the inverse of the effective number of samples for its class.
- Delayed Effective Number Reweighting [47]: the class balanced-loss is obtained by applying standard ERM until the last learning rate decay, at which point reweighting by the inverse effective number of samples is applied.
- Effective Number Probability (ENPr): the inverse of the effective number of samples per class is converted to logit weight class probabilities and applied to the DLWP loss method as the ideal logit weights probability distribution (see the sketch after this list).
- Delayed Effective Number Probability (DENPr): standard ERM is applied until the last learning rate decay, at which point the inverse of the effective number of samples per class is converted to class probabilities and applied to the DLWP loss method as the ideal logit weights probability distribution.
- Relative Likelihood Probability (RLPr): relative likelihood method is used to generate the class probabilities applied to the DLWP loss method as the ideal logit weights probability distribution.
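A minimal sketch of the effective-number computation from Cui et al. [23] and the ENPr-style conversion of inverse effective numbers into class probabilities; the function names and the choice beta = 0.9999 are illustrative:

```python
import numpy as np

def inverse_effective_number(class_counts, beta: float = 0.9999) -> np.ndarray:
    """Effective number of samples E_n = (1 - beta**n) / (1 - beta) per class
    (Cui et al. [23]); returns the per-class inverse, 1 / E_n."""
    counts = np.asarray(class_counts, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    return 1.0 / effective_num

def to_class_probabilities(weights) -> np.ndarray:
    """Normalize per-class weights so they sum to one (ENPr-style logit
    weight class probabilities)."""
    w = np.asarray(weights, dtype=np.float64)
    return w / w.sum()

# e.g., the APS training set with 59,000 negative and 1000 positive samples
probs = to_class_probabilities(inverse_effective_number([59000, 1000]))
```

For DENPr, the same probabilities would only be switched in at the last learning rate decay, with standard ERM used before that point.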
3.7. Application and Implementation Details
3.7.1. APS Failure at Scania Trucks Dataset
3.7.2. Steel Plates Faults Dataset
3.8. Uncertainty Estimation
- Entropy: for a given set of C point estimates from a model prediction, we use the Entropy [81] method, which computes the entropy of the prediction as a score. The higher the entropy score, the more uncertain the model’s prediction.
- Jain’s Fairness Index: for a given set of C point estimates from a model prediction, we use Jain’s fairness index [85], defined as $J(x) = \left( \sum_{i=1}^{C} x_i \right)^{2} / \left( C \sum_{i=1}^{C} x_i^{2} \right)$. The result ranges from 1/C, representing the lowest, to 1 as the highest fairness score. The higher the fairness score, the more uncertain the model’s prediction (a sketch of both measures follows this list).
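A minimal sketch of both scores over a single softmax prediction; normalizing the entropy by log C is our assumption, made so the score lands on the same 0 to 1 scale as the percentage-style uncertainty values reported in Appendix A:

```python
import numpy as np

def entropy_score(p) -> float:
    """Shannon entropy [81] of C class probabilities, normalized by log(C)
    (an assumption); higher means a more uncertain prediction."""
    p = np.clip(np.asarray(p, dtype=np.float64), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum() / np.log(len(p)))

def jains_fairness_index(p) -> float:
    """Jain's fairness index [85]: J(p) = (sum p_i)^2 / (C * sum p_i^2).
    Equals 1/C for a one-hot (confident) prediction and 1 for a uniform
    (maximally uncertain) one."""
    p = np.asarray(p, dtype=np.float64)
    return float(p.sum() ** 2 / (len(p) * (p ** 2).sum()))
```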
3.9. Evaluation Techniques
3.10. Datasets
- Steel Plates Faults dataset consists of a total of 1941 instances meant for the classification of surface defects in stainless steel plates. The instances are grouped into 7 distinct fault types. Each recorded instance consists of 27 attributes representing the geometric shape of the fault and its contour. The target class distribution reveals an imbalanced dataset.
- APS Failure at Scania Trucks dataset is an imbalanced dataset consisting of a total of 76,000 instances meant for the prediction of failures in the Air Pressure System (APS) of Scania Trucks. The instances are divided into 60,000 training set instances (59,000 negative, 1000 positive) and 16,000 test set instances (15,625 negative, 375 positive). The dataset consists of 171 attributes per recorded instance, where all attribute names have been anonymized for proprietary reasons (a minimal loading sketch follows this list).
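A minimal loading sketch for the APS training set, assuming the UCI CSV layout ('na' as the missing-value token and a 'class' column with neg/pos labels); the file name and any preamble rows may differ by download:

```python
import pandas as pd

# 'na' marks missing values in the UCI release; some versions of the file
# carry a license preamble that must be skipped (adjust skiprows as needed).
df = pd.read_csv("aps_failure_training_set.csv", na_values="na")
print(df["class"].value_counts())  # expected: neg 59,000 / pos 1000
```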
4. Results and Discussion
4.1. Further Experiments and Results
4.1.1. APS Failure at Scania Trucks Results
4.1.2. Steel Plates Faults Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Method | Precision Macro↑ | Precision 0↑ | Precision 1↑ | PRC-AUC 0↑ | PRC-AUC 1↑ | Recall Macro↑ | Recall 0↑ | Recall 1↑ | ROC-AUC 0↑ | ROC-AUC 1↑ | F1 Macro↑ | F1 0↑ | F1 1↑ | CM FP↓ | CM FN↓ | Total Cost↓
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
CE-None | 0.79 | 1.0 | 0.58 | 1.0 | 0.54 | 0.95 | 0.98 | 0.92 | 0.95 | 0.95 | 0.85 | 0.99 | 0.71 | 246 | 31 | 17,960 |
CE-RW | 0.75 | 1.0 | 0.50 | 1.0 | 0.47 | 0.96 | 0.98 | 0.95 | 0.96 | 0.96 | 0.82 | 0.99 | 0.65 | 361 | 18 | 12,610 |
CE-DRW | 0.79 | 1.0 | 0.58 | 1.0 | 0.54 | 0.95 | 0.98 | 0.92 | 0.95 | 0.95 | 0.85 | 0.99 | 0.71 | 246 | 31 | 17,960 |
CE-RLPr | 0.72 | 1.0 | 0.44 | 1.0 | 0.42 | 0.96 | 0.97 | 0.95 | 0.96 | 0.96 | 0.80 | 0.98 | 0.61 | 450 | 17 | 13,000 |
Focal-None | 0.74 | 1.0 | 0.49 | 1.0 | 0.45 | 0.95 | 0.98 | 0.93 | 0.95 | 0.95 | 0.81 | 0.99 | 0.64 | 365 | 28 | 17,650 |
Focal-RW | 0.73 | 1.0 | 0.45 | 1.0 | 0.44 | 0.97 | 0.97 | 0.96 | 0.97 | 0.97 | 0.80 | 0.99 | 0.62 | 432 | 15 | 11,820 |
Focal-DRW | 0.74 | 1.0 | 0.49 | 1.0 | 0.45 | 0.95 | 0.98 | 0.93 | 0.95 | 0.95 | 0.81 | 0.99 | 0.64 | 365 | 28 | 17,650 |
Focal-RLPr | 0.67 | 1.0 | 0.35 | 1.0 | 0.34 | 0.96 | 0.96 | 0.97 | 0.96 | 0.96 | 0.74 | 0.98 | 0.51 | 684 | 11 | 12,340 |
LDAM-None | 0.70 | 1.0 | 0.40 | 1.0 | 0.38 | 0.96 | 0.97 | 0.95 | 0.96 | 0.96 | 0.77 | 0.98 | 0.57 | 527 | 19 | 14,770 |
LDAM-RW | 0.68 | 1.0 | 0.36 | 1.0 | 0.35 | 0.97 | 0.96 | 0.97 | 0.97 | 0.97 | 0.75 | 0.98 | 0.52 | 655 | 10 | 11,550 |
LDAM-DRW | 0.70 | 1.0 | 0.40 | 1.0 | 0.38 | 0.96 | 0.97 | 0.95 | 0.96 | 0.96 | 0.77 | 0.98 | 0.57 | 527 | 19 | 14,770
LDAM-RLPr | 0.75 | 1.0 | 0.49 | 1.0 | 0.47 | 0.97 | 0.98 | 0.96 | 0.97 | 0.97 | 0.82 | 0.99 | 0.65 | 368 | 16 | 11,680 |
DLWP-None-None | 0.83 | 1.0 | 0.67 | 1.0 | 0.57 | 0.92 | 0.99 | 0.86 | 0.92 | 0.92 | 0.87 | 0.99 | 0.75 | 161 | 54 | 28,610 |
DLWP-None-ENPr | 0.84 | 1.0 | 0.69 | 1.0 | 0.61 | 0.93 | 0.99 | 0.88 | 0.93 | 0.93 | 0.88 | 0.99 | 0.77 | 147 | 46 | 24,470 |
DLWP-None-DENPr | 0.84 | 1.0 | 0.67 | 1.0 | 0.60 | 0.94 | 0.99 | 0.89 | 0.94 | 0.94 | 0.88 | 0.99 | 0.77 | 160 | 43 | 23,100 |
DLWP-None-RLPr | 0.75 | 1.0 | 0.40 | 1.0 | 0.47 | 0.96 | 0.98 | 0.95 | 0.96 | 0.96 | 0.82 | 0.99 | 0.65 | 365 | 18 | 12,650 |
DLWP-RW-None | 0.70 | 1.0 | 0.39 | 1.0 | 0.38 | 0.96 | 0.96 | 0.97 | 0.96 | 0.96 | 0.77 | 0.98 | 0.56 | 561 | 13 | 12,110 |
DLWP-RW-ENPr | 0.73 | 1.0 | 0.47 | 1.0 | 0.44 | 0.96 | 0.97 | 0.94 | 0.96 | 0.96 | 0.80 | 0.99 | 0.62 | 407 | 21 | 14,570 |
DLWP-RW-DENPr | 0.70 | 1.0 | 0.40 | 1.0 | 0.30 | 0.96 | 0.97 | 0.96 | 0.96 | 0.96 | 0.77 | 0.98 | 0.56 | 545 | 15 | 12,950 |
DLWP-RW-RLPr | 0.76 | 1.0 | 0.52 | 1.0 | 0.49 | 0.96 | 0.98 | 0.94 | 0.96 | 0.96 | 0.83 | 0.99 | 0.67 | 329 | 21 | 13,790 |
DLWP-DRW-None | 0.83 | 1.0 | 0.67 | 1.0 | 0.57 | 0.92 | 0.99 | 0.86 | 0.92 | 0.92 | 0.87 | 0.99 | 0.75 | 161 | 54 | 28,610 |
DLWP-DRW-ENPr | 0.84 | 1.0 | 0.69 | 1.0 | 0.61 | 0.93 | 0.99 | 0.88 | 0.93 | 0.93 | 0.88 | 0.99 | 0.77 | 147 | 46 | 24,470 |
DLWP-DRW-DENPr | 0.70 | 1.0 | 0.40 | 1.0 | 0.38 | 0.96 | 0.97 | 0.96 | 0.96 | 0.96 | 0.77 | 0.98 | 0.56 | 545 | 15 | 12,950 |
DLWP-DRW-RLPr | 0.75 | 1.0 | 0.49 | 1.0 | 0.47 | 0.96 | 0.98 | 0.95 | 0.96 | 0.96 | 0.82 | 0.99 | 0.65 | 365 | 18 | 12,650 |
DLWP-RLPr-None | 0.67 | 1.0 | 0.35 | 1.0 | 0.34 | 0.97 | 0.96 | 0.98 | 0.97 | 0.97 | 0.75 | 0.98 | 0.51 | 690 | 6 | 9900 |
DLWP-RLPr-ENPr | 0.70 | 1.0 | 0.40 | 1.0 | 0.39 | 0.97 | 0.96 | 0.98 | 0.97 | 0.97 | 0.77 | 0.98 | 0.56 | 558 | 9 | 10,080 |
DLWP-RLPr-DENPr | 0.73 | 1.0 | 0.47 | 1.0 | 0.45 | 0.97 | 0.97 | 0.96 | 0.97 | 0.97 | 0.81 | 0.99 | 0.63 | 404 | 16 | 12,040 |
DLWP-RLPr-RLPr | 0.70 | 1.0 | 0.39 | 1.0 | 0.38 | 0.97 | 0.96 | 0.98 | 0.97 | 0.97 | 0.77 | 0.98 | 0.56 | 569 | 9 | 10,190 |
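The Total Cost column in the table above follows the Scania APS challenge metric Cost = 10·FP + 500·FN, which the reported rows are consistent with (e.g., CE-None: 10 × 246 + 500 × 31 = 17,960). A minimal sketch:

```python
def aps_total_cost(fp: int, fn: int, cost_fp: int = 10, cost_fn: int = 500) -> int:
    """Scania APS challenge cost: a false alarm costs 10, a missed failure 500."""
    return cost_fp * fp + cost_fn * fn

assert aps_total_cost(246, 31) == 17960  # CE-None
assert aps_total_cost(690, 6) == 9900    # DLWP-RLPr-None, the lowest cost
```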
Method | Precision Macro↑ | Precision 3↑ | Precision 4↑ | PRC-AUC 3↑ | PRC-AUC 4↑ | Recall Macro↑ | Recall 3↑ | Recall 4↑ | ROC-AUC 3↑ | ROC-AUC 4↑ | F1 Macro↑ | F1 3↑ | F1 4↑ | CM 3↑ | CM 4↑
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
CE-None | 0.71 | 0.78 | 0.74 | 0.74 | 0.58 | 0.79 | 0.95 | 0.77 | 0.97 | 0.88 | 0.74 | 0.86 | 0.76 | 21 | 17 |
CE-RW | 0.68 | 0.68 | 0.66 | 0.65 | 0.57 | 0.80 | 0.95 | 0.86 | 0.97 | 0.92 | 0.71 | 0.79 | 0.75 | 21 | 19 |
CE-DRW | 0.72 | 0.84 | 0.60 | 0.80 | 0.50 | 0.81 | 0.95 | 0.82 | 0.97 | 0.90 | 0.75 | 0.89 | 0.69 | 21 | 18 |
CE-RLPr | 0.70 | 0.72 | 0.66 | 0.69 | 0.57 | 0.81 | 0.95 | 0.86 | 0.97 | 0.92 | 0.73 | 0.82 | 0.75 | 21 | 19 |
Focal-None | 0.72 | 0.75 | 0.74 | 0.72 | 0.58 | 0.79 | 0.95 | 0.77 | 0.97 | 0.88 | 0.74 | 0.84 | 0.76 | 21 | 17 |
Focal-RW | 0.69 | 0.72 | 0.73 | 0.69 | 0.64 | 0.81 | 0.95 | 0.86 | 0.97 | 0.93 | 0.72 | 0.82 | 0.79 | 21 | 19 |
Focal-DRW | 0.69 | 0.78 | 0.59 | 0.74 | 0.52 | 0.81 | 0.95 | 0.86 | 0.97 | 0.92 | 0.73 | 0.86 | 0.70 | 21 | 19 |
Focal-RLPr | 0.68 | 0.75 | 0.53 | 0.72 | 0.48 | 0.78 | 0.95 | 0.91 | 0.97 | 0.94 | 0.72 | 0.84 | 0.67 | 21 | 20 |
LDAM-None | 0.63 | 0.86 | 0.0 | 0.75 | 0.40 | 0.68 | 0.86 | 0.0 | 0.93 | 0.50 | 0.65 | 0.86 | 0.0 | 19 | 0 |
LDAM-RW | 0.68 | 0.70 | 0.66 | 0.67 | 0.57 | 0.80 | 0.95 | 0.86 | 0.97 | 0.92 | 0.71 | 0.81 | 0.75 | 21 | 19 |
LDAM-DRW | 0.69 | 0.77 | 0.54 | 0.70 | 0.47 | 0.79 | 0.91 | 0.86 | 0.95 | 0.92 | 0.73 | 0.83 | 0.67 | 20 | 19 |
LDAM-RLPr | 0.63 | 0.64 | 0.48 | 0.61 | 0.44 | 0.79 | 0.95 | 0.91 | 0.97 | 0.93 | 0.67 | 0.76 | 0.62 | 21 | 20 |
DLWP-None-None | 0.75 | 0.84 | 0.86 | 0.80 | 0.71 | 0.80 | 0.95 | 0.82 | 0.97 | 0.91 | 0.77 | 0.89 | 0.84 | 21 | 18 |
DLWP-None-ENPr | 0.77 | 0.83 | 0.95 | 0.76 | 0.83 | 0.80 | 0.91 | 0.86 | 0.95 | 0.93 | 0.78 | 0.87 | 0.90 | 20 | 19 |
DLWP-None-DENPr | 0.79 | 0.91 | 0.82 | 0.83 | 0.68 | 0.79 | 0.91 | 0.82 | 0.95 | 0.91 | 0.79 | 0.91 | 0.82 | 20 | 18 |
DLWP-None-RLPr | 0.79 | 0.91 | 0.85 | 0.83 | 0.67 | 0.79 | 0.91 | 0.77 | 0.95 | 0.88 | 0.79 | 0.91 | 0.81 | 20 | 17 |
DLWP-RW-None | 0.70 | 0.72 | 0.73 | 0.69 | 0.64 | 0.81 | 0.95 | 0.86 | 0.97 | 0.93 | 0.73 | 0.82 | 0.79 | 21 | 19 |
DLWP-RW-ENPr | 0.70 | 0.72 | 0.59 | 0.69 | 0.52 | 0.81 | 0.95 | 0.86 | 0.97 | 0.92 | 0.73 | 0.82 | 0.70 | 21 | 19 |
DLWP-RW-DENPr | 0.71 | 0.72 | 0.74 | 0.69 | 0.68 | 0.81 | 0.95 | 0.91 | 0.97 | 0.95 | 0.75 | 0.82 | 0.82 | 21 | 20 |
DLWP-RW-RLPr | 0.69 | 0.81 | 0.53 | 0.77 | 0.46 | 0.81 | 0.95 | 0.86 | 0.97 | 0.92 | 0.73 | 0.88 | 0.66 | 21 | 19 |
DLWP-DRW-None | 0.70 | 0.75 | 0.68 | 0.72 | 0.59 | 0.81 | 0.95 | 0.86 | 0.97 | 0.92 | 0.74 | 0.84 | 0.76 | 21 | 19 |
DLWP-DRW-ENPr | 0.77 | 0.83 | 0.95 | 0.76 | 0.83 | 0.80 | 0.91 | 0.86 | 0.95 | 0.93 | 0.78 | 0.87 | 0.90 | 20 | 19 |
DLWP-DRW-DENPr | 0.70 | 0.81 | 0.61 | 0.77 | 0.55 | 0.81 | 0.95 | 0.91 | 0.97 | 0.94 | 0.73 | 0.88 | 0.73 | 21 | 20 |
DLWP-DRW-RLPr | 0.71 | 0.78 | 0.59 | 0.74 | 0.52 | 0.82 | 0.95 | 0.86 | 0.97 | 0.92 | 0.74 | 0.86 | 0.70 | 21 | 19 |
DLWP-RLPr-None | 0.70 | 0.72 | 0.66 | 0.69 | 0.57 | 0.80 | 0.95 | 0.86 | 0.97 | 0.92 | 0.73 | 0.75 | 0.82 | 21 | 19 |
DLWP-RLPr-ENPr | 0.70 | 0.75 | 0.62 | 0.72 | 0.51 | 0.81 | 0.95 | 0.82 | 0.97 | 0.90 | 0.73 | 0.84 | 0.71 | 21 | 18 |
DLWP-RLPr-DENPr | 0.67 | 0.72 | 0.63 | 0.69 | 0.55 | 0.80 | 0.95 | 0.86 | 0.97 | 0.92 | 0.71 | 0.82 | 0.73 | 21 | 19 |
DLWP-RLPr-RLPr | 0.68 | 0.70 | 0.43 | 0.67 | 0.41 | 0.82 | 0.95 | 0.95 | 0.97 | 0.95 | 0.72 | 0.81 | 0.59 | 21 | 21 |
References
- Thoben, K.D.; Wiesner, S.; Wuest, T. “Industrie 4.0” and Smart Manufacturing—A Review of Research Issues and Application Examples. Int. J. Autom. Technol. 2017, 11, 4–19. [Google Scholar] [CrossRef] [Green Version]
- O’Donovan, P.; Bruton, K.; O’Sullivan, D. Case Study: The Implementation of a Data-Driven Industrial Analytics Methodology and Platform for Smart Manufacturing. Int. J. Prognost. Health Manag. 2016, 7, 1–22. [Google Scholar]
- Davis, J.; Edgar, T.; Graybill, R.; Korambath, P.; Schott, B.; Swink, D.; Wang, J.; Wetzel, J. Smart Manufacturing. Annu. Rev. Chem. Biomol. Eng. 2015, 6, 141–160. [Google Scholar] [CrossRef] [Green Version]
- Koomey, J.G.; Scott Matthews, H.; Williams, E. Smart Everything: Will Intelligent Systems Reduce Resource Use? Annu. Rev. Environ. Resour. 2013, 38, 311–343. [Google Scholar] [CrossRef]
- Tilbury, D.M. Cyber-Physical Manufacturing Systems. Annu. Rev. Control Robot. Auton. Syst. 2019, 2, 427–443. [Google Scholar] [CrossRef]
- Chiang, L.; Lu, B.; Castillo, I. Big Data Analytics in Chemical Engineering. Annu. Rev. Chem. Biomol. Eng. 2017, 8, 63–85. [Google Scholar] [CrossRef] [PubMed]
- Lau, C.K.; Ghosh, K.; Hussain, M.A.; Che Hassan, C.R. Fault diagnosis of Tennessee Eastman process with multi-scale PCA and ANFIS. Chemom. Intell. Lab. Syst. 2013, 120, 1–14. [Google Scholar] [CrossRef]
- Fathy, Y.; Jaber, M.; Brintrup, A. Learning With Imbalanced Data in Smart Manufacturing: A Comparative Analysis. IEEE Access 2021, 9, 2734–2757. [Google Scholar] [CrossRef]
- Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S.N. A review of process fault detection and diagnosis part I: Quantitative model-based methods. Comput. Chem. Eng. 2003, 27, 293–311. [Google Scholar] [CrossRef]
- Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S.N. A review of process fault detection and diagnosis part II: Qualitative models and search strategies. Comput. Chem. Eng. 2003, 27, 313–326. [Google Scholar] [CrossRef]
- Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S.N. A review of fault detection and diagnosis. Part III: Process history based methods. Comput. Chem. Eng. 2003, 27, 327–346. [Google Scholar] [CrossRef]
- Sánchez-Fernández, A.; Baldán, F.J.; Sainz-Palmero, G.I.; Benítez, J.M.; Fuente, M.J. Fault detection based on time series modeling and multivariate statistical process control. Chemom. Intell. Lab. Syst. 2018, 182, 57–69. [Google Scholar] [CrossRef]
- Knight, J.C. Safety Critical Systems: Challenges and Directions. In Proceedings of the 24th International Conference on Software Engineering; Association for Computing Machinery: New York, NY, USA, 2002; pp. 547–550. [Google Scholar]
- Park, Y.J.; Fan, S.K.S.; Hsu, C.Y. A review on fault detection and process diagnostics in industrial processes. Processes 2020, 8, 1123. [Google Scholar] [CrossRef]
- Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 2018, 106, 249–259. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.; Kingsbury, B.; Hinton, G.; Deng, L.; Yu, D.; Dahl, G.; et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition. IEEE Signal Process. Mag. 2012, 29, 82–97. [Google Scholar]
- Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer Normalization. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), Barcelona, Spain, 5–10 December 2016. [Google Scholar]
- Xiao, B.; Wu, H.; Wei, Y. Simple baselines for human pose estimation and tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 472–487. [Google Scholar]
- Girshick, R.B. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Washington, DC, USA, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Wuest, T.; Weimer, D.; Irgens, C.; Thoben, K.D. Machine learning in manufacturing: Advantages, challenges, and applications. Prod. Manuf. Res. 2016, 4, 23–45. [Google Scholar] [CrossRef] [Green Version]
- Wang, Y.X.; Ramanan, D.; Hebert, M. Learning to model the tail. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 7030–7040. [Google Scholar]
- Zhu, X.; Vondrick, C.; Fowlkes, C.C.; Ramanan, D. Do We Need More Training Data? Int. J. Comput. Vis. 2016, 119, 76–92. [Google Scholar] [CrossRef] [Green Version]
- Cui, Y.; Jia, M.; Lin, T.Y.; Song, Y.; Belongie, S. Class-Balanced Loss Based on Effective Number of Samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Adam, A.; Chew, L.C.; Shapiai, M.I.; Jau, L.W.; Ibrahim, Z.; Khalid, M. A Hybrid Artificial Neural Network-Naive Bayes for solving imbalanced dataset problems in semiconductor manufacturing test process. In Proceedings of the 2011 11th International Conference on Hybrid Intelligent Systems (HIS), Malacca, Malaysia, 5–8 December 2011; pp. 133–138. [Google Scholar]
- Saqlain, M.; Abbas, Q.; Lee, J.Y. A Deep Convolutional Neural Network for Wafer Defect Identification on an Imbalanced Dataset in Semiconductor Manufacturing Processes. IEEE Trans. Semicond. Manuf. 2020, 33, 436–444. [Google Scholar] [CrossRef]
- Zhou, X.; Hu, Y.; Liang, W.; Ma, J.; Jin, Q. Variational LSTM Enhanced Anomaly Detection for Industrial Big Data. IEEE Trans. Ind. Inform. 2021, 17, 3469–3477. [Google Scholar] [CrossRef]
- Lee, J.; Lee, Y.C.; Kim, J.T. Fault detection based on one-class deep learning for manufacturing applications limited to an imbalanced database. J. Manuf. Syst. 2020, 57, 357–366. [Google Scholar] [CrossRef]
- McAllister, R.; Gal, Y.; Kendall, A.; van der Wilk, M.; Shah, A.; Cipolla, R.; Weller, A. Concrete Problems for Autonomous Vehicle Safety: Advantages of Bayesian Deep Learning. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, Melbourne, Australia, 19–25 August 2017; pp. 4745–4753. [Google Scholar]
- Jamal, M.A.; Brown, M.; Yang, M.H.; Wang, L.; Gong, B. Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition From a Domain Adaptation Perspective. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–18 June 2020; pp. 7607–7616. [Google Scholar]
- Liu, Z.; Miao, Z.; Zhan, X.; Wang, J.; Gong, B.; Yu, S. Large-Scale Long-Tailed Recognition in an Open World. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 2532–2541. [Google Scholar]
- Ando, S.; Huang, C.Y. Deep Over-Sampling Framework for Classifying Imbalanced Data. ECML/PKDD. 2017. Available online: http://ecmlpkdd2017.ijs.si/papers/paperID24.pdf (accessed on 1 March 2021).
- Liu, J. Fault diagnosis using contribution plots without smearing effect on non-faulty variables. J. Process Control 2012, 22, 1609–1623. [Google Scholar] [CrossRef]
- Guo, H.; Diao, X.; Liu, H. Improving undersampling-based ensemble with rotation forest for imbalanced problem. Turk. J. Electr. Eng. Comput. Sci. 2019, 27, 1371–1386. [Google Scholar] [CrossRef]
- Guo, X.; Yin, Y.; Dong, C.; Yang, G.; Zhou, G. On the class imbalance problem. In Proceedings of the 2008 Fourth International Conference on Natural Computation, Jinan, China, 18–20 October 2008; pp. 192–201. [Google Scholar]
- Ng, W.W.; Zeng, G.; Zhang, J.; Yeung, D.S.; Pedrycz, W. Dual autoencoders features for imbalance classification problem. Pattern Recognit. 2016, 60, 875–889. [Google Scholar] [CrossRef]
- Oh, E.; Lee, H. An imbalanced data handling framework for industrial big data using a gaussian process regression-based generative adversarial network. Symmetry 2020, 12, 669. [Google Scholar] [CrossRef] [Green Version]
- Lee, H.; Kim, Y.; Kim, C.O. A deep learning model for robust wafer fault monitoring with sensor measurement noise. IEEE Trans. Semicond. Manuf. 2017, 30, 23–31. [Google Scholar] [CrossRef]
- Lee, K.B.; Cheon, S.; Kim, C.O. A convolutional neural network for fault classification and diagnosis in semiconductor manufacturing processes. IEEE Trans. Semicond. Manuf. 2017, 30, 135–142. [Google Scholar] [CrossRef]
- Cho, S.H.; Kim, S.; Choi, J.H. Transfer learning-based fault diagnosis under data deficiency. Appl. Sci. 2020, 10, 7768. [Google Scholar] [CrossRef]
- Iqbal, S.; Ghani, M.U.; Saba, T.; Rehman, A. Brain tumor segmentation in multi-spectral MRI using convolutional neural networks (CNN). Microsc. Res. Tech. 2018, 81, 419–427. [Google Scholar] [CrossRef]
- Xie, S.; Tu, Z. Holistically-Nested Edge Detection. Int. J. Comput. Vis. 2017, 125, 3–18. [Google Scholar] [CrossRef]
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems, Lake Tahoe Nevada; Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q., Eds.; Curran Associates, Inc.: New York, NY, USA, 2013; Volume 26, pp. 3111–3119. [Google Scholar]
- Caesar, H.; Uijlings, J.; Ferrari, V. Joint Calibration for Semantic Segmentation. In Proceedings of the British Machine Vision Conference (BMVC), Swansea, UK, 7–10 September 2015. [Google Scholar]
- Mostajabi, M.; Yadollahpour, P.; Shakhnarovich, G. Feedforward semantic segmentation with zoom-out features. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3376–3385. [Google Scholar]
- Huang, C.; Li, Y.; Loy, C.C.; Tang, X. Deep imbalanced learning for face recognition and attribute prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2781–2794. [Google Scholar] [CrossRef] [Green Version]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2999–3007. [Google Scholar]
- Cao, K.; Wei, C.; Gaidon, A.; Arechiga, N.; Ma, T. Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss. In Proceedings of the 33rd Annual Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Anantrasirichai, N.; Bull, D.R. DefectNET: Multi-class fault detection on highly-imbalanced datasets. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019. [Google Scholar]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On Calibration of Modern Neural Networks. In Proceedings of the 34th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2017. [Google Scholar]
- Kull, M.; Perelló-Nieto, M.; Kängsepp, M.; de Menezes e Silva Filho, T.; Song, H.; Flach, P.A. Beyond Temperature Scaling: Obtaining Well-Calibrated Multiclass Probabilities with Dirichlet Calibration. In Proceedings of the 33rd Annual Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Kannan, H.; Kurakin, A.; Goodfellow, I.J. Adversarial Logit Pairing. arXiv 2018, arXiv:1803.06373. [Google Scholar]
- Kanai, S.; Yamada, M.; Yamaguchi, S.; Takahashi, H.; Ida, Y. Constraining Logits by Bounded Function for Adversarial Robustness. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021. [Google Scholar]
- Shafahi, A.; Ghiasi, A.; Najibi, M.; Huang, F.; Dickerson, J.P.; Goldstein, T. Batch-Wise Logit-Similarity—Generalizing Logit-Squeezing and Label-Smoothing; BMVC: Cardiff, UK, 2019. [Google Scholar]
- Berger, J. Statistical Decision Theory: Foundations, Concepts, and Methods; Springer Series in Statistics; Springer: New York, NY, USA, 2013. [Google Scholar]
- Achille, A.; Rovere, M.; Soatto, S. Critical Learning Periods in Deep Networks. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Sagun, L.; Evci, U.; Güney, V.U.; Dauphin, Y.N.; Bottou, L. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks. In Proceedings of the 6th International Conference on Learning Representations, ICLR, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Gur-Ari, G.; Roberts, D.A.; Dyer, E. Gradient Descent Happens in a Tiny Subspace. arXiv 2018, arXiv:1812.04754. [Google Scholar]
- Frankle, J.; Schwab, D.J.; Morcos, A.S. The Early Phase of Neural Network Training. In Proceedings of the 8th International Conference on Learning Representations, ICLR, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Dua, D.; Graff, C. UCI Machine Learning Repository: APS Failure at Scania Trucks Data Set; Center for Machine Learning and Intelligent Systems, The University of California: Irvine, CA, USA, 2017. [Google Scholar]
- Karanja, B.; Broukhiyan, P. Commercial Vehicle Air Consumption: Simulation, Validation and Recommendation. DiVA. diva2:1113319. 2017. Available online: http://www.diva-portal.org/smash/record.jsf?pid=diva2:1113319 (accessed on 13 September 2021).
- Bakdi, A.; Kouadri, A. An improved plant-wide fault detection scheme based on PCA and adaptive threshold for reliable process monitoring: Application on the new revised model of Tennessee Eastman process. J. Chemom. 2018, 32, 1–16. [Google Scholar] [CrossRef]
- Qin, S.J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220–234. [Google Scholar] [CrossRef]
- Shang, J.; Chen, M.; Ji, H.; Zhou, D. Recursive transformed component statistical analysis for incipient fault detection. Automatica 2017, 80, 313–327. [Google Scholar] [CrossRef]
- Patan, K.; Witczak, M.; Korbicz, J. Towards robustness in neural network based fault diagnosis. Int. J. Appl. Math. Comput. Sci. 2008, 18, 443–454. [Google Scholar] [CrossRef] [Green Version]
- Tayarani-Bathaie, S.S.; Khorasani, K. Fault detection and isolation of gas turbine engines using a bank of neural networks. J. Process Control 2015, 36, 22–41. [Google Scholar] [CrossRef]
- Frank, P.M.; Köppen-Seliger, B. Fuzzy logic and neural network applications to fault diagnosis. Int. J. Approx. Reason. 1997, 16, 67–88. [Google Scholar] [CrossRef] [Green Version]
- Wang, H.; Chai, T.Y.; Ding, J.L.; Brown, M. Data driven fault diagnosis and fault tolerant control: Some advances and possible new directions. Zidonghua Xuebao/Acta Autom. Sin. 2009, 35, 739–747. [Google Scholar] [CrossRef] [Green Version]
- Sarker, I.H. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN Comput. Sci. 2021, 2, 1–20. [Google Scholar] [CrossRef] [PubMed]
- Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel, 21–24 June 2010. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; Bach, F., Blei, D., Eds.; PMLR: Lille, France, 2015; Volume 37, pp. 448–456. [Google Scholar]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Kingma, D.P.; Ba, J.L. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015—Conference Track Proceedings, San Diego, CA, USA, 7–9 May 2015; pp. 1–15. [Google Scholar]
- Choi, D.; Shallue, C.J.; Nado, Z.; Lee, J.; Maddison, C.J.; Dahl, G. On Empirical Comparisons of Optimizers for Deep Learning. arXiv 2019, arXiv:1910.05446. [Google Scholar]
- Dua, D.; Graff, C. UCI Machine Learning Repository: Steel Plates Faults Data Set; Center for Machine Learning and Intelligent Systems, The University of California: Irvine, CA, USA, 2017. [Google Scholar]
- Buscema, M.; Terzi, S.; Tastle, W. A new meta-classifier. In Proceedings of the 2010 Annual Meeting of the North American Fuzzy Information Processing Society, Toronto, ON, Canada, 12–14 July 2010; pp. 1–7. [Google Scholar]
- Buscema, M. MetaNet*: The Theory of Independent Judges. Subst. Use Misuse 1998, 33, 439–461. [Google Scholar] [CrossRef]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; Wallach, H., Larochelle, H., Beygelzimer, A., d’ Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2019; pp. 8024–8035. [Google Scholar]
- Wang, K.; Zhang, D.; Li, Y.; Zhang, R.; Lin, L. Cost-Effective Active Learning for Deep Image Classification. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 2591–2600. [Google Scholar] [CrossRef] [Green Version]
- Shannon, C.E. A mathematical theory of communication. ACM SIGMOBILE Mob. Comput. Commun. Rev. 2001, 5, 3–55. [Google Scholar] [CrossRef]
- Settles, B. Computer Sciences Active Learning Literature Survey; University of Wisconsin-Madison Department of Computer Sciences: Madison, WI, USA, 2009. [Google Scholar]
- Henne, M.; Schwaiger, A.; Roscher, K.; Weiss, G. Benchmarking uncertainty estimation methods for deep learning with safety-related metrics. CEUR Workshop Proc. 2020, 2560, 83–90. [Google Scholar]
- Cho, C.; Choi, W.; Kim, T. Leveraging Uncertainties in Softmax Decision-Making Models for Low-Power IoT Devices. Sensors 2020, 20, 4603. [Google Scholar] [CrossRef]
- Jain, R.K.; Chiu, D.M.W.; Hawe, W.R. A Quantitative Measurement of Fairness and Discrimination for Resource Allocation in Shared Computer System; Eastern Research Laboratory, Digital Equipment Corporation: Hudson, MA, USA, 1984; Volume 2. [Google Scholar]
- Weng, C.G.; Poon, J. A new evaluation measure for imbalanced datasets. Conf. Res. Pract. Inf. Technol. Ser. 2008, 87, 27–32. [Google Scholar]
- Chawla, N.V. Data Mining for Imbalanced Datasets: An Overview. In Data Mining and Knowledge Discovery Handbook; Maimon, O., Rokach, L., Eds.; Springer: Boston, MA, USA, 2005; pp. 853–867. [Google Scholar]
- Metz, C.E. Basic principles of ROC analysis. Semin. Nucl. Med. 1978, 8, 283–298. [Google Scholar] [CrossRef]
- Provost, F.; Fawcett, T.; Kohavi, R. The Case Against Accuracy Estimation for Comparing Induction Algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, San Francisco, CA, USA, 24–27 July 1998; pp. 445–453. [Google Scholar]
- Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
- Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation. arXiv 2010, arXiv:2010.16061. [Google Scholar]
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
Research | Specification | Remarks |
---|---|---|
Oversampling [31,33,34] | Data re-sampling technique for imbalanced datasets | Generates random replica samples from the minority class to balance the distribution of the target classes. Increases likelihood of over-fitting. |
Undersampling [31,33,34,35] | Data re-sampling technique for imbalanced datasets | Reduces the number of samples from the majority class to achieve a more balanced target class distribution. Loss of information through data purging of the original dataset. |
GPR-based GAN [36] | Data re-sampling and imputation for imbalanced datasets | Combines the use of Gaussian Process Regression and Generative Adversarial Network to impute missing data points and generate new samples. |
Convolutional Neural Network for Automatic Wafer Defect Identification (CNN-WDI) [25] | Data re-sampling and feature extraction for imbalanced datasets | Combines CNN feature extraction and oversampling through data augmentation for imbalanced datasets. Data augmentation techniques can be application-specific.
One-class Fault detection [27] | One-class learning for imbalanced datasets | Multi-network architecture with a fault-detection module based on one-class learning. Faces challenges scaling up as the number of target classes increases.
Cost-sensitive reweighting [23,40,41,42,43,44,45] | Class-balancing weights for imbalanced datasets | Class-balancing weight hyperparameter for loss functions. Reweighting by inverse class frequency has been shown to have limited gains. |
Focal Loss (FL) [46] | Class-balancing penalty factor for imbalanced datasets | Balances loss between well-classified (easy) and misclassified (hard) samples during training. FL is more effective against intra-class data imbalance (see the sketch after this table).
Label-Distribution Aware Margin (LDAM) loss [47] | Label-dependent regularizer for imbalanced datasets | Label-dependent regularizer that depends on both the weight matrices and the labels for class-rebalancing. |
DefectNet for Fault Detection [48] | Class-rebalancing and feature extraction for imbalanced datasets | Combines CNN feature extraction and a hybrid loss function for imbalanced datasets. Feature extraction module can be application-specific.
Transfer Learning-Based Fault Diagnosis [39] | Transfer Learning for imbalanced datasets | Transfer of knowledge from neural networks trained in domains with enough data to others in domains that encounter an imbalanced dataset. Performs well in scenarios where the target and source domains are more similar.
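As referenced in the Focal Loss row above, a minimal PyTorch sketch of FL (Lin et al. [46]), FL(p_t) = -(1 - p_t)^γ log(p_t), with the common default γ = 2:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Focal loss: down-weights well-classified (easy) samples so training
    focuses on misclassified (hard) ones."""
    log_pt = F.log_softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()
```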
Method | Precision Macro↑ | Precision N↑ | Precision P↑ | Recall Macro↑ | Recall N↑ | Recall P↑ | F1 Macro↑ | F1 N↑ | F1 P↑ | CM FP↓ | CM FN↓ | Total Cost↓
---|---|---|---|---|---|---|---|---|---|---|---|---
GPR-based GAN | 0.80 | 0.99 | 0.60 | 0.91 | 0.98 | 0.84 | 0.84 | 0.99 | 0.70 | 207 | 59 | 31,570
DLWP-RLPr-RLPr | 0.70 | 1.0 | 0.39 | 0.97 | 0.96 | 0.98 | 0.77 | 0.98 | 0.56 | 569 | 9 | 10,190 |
DLWP-RLPr-ENPr | 0.70 | 1.0 | 0.40 | 0.97 | 0.96 | 0.98 | 0.77 | 0.98 | 0.56 | 558 | 9 | 10,080 |
DLWP-RLPr-None | 0.67 | 1.0 | 0.35 | 0.97 | 0.96 | 0.98 | 0.75 | 0.98 | 0.51 | 690 | 6 | 9900 |
DLWP-None-ENPr | 0.84 | 1.0 | 0.69 | 0.93 | 0.99 | 0.88 | 0.88 | 0.99 | 0.77 | 147 | 46 | 24,470 |
DLWP-None-DENPr | 0.84 | 1.0 | 0.67 | 0.94 | 0.99 | 0.89 | 0.88 | 0.99 | 0.77 | 160 | 43 | 23,100 |
Fault | Entropy | Uncertainty (%)
---|---|---
Dirtiness | 5.903 × 10⁻¹ | 59.025
Dirtiness | 5.280 × 10⁻¹ | 52.802
Stains | 5.096 × 10⁻¹ | 50.961
Dirtiness | 4.610 × 10⁻¹ | 46.096
Dirtiness | 4.497 × 10⁻¹ | 44.968
Stains | 4.461 × 10⁻¹ | 44.606
Stains | 3.894 × 10⁻¹ | 38.942
Stains | 2.716 × 10⁻¹ | 27.163
Dirtiness | 2.335 × 10⁻¹ | 23.354
Dirtiness | 1.893 × 10⁻¹ | 18.926