Automatic Risk Assessment for an Industrial Asset Using Unsupervised and Supervised Learning

Rodrigues, João Antunes; Martins, Alexandre; Mendes, Mateus; Farinha, José Torres; Mateus, Ricardo J. G.; Cardoso, Antonio J. Marques

doi:10.3390/en15249387


by João Antunes Rodrigues ^1,2,*, Alexandre Martins ^1,2, Mateus Mendes ^3,4,*, José Torres Farinha ^3,5, Ricardo J. G. Mateus ² and Antonio J. Marques Cardoso

¹

CISE—Electromechatronic Systems Research Centre, University of Beira Interior, Calçada Fonte do Lameiro, 6201-001 Covilhã, Portugal

²

EIGeS—Research Centre in Industrial Engineering, Management and Sustainability, Universidade Lusófona, Campo Grande 376, 1749-024 Lisboa, Portugal

³

Polytechnic of Coimbra— ISEC, Quinta da Nora, 3030-199 Coimbra, Portugal

⁴

Department of Electrical and Computer Engineering, Institute of Systems and Robotics, University of Coimbra, 3030-194 Coimbra, Portugal

⁵

Department of Mechanical Engineering, Centre for Mechanical Engineering, Materials and Processes, University of Coimbra, 3030-290 Coimbra, Portugal

^*

Authors to whom correspondence should be addressed.

Energies 2022, 15(24), 9387; https://doi.org/10.3390/en15249387

Submission received: 31 October 2022 / Revised: 22 November 2022 / Accepted: 29 November 2022 / Published: 12 December 2022

(This article belongs to the Special Issue Modeling and Optimization of Electrical Systems)


Abstract

Monitoring the condition of industrial equipment is fundamental to avoid failures and maximize uptime. The present work used supervised and unsupervised learning methods to create models for predicting the condition of an industrial machine. The main objective was to determine when the asset was either in its nominal operation or working outside this zone, thus being at risk of failure or sub-optimal operation. The results showed that it is possible to classify the machine state using artificial neural networks. K-means clustering and PCA methods showed that three states, chosen through the Elbow Method, cover almost all the variance of the data under study. Given the importance of lubricant quality for the functioning of machines and for the classification of their state, a lubricant classification algorithm was also developed using neural networks. The lubricant classifier agreed with human expert classifications in 98% of cases. The main gap identified in the literature is that existing classification works only classify present, short-term, or mid-term failures. To close this gap, the work presented in this paper performs a long-term classification.

Keywords:

maintenance; neural networks; k-means; MLPClassifier; unsupervised learning; supervised learning

1. Introduction

Management of the life cycle of assets is important for the economic success of any company. Equipment availability and its condition must reach levels of excellence, namely in industries that cannot have their assets stopped or operating at sub-optimal levels.

Equipment stopped due to breakdowns increases production costs, decreases productivity, increases the stock of raw materials and increases the number of semi-finished products. All these negative consequences make predictive maintenance an essential investment.

1.1. Framework

Kumar defined maintenance as the combination of all technical and administrative activities necessary to maintain equipment, facilities, and other physical assets in the desired operational conditions, to fulfil their function with quality [1]. Many companies have selected predictive maintenance as their maintenance strategy to increase the safety, quality, and availability of their assets while also fostering environmental sustainability [2].

It is crucial to predict the future behaviour of equipment in order to make timely management decisions to reduce failure and downtime and achieve the highest level of equipment availability. A way to anticipate the behaviour of an asset is to forecast its sensor values, using time series and machine learning prediction algorithms. This can be achieved if the equipment has the necessary sensors and enough historical data are available to train the algorithms.

Sensing techniques are increasingly cheaper, more precise, and less invasive. The recent evolution of sensing technology opens new avenues for training predictive algorithms with a higher degree of reliability [3]. Such algorithms usually require large amounts of data and long periods of training and optimization, to generate forecasts with low error. Therefore, it is necessary to have reliable data history. For this to happen, it is essential to have sensors correctly calibrated, installed, and connected to a functional data system.

Data processing capacity is constantly evolving, enabling industries to improve many of their production, maintenance, and even logistics processes. The new industrial generation requires changes in its processes in virtually all areas, including monitoring and forecasting of production [4], quality control, or maintenance based on operating conditions [5].

Currently, the industrial maintenance sector is in a phase of reorganization, exploration, and research, where Artificial Intelligence (AI) is increasingly used [6]. The main advantages of using AI are the reduction in maintenance costs and the increase in the asset’s availability, due to the use of intelligent machine learning algorithms to solve complex problems.

The main goal is to make the most of the assets’ potential, lengthen their useful lives, and maintain their value and sustainability through the use of the proper maintenance [7]. This change will be a step toward improving the sustainability of the organizations, as the lifespan of their physical assets increases. On the other hand, the overall life and results of the organization will be improved, due to good maintenance of the equipment [8].

Clustering is applied in several areas, especially pattern recognition, data mining, and decision support [9]. One of the most well-known non-hierarchical data grouping techniques is K-means [10].

The present work describes a case study where data grouping methods are applied to determine distinct states of operation of a paper press. The method used for this task was K-means. The dataset collected for this purpose is composed of the values of the following equipment variables: Electric current; Rotation speed; Torque; Pressure; Temperature; Oil level. The unsupervised approach is to find the optimal number of operating states of the equipment, while the supervised approach is to perform classification and prediction.

The major limitation found in the literature reviewed is that existing classification works only classify failures at the present moment or in the short or medium term. The method presented in this paper performs a long-term classification in order to bridge this gap.

Due to the large correlation between the state of a piece of equipment and its lubrication quality, it was decided to develop a classification algorithm for the lubricants of the press. In this study, we analysed the following oil parameters: Viscosity at 100 °C, PQ Index, TAN (Total Acid Number), Al, Cr, Cu, Fe, Na, Ni, Pb, Si, and Sn.

1.2. Objectives

The present research aims to propose a model for determining probable future states of a piece of equipment, based on a two-and-a-half-year dataset of sensed values of an industrial pulp paper press. State classification is performed through a classification neural network, which has an input vector that uses threshold lines in the various sensors according to the manufacturer’s recommendations to classify the asset’s condition.

The equipment’s states of operation must be determined using a clustering algorithm, such as k-means.

Another objective of this study is to develop a lubricant classifier algorithm based on neural networks with errors below 5% compared to the results of human experts.

1.3. Contributions

The contributions can have a positive impact on the availability of equipment and consequently reduce the company’s production costs.

The importance of this classification work is reinforced by several authors who claim that the focus should be shifted from short-term to long-term maintenance policies; this work classifies the state of a paper press up to 30 days in advance. The same methodology can be applied to other equipment fitted with sensors.

Another important contribution is the lubricant classifier, developed using neural networks. Its results were compared with the classifications of human experts and agreed with them in close to 98% of cases.

1.4. Motivation and Innovation

Industries are increasingly focusing on the long term, and the maintenance department has to be fully aligned with this long-term vision.

In view of the above, it is important to study and investigate the long-term risk assessment of equipment in order to classify the long-term status of an asset. No classification of an equipment’s status with an equal or superior time frame was found in the state of the art.

The classification works discovered only performed failure classifications in the present, in the short, or in the medium term, which is a limitation of the state of the art. To close this gap, the method described in this paper conducts a long-term classification.

Regarding the classification of the lubricant’s condition, the algorithm created has a 98% accuracy rate compared to the results of human experts, which is an increase of about 8 percentage points over the previously developed algorithm.

1.5. Paper Structure

The paper structure is as follows: Section 1 introduces the study; Section 2 presents a theoretical framework overview on the techniques used; Section 3 presents the state of the art; Section 4 presents the data used in this study; Section 5 explains the methodology of the work; Section 6 details how data clustering was performed; Section 7 describes neural network architectures; Section 8 presents the equipment status classification results; Section 9 shows the results of the press lubricant classification; Section 10 presents the limitations of this study; Section 11 concludes this paper.

2. Theoretical Framework

This section aims to briefly explain the main theoretical principles behind the techniques used in this study.

Machine learning methods can be classified as: 1. Supervised learning: requires a training set of inputs and outputs to learn a function that minimizes its prediction error; 2. Unsupervised Learning: tries to discover more compact representations from a dataset, without knowing their outputs; 3. Reinforcement learning: prescribes decisions based on their feedback on the objective it tries to maximize [11].

2.1. Artificial Neural Networks

Artificial neural networks (ANNs) are sophisticated adaptive systems that have the ability to modify their internal structure, which consists of a collection of interconnected nodes (neurons) between layers, in response to the input they receive.

The strength of the signal between any two nodes is represented by the weight assigned to each connection between them.

Supervised learning is achieved by adjusting the weights of these connections to minimize the error between the predicted output from the network and the target output.

ANNs have diverse applications, such as value forecasting [12], medicine [13], business [14], pharmaceutical science [15], and speech recognition [16,17].

2.2. Data Grouping (Clustering)

Cluster analysis is an unsupervised learning technique used to group elements into groups (clusters), so that elements within the same group (cluster) are as similar as possible, while elements from distinct groups are as different as possible.

To define the similarity—or difference—between the elements, a distance function is used, which needs to be defined considering the context of the problem in question.

Methods of this type have applications in various fields, such as data visualization, pattern recognition, learning theory, computer graphics, identification, or classification.

2.3. K-Means

K-means is one of the most popular unsupervised learning algorithms used for data clustering [18]. This algorithm assigns each data point to the one of the K groups (clusters) whose centroid is closest, so as to minimize the sum of the squared distances between the points and the centroids of their clusters.

The application of K-means suffers from some difficulties, such as the requirement that the number K of groups to be formed be provided a priori, and the sensitivity of the result to the initial conditions. Therefore, these must be determined experimentally during the data analysis process.
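As an illustration, the quantity that K-means minimizes can be written as follows. The notation is introduced here and is not taken from the original text: x_i are the data points, C_k is the set of points assigned to cluster k, and \mu_k is the centroid of that cluster.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^{2},
\qquad
\mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x_i \in C_k} x_i
```

The value of J depends on K and on the initial centroids, which is why both have to be explored experimentally.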

2.4. Principal Component Analysis (PCA)

Principal component analysis (PCA) is another unsupervised learning method that identifies a reduced set of the most significant components (transformed features) that explain the variance of the dataset [19].

The main objective of PCA is to condense the information contained in several original variables into a smaller set of statistical variables (principal components) with the least loss of information. The first components are those that explain most of the total variance of the original variables. By limiting the number of components, often outliers are also reduced or eliminated. The reduced set of selected variables is, therefore, easier to analyse, while still explaining much of the variance of the original data.

3. Related Work

This section presents several relevant works that use neural networks to classify equipment and lubricants, namely in predictive maintenance.

Transmission line faults are common in long-distance power transmission systems, so their classification is crucial. Mukherjee et al., proposed a method for classifying faults in the transmission lines using an approach based on PCA. This study extracted failure characteristics in terms of a Principal Component Index (PCI), followed by a threshold-based analysis of PCI values. The development of two threshold values helps to segregate the three distinct levels of fault disturbance in terms of PCI values, thus developing fault signatures for classification. According to the authors, this classification method presented a 99.78% accuracy.

The ability to group similar data is becoming more and more important, as the amount of data generated and used for analysis grows. Seal et al., proposed a non-Euclidean similarity measure, which is based on the non-linear Jeffreys divergence (JS). They then developed c-means using the proposed JS (Jc-means). The various properties of JS and Jc-means were discussed. All the analyses were carried out. The results demonstrated that Jc-means outperforms some state-of-the-art c-means algorithms [20].

Clustering is a crucial unsupervised machine learning technique used to find some underlying structure in a collection of patterns or objects. Karlekar et al., proposed the distance S, which is derived from the newly introduced divergence S, replacing the Euclidean distance of the conventional Fuzzy k-means (FKM) algorithm. With the aid of various datasets, the performance of the proposed FKM was compared to that of the traditional FKM with Euclidean distance and its variations. The comparative study demonstrated that the outcomes are solid. Additionally, the results showed that the modified FKM outperforms some cutting-edge FKM algorithms [21].

The harsh operating conditions are a common cause of some failures in industrial equipment. Analysing a large amount of data is how faults in mechanical gearbox systems are found and diagnosed. Sharma et al., created a new simple and effective peak density clustering algorithm based on an adaptive mixing distance for handling mixed data, as real-world datasets that encompass numerical and categorical attributes, in order to acquire more distinguishable fault characteristics under various conditions [22].

Driven by the search for a more sustainable world, wind energy has emerged as one of the most important sources of energy production. Zang et al. proposed a fault detection method for wind turbine main bearings based on SCADA data, using an artificial neural network (ANN). This algorithm makes it possible to identify the initial stage of main bearing failures, allowing for early intervention [23].

Ertun et al. developed a multi-stage decision algorithm based on an adaptive neuro-fuzzy inference system (ANFIS) for the detection and diagnosis of bearing faults [24].

Rodrigues et al., used a neural network to predict and classify the degradation state of diesel engine oils from laboratory analysis data on 21 oils’ parameters, achieving an accuracy over 90% [25].

Lubricating oil plays an important role in vehicle maintenance, and good lubrication can extend engine life as well as reduce maintenance costs. Le et al., through machine learning models, classified the condition of military vehicle engine lubricating oils. Oil condition was classified into three categories: normal, degraded, and unsuitable [26].

Kittisupakorn et al., proposed to control a steel pickling process, using an algorithm based on a multilayer feed-forward neural network model [27].

Gajewski et al., presented a study focusing on transport system engines. The types of oils were obtained from heavy track engines. They used the data with neural networks to identify the patterns that model the deterioration of the system [28].

Using the data, manufacturers can produce goods of higher quality while spending less money by using predictive models for quality control. Zhang et al., proposed a two-stage method for doing this, first clustering the data into clusters based on the manufacturing process and then using supervised learning to predict the failed product in each cluster. Their goal was to predict manufacturing failures using the anonymous features. The final model was decided, based on the Random Forest algorithm’s performance [29].

Mazumder et al., used machine learning to develop a viable alternative to computationally intensive analytical approaches to assessing the failure risk of oil and gas pipelines. The conclusion is that XGBoost is the optimal algorithm to predict failure and is recommended for future analysis [30].

The works cited above show that neural networks can improve support in decision making in the maintenance and condition monitoring field.

4. Data Processing

4.1. Sensor Data Collection

A pulp company provided, for this study, a three-year history of six variables of an industrial press: electric current, rotation speed, temperature, pressure, oil level, and torque. All sensors were sampled once per minute [31].

The recorded data must be of the highest calibre. Poor data can lead to models that are incorrect and have biased results. To increase confidence in the findings, the data were previously processed and analysed.

The different variables’ measurement units are as follows: Oil Level is measured in percentage of full tank (% Tank); Electric Current is measured in Amperes (A); Pressure is measured in Pascal (Pa); Temperature is measured in Degrees Celsius (°C); Torque is measured in Newton-meter (N × m); Rotation Velocity is measured in Rotations Per Minute multiplied by 1000 (RPM × 1000). Figure 1 presents a boxplot of the original data according to the six variables.

Figure 1 allows us to conclude that the collected data contained some discrepant points, including outliers such as null and repeated values, due to stoppages of the equipment under analysis. Outliers with higher values were probably the result of errors in the sensor reading or recording. Outliers with lower values probably resulted, in addition to the causes mentioned above, from scheduled and unscheduled stops of the equipment.

Outliers were replaced by the average value of the variable in the sliding window before the outlier. This method has been described in more detail by Mateus et al. [32].
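A minimal sketch of this kind of replacement is shown below. The outlier criterion (a fixed valid range), the window length, and the function name `replace_outliers` are assumptions made for illustration; the procedure actually used is the one described in [32].

```python
from statistics import mean

def replace_outliers(values, lower, upper, window=5):
    """Replace readings outside [lower, upper] with the mean of the
    `window` cleaned readings that precede them (hypothetical criterion)."""
    cleaned = []
    for v in values:
        if lower <= v <= upper:
            cleaned.append(v)
        else:
            previous = cleaned[-window:]
            # With no history yet, fall back to the midpoint of the valid range.
            cleaned.append(mean(previous) if previous else (lower + upper) / 2)
    return cleaned

# Illustrative temperature readings with a stoppage (0.0) and a spike (250.0).
readings = [40.0, 41.0, 0.0, 42.0, 250.0, 43.0]
print(replace_outliers(readings, lower=20.0, upper=100.0, window=2))
# -> [40.0, 41.0, 40.5, 42.0, 41.25, 43.0]
```

Because the window is taken from the already cleaned series, a run of consecutive outliers does not contaminate the replacement values.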

Figure 2 shows the boxplot of the dataset after being filtered and treated as described above. As the figure shows, most of the discrepant samples were removed and there was more compactness of the data.

4.2. Data Enrichment

4.2.1. Equipment Nominal Operation Zones

According to the manufacturer’s user manual, the equipment should operate within a predefined range for each of the six variables. For instance, Figure 3 depicts yellow lines representing the lower and upper temperature thresholds of the normal functioning zone. When the equipment works between the yellow and red lines, it generates a yellow alert, indicating that it needs to be checked for possible overload or anomaly. When the equipment works beyond the red lines, it generates a red alert, indicating that it is working in its failure zone and hence needs urgent attention. The dataset was enriched with this classification, indicating whether each variable value was within the normal, alert, or high-risk-of-failure range provided by the equipment’s manufacturer.
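The zone labelling can be sketched as below. The threshold values and the names `classify_reading`, `TEMP_YELLOW`, and `TEMP_RED` are hypothetical; the real limits come from the manufacturer’s manual and are not reproduced here.

```python
def classify_reading(value, yellow, red):
    """Label one sensor reading as 'normal', 'alert' or 'failure'.

    yellow: (low, high) limits of the normal functioning zone.
    red:    (low, high) limits beyond which the failure zone starts.
    """
    yellow_low, yellow_high = yellow
    red_low, red_high = red
    if value < red_low or value > red_high:
        return "failure"   # beyond the red lines
    if value < yellow_low or value > yellow_high:
        return "alert"     # between the yellow and red lines
    return "normal"        # inside the yellow lines

# Hypothetical temperature limits in degrees Celsius.
TEMP_YELLOW = (30.0, 60.0)
TEMP_RED = (20.0, 75.0)

for temperature in (45.0, 65.0, 80.0):
    print(temperature, classify_reading(temperature, TEMP_YELLOW, TEMP_RED))
```

Applying the same function to each of the six variables, with its own pair of limits, yields the enrichment columns described above.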

4.2.2. Sensor Values Predicted at 30 Days

The aim of this study is to classify the state of the equipment 30 days in advance. The dataset was enriched with 30-day forecasts of each variable, produced by a previously developed and validated neural network that predicts sensor values 30 days ahead with a mean absolute percentage error (MAPE) below 10% [33,34]. Figure 4 illustrates the respective predicted time series for the six variables (sensors).
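For reference, the MAPE of a forecast can be computed as in the following sketch. The function name `mape` and the sample values are illustrative only and are not taken from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes that no actual value is zero (the ratio would be undefined).
    """
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)

# Two illustrative readings, each forecast with a 10% relative error.
print(mape([100.0, 200.0], [110.0, 180.0]))
```

A forecast meets the 10% criterion mentioned above when this value stays below 10 on the evaluation data.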

4.3. Lubricating Oil Database

Oil analysis is an extremely important tool in predictive maintenance. Through this, it is possible to evaluate the conditions of the fluids and of the equipment.

The dataset of the lubricating oils in question was supplied by a company and contains all the lab results of oil analyses carried out on the press, as well as the classification of all the analyses by a human expert.

To assess the reliability of assets, increase their availability and clarify the condition of equipment, it is extremely important to know the condition of its lubricants.

It should be noted that through oil analysis, it is also possible to identify problems early, before they turn into serious failures. Due to the aforementioned reasons, the authors decided to develop an algorithm for the classification of the condition of press lubricating oil.

The present database contains 179 oil samples, and each oil sample contains the analysis of 12 lubricant parameters. The parameters analysed are Viscosity at 100 °C, PQ Index, TAN (Total Acid Number), Al, Cr, Cu, Fe, Na, Ni, Pb, Si, and Sn. Table 1 presents a summary of statistical parameters of the variables used in this study.

Table 1 shows that key variables such as PQ Index, TAN, Fe, and Viscosity at 100 °C vary widely. This variability is a consequence of the differences in the state of the oil across the various analyses. Figure 5 graphically shows the variability in those parameters. The x-axis represents the oil analysis samples, and the y-axis represents the results of each parameter normalized between zero and one.

5. Methodology

This section describes the methodology used to carry out this study.

For clarity, the methodology is summarized in two flowcharts: one for the classification of the state of the press (Figure 6) and the other for the classification of the lubricant in the machine (Figure 7).

6. Clustering (Operating States)

The methodology applied for grouping the dataset into clusters, representing the operating states of the equipment, is as follows:

a.: Application of the K-means method described for different K values between 1 and 10;
b.: Specifying the number of clusters K;
c.: Initialization of the centroids of each cluster, randomly selecting, without repetition, a data point for each of the centroids of the K clusters;
d.: Calculation of the square of the distance between each of the remaining data points and each of the K centroids;
e.: Assignment of each of these data points to the cluster whose centroid is closest;
f.: Calculation of the new position of the centroids of each cluster according to the average position of all data points belonging to each cluster;
g.: Repetition of the last three steps, until the position of the centroids no longer changes;
h.: Determining the optimal number of clusters based on the elbow method [35];
i.: Use of PCA to reduce the number of variables to two and, thus, be able to view the data points classified according to the cluster to which they belong.

After applying the K-means method for K values between 1 and 10, the ideal number of clusters was determined using the Elbow method. For this purpose, the sum of the squared distances between each point and the centroid of its cluster was plotted against the number of clusters K (from 1 to 10). The ideal number of clusters is the value of K at which this sum begins to stabilize: from that point on, the curve runs almost parallel to the abscissa axis, and the bend visually resembles an elbow, hence the name of the method (Figure 8).
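Steps c to g of the procedure, together with the sum of squared distances used by the Elbow method, can be sketched in plain Python as follows. This is an illustrative implementation on toy two-dimensional data, not the code used in the study; the function names, the seed handling, and the policy of keeping the centroid of an empty cluster are assumptions.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points (step d)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Index of the nearest centroid for every point (step e)."""
    return [min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points]

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step c: pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Step f: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep it
        # Step g: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    # Sum of squared distances to the assigned centroid (elbow curve value).
    sse = sum(squared_distance(p, centroids[lab])
              for p, lab in zip(points, labels))
    return labels, centroids, sse

# Toy data with three visually separate groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

# Elbow curve: best SSE over a few random initializations for each K.
for k in range(1, 6):
    best = min(k_means(points, k, seed=s)[2] for s in range(5))
    print(k, round(best, 2))
```

Running several initializations per K and keeping the lowest sum is one common way to reduce the sensitivity to the initial centroids; the elbow is then read from the printed values.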

From the analysis of the graph presented in Figure 8, it was concluded that the ideal number of clusters was K = 3. The next step was to convert the multidimensional dataset (6 variables) into 2 dimensions (variables), just to be able to visualize the distribution of the dataset more easily by the three defined clusters. To this end, principal components analysis was applied to the initial dataset to represent the data in the two most representative principal components (PC1 and PC2) in terms of explained variance (Figure 9).

Figure 10 presents the dataset classified according to the three defined clusters, as well as their centroids. The x-axis represents Principal Component 1 and the y-axis represents Principal Component 2. The respective centroids are marked with a cross (x). The first two Principal Components aggregate about 90% of the total variance of the data. Due to this, it can be said that there is a minimal loss of information when using the two main components to form the 3 clusters as indicated by the Elbow Method.
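The projection onto two principal components and the share of variance they retain can be sketched with scikit-learn as below. The data are synthetic (six variables driven by two hidden factors plus noise) and serve only as a stand-in for the press dataset; the variable names are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the six sensor variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(300, 6))

# Project onto the two most representative principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Fraction of the total variance kept by PC1 and PC2 together.
explained = pca.explained_variance_ratio_.sum()
print(X_2d.shape, round(float(explained), 3))
```

The two columns of `X_2d` are the coordinates that would be plotted on the x-axis and y-axis of a figure such as Figure 10.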

Finally, the data were represented in Figure 11 according to several scatter plots that allow a visual analysis of the degree of association between the variables under analysis.

The last row and column in Figure 11 show the cluster label of each data point. Analysing the various scatter plots, three distinct states of asset functioning can be clearly identified across the variables under analysis. This reinforces the idea that the equipment has three distinct operating states.

In Figure 11, three distinct groups of data can be identified in all pairwise relationships between variables. Temperature has higher values when current or torque is higher. The temperature is lower when the electric current is lower, when the pressure is higher, and when the speed assumes its nominal value. Temperature and electric current therefore tend to increase together. Electric current is higher when torque is higher. Circles of different colours represent the different clusters.

7. Neural Networks Architecture

7.1. Network Classification for the State of the Paper Press

A neural model was developed to automatically classify each data sample into one of the three operational states. To achieve this classification, a 30-day data prediction performed by a neural network using MLPRegressor was used [33,34]. This prediction database was separated into two parts: the first 80% was used for training the model, and the remaining 20% was for carrying out the tests.

For this classification, we chose feedforward architectures (multilayer perceptron), using the MLPClassifier model from the scikit-learn (Sklearn) Python library. Because the dataset is very large, we chose the stochastic gradient-based optimization algorithm named “adam”, with a logistic sigmoid as the activation function [36].

Several architecture combinations were tested to find the best possible network configuration in terms of accuracy, resulting in a final architecture with an accuracy above 96%.

Knowing that three clusters cover almost all the variance of the data, as indicated in the previous section, the authors defined that the neural network would classify the machine in three states.

The network is composed of a first layer with 6 neurons that receive information from the press sensors, then the information is processed by 2 layers of hidden neurons (100, 10). The output of this neural network is the following: Good, Alert, Failure. Figure 12 depicts the chosen ANN architecture for classification for the state of the paper press.
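A configuration along these lines can be sketched with scikit-learn as follows. Only the hidden layer sizes, the solver, the activation function, and the 80/20 split come from the text; the synthetic data, the labelling rule, `max_iter`, and `random_state` are assumptions made so that the example runs on its own.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the six forecast sensor variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))

# Hypothetical labelling rule standing in for the threshold-based states.
score = np.abs(X).max(axis=1)
y = np.where(score > 2.2, "Failure", np.where(score > 1.6, "Alert", "Good"))

# First 80% of the samples for training, remaining 20% for testing.
split = int(0.8 * len(X))

clf = MLPClassifier(hidden_layer_sizes=(100, 10),  # two hidden layers
                    activation="logistic",         # logistic sigmoid
                    solver="adam",
                    max_iter=300,                  # assumed iteration budget
                    random_state=0)
clf.fit(X[:split], y[:split])

accuracy = clf.score(X[split:], y[split:])
print(sorted(clf.classes_), round(float(accuracy), 3))
```

The split is taken in order rather than shuffled, mirroring the description of the first 80% of the data being used for training.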

Training this classification network took approximately 4 min using Apple’s M1 processor. The neural network required 150 iterations and had a final loss of 0.061.

7.2. Neural Network for Press Lubricant Classification

The network is composed of a first layer with 12 neurons that receive the 12 lubricant parameters from the oil analyses; the information is then processed by 3 layers of hidden neurons (500, 100, 10). The output of this neural network is one of the following: Oil in good condition or Replace the oil.

The “lbfgs” solver was chosen for this architecture, using “relu” as the activation function. The network needed 455 iterations for training. Training this classification network took approximately 2 min using Apple’s M1 processor. Figure 13 represents the ANN architecture chosen for oil classification.

7.3. Evaluation Models

To fully evaluate the effectiveness of a model, both precision and recall must be examined. Precision is the fraction of positive class predictions that actually belong to the positive class. Recall is the fraction of all positive examples in the dataset that are predicted as positive. Recall is also called True Positive Rate (TPR), or Sensitivity.

Accuracy is a metric that summarizes the performance of a classification model as the number of correct predictions divided by the total number of predictions.

The next equations show the formulas for these metrics, where TP is True Positives, FN is False Negatives, FP is False Positives, and TN is True Negatives.

\mathrm{Precision} = \frac{TP}{TP + FP}

(1)

\mathrm{Recall} = \frac{TP}{TP + FN}

(2)

\mathrm{Accuracy} = \frac{\text{Correct predictions}}{\text{Total predictions}} = \frac{TP + TN}{TP + TN + FP + FN}

(3)

\mathrm{F1\ Score} = \frac{TP}{TP + \frac{1}{2} (FP + FN)}

(4)
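As a worked example, the four metrics can be computed from the confusion-matrix counts as in the sketch below, using the standard definition of recall, TP / (TP + FN). The counts are illustrative and are not results from the study; the function name is an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 score from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": tp / (tp + 0.5 * (fp + fn)),
    }

# Illustrative counts: 8 true positives, 2 false positives,
# 1 false negative and 9 true negatives.
for name, value in binary_metrics(tp=8, fp=2, fn=1, tn=9).items():
    print(name, round(value, 3))
```

With these counts, precision is 0.8, recall is about 0.889, accuracy is 0.85, and the F1 score is about 0.842, which lies between precision and recall as expected.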

Macro AVG is the arithmetic mean of the individual classes’ score in relation to precision, recall, and F1-score.

The weighted average takes into account the number of samples in each class, so a class with fewer samples has less impact on the weighted average of precision, recall, or F1-score.
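The difference between the two averages can be illustrated with the sketch below. The per-class scores and sample counts are hypothetical and were chosen only to make the effect of class imbalance visible.

```python
def macro_and_weighted(scores, support):
    """Macro and support-weighted averages of per-class scores."""
    classes = list(scores)
    macro = sum(scores[c] for c in classes) / len(classes)
    total = sum(support[c] for c in classes)
    weighted = sum(scores[c] * support[c] for c in classes) / total
    return macro, weighted

# Hypothetical per-class precision and number of samples per class.
precision = {"Good": 0.98, "Alert": 0.90, "Failure": 0.70}
support = {"Good": 900, "Alert": 80, "Failure": 20}

macro, weighted = macro_and_weighted(precision, support)
print(round(macro, 3), round(weighted, 3))
```

Here the macro average is 0.86, while the weighted average is 0.968, because the large "Good" class dominates the weighted value and the small "Failure" class contributes little.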

8. Press State Classification Results

The classifier algorithm showed very good results, as the error was below the predefined threshold of 5%. The press states predicted by the classification network were compared with the reference states of the test set; the agreement was 96%.

Table 2 presents the different metrics and accuracy in the different states of the press. It was in the Failure classification that the network had its highest success rate.

Figure 14 shows the results of the classification network. It is possible to observe the prediction of the press states in the future.

Analysing Figure 14, it is observed that the press in the future will work mostly in the normal state, with five alerts and three expected malfunctions.

9. Lubricating Oil Classification Results

The condition of the press, like that of any equipment, depends strongly on the condition of its lubricant, so it is important to know the lubricant's state of degradation. The algorithm created had an error well below the defined p-value of 5%, and the accuracy of this classifier was 98%. Thus, the algorithm proved to be reliable for classifying the oil of any equipment for which a robust oil analysis database is available. Table 3 presents the lubricating oil classification results.

Table 4 presents the confusion matrix, which shows that the classification network disagreed with the human experts on only one sample: an oil that the experts marked for replacement was classified as being in good condition.

10. Limitations

One major limitation of the present approach is that it is based on machine learning, which uses inductive reasoning. There are no guarantees regarding the certainty of fault detection. However, the quality of the predictive data that are fed into the press condition classification network is quite good. The classification network learned according to the limit lines that were stipulated based on the history of the equipment and the recommendations of the technicians.
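The idea of labelling data according to stipulated limit lines can be sketched as follows. This is only an illustration: the function name, the assumption that both limits are upper bounds on a single variable, and the numeric limits and readings are all hypothetical and do not come from the press data.

```python
def label_state(value: float, alert_limit: float, failure_limit: float) -> str:
    """Assign an operating state to one sensor reading.

    Assumes (hypothetically) that higher values are worse and that
    alert_limit < failure_limit.
    """
    if value >= failure_limit:
        return "failure"
    if value >= alert_limit:
        return "alert"
    return "normal"


# Hypothetical temperature readings and limit lines, for illustration only.
readings = [41.0, 55.5, 63.2, 48.9]
labels = [label_state(v, alert_limit=50.0, failure_limit=60.0) for v in readings]

print(labels)  # ['normal', 'alert', 'failure', 'normal']
```

Labels produced in this way can serve as training targets for a classifier, which is why the quality of the stipulated limits directly bounds the quality of the learned classification.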

Finding the hyperparameters to use in the lubricant classification network and machine status classification network, so that the results meet the predefined objectives and errors, was the biggest challenge to overcome in this work. Obtaining a reliable and good-quality long-term forecast was the other difficulty encountered in this study [33,34]. To achieve small error margins, it is necessary to use deep knowledge of the machine being modeled, as well as machine learning methods. The parameters and methods are only valid for the machine being studied, even though similar procedures may be followed to pursue similar or better results for other machines.

11. Conclusions

Through clustering using K-means, it was shown that it is possible to identify equipment operating zones and, thus, define how many states the classifier network should have at its output.
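The way the number of states can be read from a K-means inertia curve (the Elbow Method) is sketched below. This is a deliberately small, pure-Python, one-dimensional K-means with a deterministic initialisation. It is not the authors' implementation, and the sample values are hypothetical.

```python
def kmeans_1d(values, k, iters=50):
    """Tiny 1-D K-means (Lloyd's algorithm); returns (centers, inertia)."""
    data = sorted(values)
    # Deterministic initialisation: k evenly spaced points of the sorted data.
    centers = [data[int((i + 0.5) * len(data) / k)] for i in range(k)]
    for _ in range(iters):
        # Assignment step: each value goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: (v - centers[j]) ** 2)
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Inertia: sum of squared distances to the nearest center.
    inertia = sum(min((v - c) ** 2 for c in centers) for v in data)
    return centers, inertia


# Hypothetical readings forming three well-separated operating zones.
values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.1, 9.0, 8.7]
inertias = {k: kmeans_1d(values, k)[1] for k in range(1, 6)}

for k, inertia in inertias.items():
    print(k, round(inertia, 2))
```

For these sample values the inertia falls sharply up to k = 3 and changes little afterwards, so the elbow suggests three clusters, which would correspond to three output states of the classifier.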

The developed model can classify future failures in a paper press using a database of long-term forecast values.

The contribution of a 30-day classification is innovative and provides a great advantage in industrial planning, as it allows stops to be scheduled a month in advance.

This methodology is very important in this area, as it can be applied to monitor this and other equipment automatically, provided that these assets have a robust database of sensor readings.

The lubricant classifier was developed using neural networks. The results of this classifier, compared to the human expert classifications, had a hit rate close to 98%.

These contributions can have a significant impact on the quality of operation and availability of assets, thus reducing maintenance costs.

Author Contributions

Conceptualization, J.A.R. and M.M.; Methodology, M.M.; Software, J.A.R., A.M. and M.M.; Formal analysis, J.A.R.; Investigation, J.A.R.; Data curation, J.A.R.; Writing—original draft, J.A.R.; Writing—review & editing, M.M., R.J.G.M. and A.J.M.C.; Supervision, J.T.F.; Project administration, J.T.F. All authors have read and agreed to the published version of the manuscript.

Funding

The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement 871284 project SSHARE and the European Regional Development Fund (ERDF) through the Operational Programme for Competitiveness and Internationalization (COMPETE 2020), under Project POCI-01-0145-FEDER-029494, and by National Funds through the FCT—Portuguese Foundation for Science and Technology, under Projects PTDC/EEI-EEE/29494/2017, UIDB/04131/2020, and UIDP/04131/2020. This research is sponsored by FEDER funds through the program COMPETE—Programa Operacional Factores de Competitividade—and by national funds through FCT—Fundação para a Ciência e a Tecnologia—under the project UIDB/00285/2020. This work was produced with the support of INCD funded by FCT and FEDER under the project 01/SAICT/2016 n° 022153.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ANN	Artificial Neural Network
AVG	Average
FF	Feed Forward
ITER	Iterations
MAPE	Mean Absolute Percentage Error
MLP	Multi-Layer Perceptron
MSE	Mean Square Error
PC	Principal Component
PCA	Principal Component Analysis
PCI	Principal Component Index
RF	Random Forest
RNN	Recurrent Neural Network
TAN	Total Acid Number
TPR	True Positive Rate

References

1. Kumar, U.; Galar, D.; Parida, A.; Stenström, C.; Berges, L. Maintenance Performance Metrics: A State-of-the-art Review. J. Qual. Maint. Eng. 2013, 19, 233–277.
2. Selcuk, S. Predictive Maintenance, Its Implementation and Latest Trends. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2017, 231, 1670–1679.
3. Martins, A.; Fonseca, I.; Farinha, J.; Reis, J.; Cardoso, A.J.M. Maintenance Prediction through Sensing Using Hidden Markov Models-A Case Study. Appl. Sci. 2021, 11, 7685.
4. Uygun, Y. Industry 4.0: Principles, Effects and Challenges; Nova Science Publishers: Hauppauge, NY, USA, 2020; ISBN 1-5361-8423-3.
5. Cachada, A.; Barbosa, J.; Leitão, P.; Geraldes, C.A.S.; Deusdado, L.; Costa, J.; Teixeira, C.; Teixeira, J.; Moreira, A.H.J.; Moreira, P.M.; et al. Maintenance 4.0: Intelligent and Predictive Maintenance System Architecture. In Proceedings of the 2018 IEEE 23rd International Conference on Emerging Technologies and Factory Automation (ETFA), Turin, Italy, 4–7 September 2018; Volume 1, pp. 139–146.
6. Rodrigues, J.; Torres Farinha, J.; Marques Cardoso, A. Predictive Maintenance Tools—A Global Survey. WSEAS Trans. Syst. Control 2021, 16, 96–109.
7. de-Almeida-e-Pais, J.E.; Raposo, H.; Farinha, J.; Cardoso, A.J.M.; Marques, P. Optimizing the Life Cycle of Physical Assets through an Integrated Life Cycle Assessment Method. Energies 2021, 14, 6128.
8. de-Almeida-e-Pais, J.E.; Farinha, J.; Cardoso, A.J.M.; Raposo, H. Optimizing the Life Cycle of Physical Assets - a Review. WSEAS Trans. Syst. Control 2020, 15, 417–430.
9. Jain, A.K.; Murty, M.N.; Flynn, P.J. Data Clustering: A Review. ACM Comput. Surv. 1999, 31, 264–323.
10. Xie, J.; Jiang, S.; Xie, W.; Gao, X. An Efficient Global K-Means Clustering Algorithm. JCP 2011, 6, 271–279.
11. Bala, R.; Kumar, D.D. Classification Using ANN: A Review. Int. J. Comput. Intell. Res. 2017, 13, 10.
12. Rodrigues, J.; Farinha, J.; Mendes, M.; Mateus, R.; Cardoso, A.J.M. Short and Long Forecast to Implement Predictive Maintenance in a Pulp Industry. Eksploat. Niezawodn.-Maint. Reliab. 2021, 24, 33–41.
13. Shenbagarajan, A.; Ramalingam, V.; Balasubramanian, C.; Palanivel, S. Tumor Diagnosis in MRI Brain Image Using ACM Segmentation and ANN-LM Classification Techniques. Indian J. Sci. Technol. 2016, 9.
14. Rajab, S.; Sharma, V. Performance Evaluation of ANN and Neuro-Fuzzy System in Business Forecasting. In Proceedings of the 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 11–13 March 2015; pp. 749–754.
15. Agatonovic-Kustrin, S.; Beresford, R. Basic Concepts of Artificial Neural Network (ANN) Modeling and Its Application in Pharmaceutical Research. J. Pharm. Biomed. Anal. 2000, 22, 717–727.
16. Paul, A.K.; Das, D.; Kamal, M.M. Bangla Speech Recognition System Using LPC and ANN. In Proceedings of the 2009 Seventh International Conference on Advances in Pattern Recognition, Kolkata, India, 4–6 February 2009; pp. 171–174.
17. Wahyuni, E.S. Arabic Speech Recognition Using MFCC Feature Extraction and ANN Classification. In Proceedings of the 2017 2nd International conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 1–3 November 2017; pp. 22–25.
18. Sinaga, K.P.; Yang, M.-S. Unsupervised K-Means Clustering Algorithm. IEEE Access 2020, 8, 80716–80727.
19. Kurita, T. Principal Component Analysis (PCA). In Computer Vision: A Reference Guide; Springer International Publishing: Cham, Switzerland, 2019; pp. 1–4. ISBN 978-3-030-03243-2.
20. Seal, A.; Karlekar, A.; Krejcar, O.; Herrera-Viedma, E. Performance and Convergence Analysis of Modified C-Means Using Jeffreys-Divergence for Clustering. IJIMAI 2021, 7, 141.
21. Karlekar, A.; Seal, A.; Krejcar, O.; Gonzalo-Martin, C. Fuzzy K-Means Using Non-Linear S-Distance. IEEE Access 2019, 7, 55121–55131.
22. Sharma, K.K.; Seal, A.; Yazidi, A.; Krejcar, O. A New Adaptive Mixture Distance-Based Improved Density Peaks Clustering for Gearbox Fault Diagnosis. IEEE Trans. Instrum. Meas. 2022, 71, 1–16.
23. Zhang, Z.-Y.; Wang, K.-S. Wind Turbine Fault Detection Based on SCADA Data Analysis Using ANN. Adv. Manuf. 2014, 2, 70–78.
24. Ertunc, H.M.; Ocak, H.; Aliustaoglu, C. ANN- and ANFIS-Based Multi-Staged Decision Algorithm for the Detection and Diagnosis of Bearing Faults. Neural Comput. Appl. 2013, 22, 435–446.
25. Rodrigues, J.; Costa, I.; Farinha, J.T.; Mendes, M.; Margalho, L. Predicting Motor Oil Condition Using Artificial Neural Networks and Principal Component Analysis. Eksploat. Niezawodn. 2020, 22, 440–448.
26. Le, V.T.; Lim, C.P.; Mohamed, S.; Nahavandi, S.; Yen, L.; Gallasch, G.E.; Baker, S.; Ludovici, D.; Draper, N.; Wickramanayake, V. Condition Monitoring of Engine Lubrication Oil of Military Vehicles: A Machine Learning Approach. AIAC 2017, 8.
27. Kittisupakorn, P.; Thitiyasook, P.; Hussain, M.A.; Daosud, W. Neural Network Based Model Predictive Control for a Steel Pickling Process. J. Process Control 2009, 19, 579–590.
28. Gajewski, J.; Vališ, D. The Determination of Combustion Engine Condition and Reliability Using Oil Analysis by MLP and RBF Neural Networks. Tribol. Int. 2017, 115, 557–572.
29. Zhang, D.; Xu, B.; Wood, J. Predict Failures in Production Lines: A Two-Stage Approach with Clustering and Supervised Learning. In Proceedings of the 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA, 5–8 December 2016; pp. 2070–2074.
30. Mazumder, R.K.; Salman, A.M.; Li, Y. Failure Risk Analysis of Pipelines Using Data-Driven Machine Learning Algorithms. Struct. Saf. 2021, 89, 102047.
31. Mateus, B.C.; Mendes, M.; Farinha, J.T.; Cardoso, A.M. Anticipating Future Behavior of an Industrial Press Using LSTM Networks. Appl. Sci. 2021, 11, 6101.
32. Mateus, B.; Farinha, J.T.; Mendes, M.; Martins, A.B.; Cardoso, A.M. Data Analysis for Predictive Maintenance Using Time Series and Deep Learning Models—A Case Study in a Pulp Paper Industry. 2021. Available online: https://www.researchgate.net/publication/363646966_Data_Analysis_for_Predictive_Maintenance_Using_Time_Series_and_Deep_Learning_Models-A_Case_Study_in_a_Pulp_Paper_Industry (accessed on 1 December 2022).
33. Rodrigues, J.A.; Farinha, J.T.; Mendes, M.; Mateus, R.J.G.; Cardoso, A.J.M. Comparison of Different Features and Neural Networks for Predicting Industrial Paper Press Condition. Energies 2022, 15, 6308.
34. Rodrigues, J.A.; Farinha, J.T.; Cardoso, A.M.; Mendes, M.; Mateus, R. Prediction of Sensor Values in Paper Pulp Industry Using Neural Networks. In Proceedings of IncoME-VI and TEPEN 2021; Zhang, H., Feng, G., Wang, H., Gu, F., Sinha, J.K., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 281–291.
35. Thorndike, R.L. Who Belongs in the Family? Psychometrika 1953, 18, 267–276.
36. Sklearn.Neural_network.MLPClassifier. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html (accessed on 1 December 2022).

Figure 1. Boxplot of the original data.

Figure 2. Boxplot of processed data.

Figure 3. Normal, alert, and failure zones for the variable Temperature.

Figure 4. Predicted time series of sensor values.

Figure 5. Graph of all oil analyses (Index PQ, TAN, Fe, and Viscosity at 100 °C).

Figure 6. Methodology used to classify press condition.

Figure 7. Methodology used to classify the lubricating oil of the press.

Figure 8. Application of the elbow method to determine the optimal number of clusters.

Figure 9. Variance in PC1 and PC2.

Figure 10. Representation of the data in the three clusters according to PC1 (x-axis) and PC2 (y-axis).

Figure 11. Matrix of the scattering plots of five variables and the state of operation.

Figure 12. Architecture of the ANN for classification of the state of the paper press.

Figure 13. Architecture of the ANN for classification of the lubricating oil.

Figure 14. Time series of Press classification results.

Table 1. Metrics of Oil Analysis.

	Units	Mean	Min	Max	Var	Std
TAN (Total Acid Number)	mgKOH/g	1.26	0.18	2.85	0.26	0.52
PQIndex	ppm	131.78	0.00	6732.00	396,718.55	631.63
Al Content	ppm	1.30	0.00	15.00	8.00	2.84
Cr Content	ppm	5.59	0.00	2.00	34.02	5.85
Cu Content	ppm	9.16	0.00	243.00	815.87	28.65
Fe Content	ppm	260.17	2.00	1231.00	91,004.30	302.55
Na Content	ppm	5.21	0.00	38.00	25.82	5.10
Ni Content	ppm	4.20	0.00	26.00	17.16	4.16
Pb Content	ppm	0.51	0.00	30.00	6.25	2.51
Si Content	ppm	2.39	0.00	22.00	8.10	2.85
Sn Content	ppm	1.07	0.00	8.00	2.62	1.62
Viscosity at 100 °C	m²/s	3035.76	954.40	4146.20	168,647.64	436.90

Table 2. Press classification results.

Classification	Precision	Recall	F1-Score
Normal	0.96	1.00	0.98
Alert	0.98	0.85	0.91
Failure	1.00	0.90	0.94
Accuracy			0.96
Macro AVG	0.98	0.91	0.94
Weighted AVG	0.96	0.96	0.96

Table 3. Classification of the lubricating oil results.

Classification	Precision	Recall	F1-Score	Support
Oil in good Condition	0.96	1.00	0.98	27
Replace the oil	1.00	0.94	0.97	18
Accuracy			0.98	45
Macro AVG	0.98	0.97	0.98	45
Weighted AVG	0.98	0.98	0.98	45

Table 4. Lubricating oil results (Confusion Matrix).

		Predicted Value
		Oil in Good Condition	Replace the Oil
Real Value	Oil in Good Condition	27	0
Real Value	Replace the Oil	1	17

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Rodrigues, J.A.; Martins, A.; Mendes, M.; Farinha, J.T.; Mateus, R.J.G.; Cardoso, A.J.M. Automatic Risk Assessment for an Industrial Asset Using Unsupervised and Supervised Learning. Energies 2022, 15, 9387. https://doi.org/10.3390/en15249387

