**1. Introduction**

Hydraulic systems play an important role in a wide variety of industrial applications, such as robotics, manufacturing, aerospace, and engineering machinery. Monitoring the condition of hydraulic equipment not only improves productivity and reduces maintenance costs and downtime, but also improves the reliability and safety of the equipment in service [1–3]. In particular, the hydraulic valve is the core control component of the hydraulic system, and it is widely used in engineering applications to control the flow and pressure of fluids [4–6]. In hydraulic systems, vibration analysis (VA) is the most popular and efficient condition monitoring technique for rotating components such as hydraulic pumps, electric motors, and bearings [7–16]. However, the valve core of a hydraulic valve works with a reciprocating motion. VA methods that have been applied successfully to rotating machinery are therefore not well suited to fault diagnosis and condition monitoring of non-rotating machinery such as the hydraulic valve [17–19].

Many studies on fault diagnosis of hydraulic valves have been conducted through theoretical approaches and test measurements, and some results have been obtained. Wu et al. [20] proposed a mechanical fault diagnosis method based on complex third-order cumulants; in an experiment on fault diagnosis of an overflow valve, the method improved the rate of correct diagnosis. Huang et al. [21] applied higher-order spectrum theory to the fault diagnosis of hydraulic valves. Li et al. [22] proposed a fault diagnosis method that uses the fractal characteristic volume of a valve's displacement signal as a criterion to handle the nonlinear problems in the working process of autopilot hydraulic valves. Raduenz et al. [23] presented a method for condition monitoring and online fault detection on proportional reversing valves. Its effectiveness in monitoring and detecting faults in valves of different sizes and constructive parameters was shown experimentally using five different proportional valves. Vianna et al. [24] presented a method to estimate degradation in a servo valve by applying the Fading Extended Kalman Filter to system identification. Folmer et al. [25] presented a data-driven fault detection system for valves, which uses historical process data obtained across company borders and detects faults by comparing standardized flow coefficients, determined according to DIN IEC 60534-2-1, in physical valve models. Nevertheless, many challenges remain in the condition monitoring and fault diagnosis of hydraulic valves. In particular, there are few research results on identifying hydraulic valve faults from pressure signals in a hydraulic system with condition monitoring on an Industrial Internet of Things (IIoT) platform.

With the availability of big data technology and data mining methods, as well as the emergence of new IIoT platforms and machine learning algorithms, big-data-based fault diagnosis of hydraulic valves in hydraulic systems with condition monitoring has become a research focus [26–28]. Among these methods, Principal Component Analysis (PCA) is an effective technique for dimensionality reduction in big data analysis. It is a multivariate statistical method that compresses multiple linearly related variables into a few unrelated variables. PCA was first proposed by Pearson [29] in a study on the optimal fitting of lines and planes to spatial data. Fisher and Mackenzie [30] argued that PCA was more useful in system response variance analysis than in system modeling, and they proposed a prototype of the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm. PCA was then improved by Hotelling [31] and further developed into a common method widely used in data dimensionality reduction, fault diagnosis, and anomaly detection. For instance, Mohanty et al. [32] developed an algorithm to identify bearing faults using empirical mode decomposition and principal component analysis (EMD-PCA) based on the average kurtosis technique. This combined approach was observed to identify inner ball faults effectively and adaptively. Stief et al. [33] proposed a sensor fusion approach to diagnose both electrical and mechanical faults in induction motors based on the combination of a two-stage Bayesian method and PCA. Caggiano [34] proposed an advanced feature extraction methodology based on PCA. By feeding the PCA features to artificial neural networks, an accurate diagnosis of tool flank wear was achieved, with predicted values very close to the measured tool wear values. Wang et al. [35] developed a variable selection algorithm based on PCA with multiple selection criteria, which can identify faults in wind turbines, determine the time and location at which a fault occurs, and estimate its severity. Xiao et al. [36] studied the application of PCA to fault diagnosis in Electro-Hydrostatic Actuators (EHAs). The experimental results demonstrated that PCA can effectively discriminate faults and their characteristics for EHAs, and that it could be used as an optional data fusion tool for the Prognostics and Health Management (PHM) of EHAs. Riba et al. [37] proposed a very fast, noninvasive, accurate, and easy-to-apply method to discriminate between paperboard samples produced from recovered and virgin fibers. In this method, FTIR spectra were analyzed in combination with feature extraction methods such as PCA, PCA + canonical variate analysis (CVA), and extended canonical variate analysis (ECVA), together with the *k* Nearest Neighbor (*k*NN) classifier. The experimental results showed that the proposed scheme achieved high classification accuracy with a very fast response.

In addition, the eXtreme Gradient Boosting (XGBoost) algorithm, proposed by Dr. Chen Tianqi in 2014, can automatically exploit multi-threaded parallel computing on the central processing unit (CPU) and has the advantages of low computational complexity, fast running speed, and high accuracy, regardless of whether the data scale is large or small [38,39]. At present, this method has been successfully applied in many fields, such as fault diagnosis, environmental prediction, and medical detection. Zhang et al. [40] designed an efficient machine learning method that combined random forests (RFs) with XGBoost and used it to establish a data-driven fault detection framework for wind turbines. The results indicated that the proposed approach was robust across various wind turbine models, including offshore ones, under different working conditions. Chakraborty and Elzarka [41] developed an XGBoost model with a dynamic threshold for early detection of faults in Heating, Ventilation, and Air Conditioning (HVAC) systems. Zhang et al. [42] applied the XGBoost algorithm to the fault diagnosis of rolling bearings, and the results showed that XGBoost was superior to other tree algorithms in accuracy and time. Nguyen et al. [43] developed an XGBoost model to predict peak particle velocity (PPV). The results indicated that the developed XGBoost model, on both training and testing datasets, outperformed the support vector machine (SVM), RFs, and *k*NN models. Pan et al. [44] applied the XGBoost algorithm to predict the hourly concentration of PM2.5. Liu and Qiao [45] proposed a prediction method for the incidence of heart disease based on clustering and XGBoost, and showed that the proposed method was feasible and effective. Fitriah et al. [46] proposed an algorithm combining PCA preprocessing with XGBoost classification to diagnose stroke patients in Indonesia, and diagnostic accuracy was increased while using fewer electrodes. PCA reduced the dimensionality and computation cost without decreasing classification accuracy. XGBoost, as a scalable tree boosting classifier, can solve practical scalability problems with minimal resources.

In September 2017, Huawei launched the Machine Learning Service (MLS), a service on the IIoT platform for data mining and analysis [47]. It has more than 300 algorithm function nodes, with which visual workflow models can be conveniently built to perform data processing, model training, evaluation, and prediction. In addition, Jupyter Notebook is integrated in MLS, and the algorithm functions can be extended with tools such as Python and R to provide customized cloud services for the collection and analysis of massive data. Moreover, MLS provides a cloud platform for the integration of technology, experience, and machine learning algorithms. At present, attempts are being made to apply MLS in the fields of product recommendation, customer grouping, abnormality detection, predictive maintenance, and driving behavior analysis.

In summary, the existing fault diagnosis methods for hydraulic valves are not suitable for extracting fault features from pressure signals in hydraulic valve condition monitoring. There is a clear demand for a fault diagnosis method for hydraulic valves that runs as a cloud service on the IIoT platform, and big data analysis is expected to become a development trend in hydraulic system condition monitoring. In this paper, a novel cloud-service-based fault diagnosis method is proposed for the typical faults of hydraulic directional valves. The method is built on the MLS cloud service, uses raw sensor data collected from the inlet and outlet pressure signals in hydraulic valve condition monitoring, and combines the advantages of PCA dimensionality reduction and XGBoost classification.

The outline of the paper is as follows. Sections 2 and 3 summarize the principles of PCA dimensionality reduction and the XGBoost algorithm. In Section 4, the hydraulic test bed is introduced, and the raw data acquisition scheme for condition monitoring is described based on the hydraulic system schematic diagram. In Section 5, the raw data for condition monitoring are analyzed, and the inlet and outlet pressure signals of the hydraulic directional valve are selected as the sample. The PCA-XGBoost fault diagnosis model for hydraulic valves is built on the MLS cloud service platform and compared with the Principal Component Analysis and Classification And Regression Trees (PCA-CART) model and the Principal Component Analysis and Random Forests (PCA-RFs) model; the test results indicate that the proposed model is superior. Section 6 concludes the paper and outlines future work on data analytics.

#### **2. Principal Component Analysis-Based Data Dimensionality Reduction**

## *2.1. Principle of PCA Dimensionality Reduction*

PCA dimensionality reduction replaces the original dimensions with a smaller number of unrelated dimensions, mapping *m*-dimensional features to *k*-dimensional features (*k* < *m*). These unrelated dimensions are called principal components [48].

Suppose *A* is an *n* × *m* data matrix in which each column represents a variable and each row represents a sample. The matrix can be decomposed into the sum of *m* vector outer products, as shown in the equation below.

$$A = \mathbf{t}\_1 \mathbf{p}\_1^T + \mathbf{t}\_2 \mathbf{p}\_2^T + \dots + \mathbf{t}\_i \mathbf{p}\_i^T + \dots + \mathbf{t}\_m \mathbf{p}\_m^T \tag{1}$$

where *t<sub>i</sub>* ∈ ℝ<sup>*n*</sup> is the column vector consisting of the *n* observations of the *i*-th principal component, which is called the score vector, *i* = 1, 2, ···, *m*, and *p<sub>i</sub>* ∈ ℝ<sup>*m*</sup> is called the load vector. Equation (1) can be further written in matrix form.

$$A = \mathbf{T} \mathbf{P}^{\mathsf{T}} \tag{2}$$

where *T* = [*t*<sub>1</sub>, *t*<sub>2</sub>, ···, *t<sub>i</sub>*, ···, *t<sub>m</sub>*] ∈ ℝ<sup>*n*×*m*</sup> is called the score matrix, and *P* = [*p*<sub>1</sub>, *p*<sub>2</sub>, ···, *p<sub>i</sub>*, ···, *p<sub>m</sub>*] ∈ ℝ<sup>*m*×*m*</sup> is called the load matrix.
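The decomposition in Equations (1) and (2) can be checked numerically. The following is a minimal NumPy sketch rather than the authors' implementation: the random 6 × 3 matrix, the mean-centering step, and all variable names are assumptions made for illustration, and the scores and loads are obtained here from a thin SVD.

```python
import numpy as np

# Hypothetical data matrix A: n = 6 samples (rows), m = 3 variables (columns).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
A = A - A.mean(axis=0)  # assumption: columns are mean-centered before PCA

# Thin SVD: A = U diag(s) V^T. Take T = U diag(s) and P = V.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
T = U * s   # score matrix, shape (n, m); column i is the score vector t_i
P = Vt.T    # load matrix, shape (m, m); column i is the load vector p_i

# Equation (2): A = T P^T
assert np.allclose(A, T @ P.T)

# Equation (1): A is the sum of the m outer products t_i p_i^T
A_sum = sum(np.outer(T[:, i], P[:, i]) for i in range(A.shape[1]))
assert np.allclose(A, A_sum)
```

Scaling the columns of `U` by the singular values is one convenient way to obtain scores; other routes to the same factors are possible.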

The score vectors are orthogonal to each other, so for any *i* and *j* with *i* ≠ *j*, *t<sub>i</sub>*<sup>T</sup>*t<sub>j</sub>* = 0 is satisfied. The load vectors are also orthogonal to each other, and the length of each load vector is 1. This is shown in the formulas below.

$$p\_i^T p\_j = 0, \quad i \neq j \tag{3}$$

$$p\_i^T p\_j = 1, \quad i = j \tag{4}$$

Right-multiply both sides of Equation (1) by *p<sub>i</sub>* to get the following equation.

$$A p\_i = t\_1 p\_1^\mathrm{T} p\_i + t\_2 p\_2^\mathrm{T} p\_i + \dots + t\_i p\_i^\mathrm{T} p\_i + \dots + t\_m p\_m^\mathrm{T} p\_i \tag{5}$$

Substitute Equations (3) and (4) into Equation (5) to get the equation shown below.

$$t\_i = A\boldsymbol{p}\_i \tag{6}$$
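Spelled out, the substitution keeps only the *j* = *i* term of the sum: every cross term vanishes by Equation (3), and the remaining factor equals 1 by Equation (4). The summation index *j* is introduced here only for compactness.

$$A p\_i = \sum\_{j=1}^{m} t\_j \left(p\_j^\mathrm{T} p\_i\right) = t\_i \left(p\_i^\mathrm{T} p\_i\right) = t\_i$$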

As can be seen from Equation (6), each score vector is actually a projection of the data matrix in the direction of the corresponding load vector. The length of the vector *t<sub>i</sub>* reflects the degree of coverage of the data matrix *A* in the *p<sub>i</sub>* direction: the greater the length, the greater the degree of coverage. The score vectors are arranged in descending order of length.

$$\|\|\mathbf{t}\_1\|\| > \|\|\mathbf{t}\_2\|\| > \dots > \|\|\mathbf{t}\_m\|\|\tag{7}$$

Then the load vector *p*<sub>1</sub> represents the direction in which the data matrix *A* changes the most, *p*<sub>2</sub> is perpendicular to *p*<sub>1</sub> and represents the direction in which the change of *A* is the second largest, and *p<sub>m</sub>* represents the direction in which *A* changes the least.
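The projection in Equation (6), the orthonormality conditions in Equations (3) and (4), and the ordering in Equation (7) can be illustrated with a short NumPy sketch. The synthetic data, the chosen column scales (5, 1, and 0.2), and the mean-centering step are assumptions for illustration only and are not taken from the test bed data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 200 samples of 3 variables with very different spreads.
A = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
A = A - A.mean(axis=0)  # assumption: columns are mean-centered

# Load vectors are taken as the right singular vectors of A.
_, s, Vt = np.linalg.svd(A, full_matrices=False)
P = Vt.T

# Equations (3) and (4): the load vectors are orthonormal, P^T P = I.
assert np.allclose(P.T @ P, np.eye(3))

# Equation (6): each score vector is the projection t_i = A p_i.
T = A @ P
lengths = np.linalg.norm(T, axis=0)

# Equation (7): lengths are in descending order (they equal the singular values).
assert np.all(np.diff(lengths) <= 0)
assert np.allclose(lengths, s)

# Score vectors are mutually orthogonal: T^T T is diagonal.
G = T.T @ T
assert np.allclose(G, np.diag(np.diag(G)))

# p_1 points mostly along the variable with the largest spread (column 0 here).
dominant_variable = int(np.abs(P[:, 0]).argmax())
```

In this sketch the first load vector aligns with the widest column because the three synthetic variables were generated independently; with correlated variables it would generally be a mixture of several columns.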

Furthermore, through the principal component decomposition, the data matrix *A* can be transformed into the equation below.

$$A = \mathbf{t}\_1 \mathbf{p}\_1^T + \mathbf{t}\_2 \mathbf{p}\_2^T + \dots + \mathbf{t}\_k \mathbf{p}\_k^T + E \tag{8}$$

where *E* is the error matrix, representing the change of *A* on the load vectors from *p*<sub>*k*+1</sub> to *p<sub>m</sub>*. In practical applications, *k* is usually much smaller than *m*, and the error matrix *E* is mainly caused by measurement noise, so *E* can be ignored. Therefore, the data matrix *A* can be approximately expressed as the following equation.

$$A \simeq \mathbf{t}\_1 \mathbf{p}\_1^T + \mathbf{t}\_2 \mathbf{p}\_2^T + \dots + \mathbf{t}\_k \mathbf{p}\_k^T \tag{9}$$

Thereby, the data matrix *A* can be reduced from its original *m* dimensions to *k* dimensions. In the process of PCA dimensionality reduction, eigenvalues and orthonormal eigenvectors need to be solved. The principal components can be calculated by the Singular Value Decomposition (SVD) of a matrix.
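As a hedged illustration of Equations (8) and (9), the sketch below truncates an SVD to *k* components and compares the result with an eigen-decomposition of the sample covariance matrix. The synthetic low-rank data, the noise level, the choice *k* = 2, and the mean-centering step are all assumptions; this is not the MLS workflow used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k = 100, 5, 2
# Hypothetical data: a rank-k signal plus small measurement noise.
signal = rng.normal(size=(n, k)) @ rng.normal(size=(k, m))
A = signal + 0.01 * rng.normal(size=(n, m))
A = A - A.mean(axis=0)  # assumption: columns are mean-centered

# Route 1: SVD of the data matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Route 2: eigen-decomposition of the sample covariance matrix.
C = A.T @ A / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)                # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder to descending

# Both routes agree: eigenvalues equal s^2 / (n - 1).
assert np.allclose(eigvals, s**2 / (n - 1))

# Keep the first k components, Equation (9).
T_k = U[:, :k] * s[:k]   # scores, shape (n, k)
P_k = Vt[:k].T           # loads, shape (m, k)
A_k = T_k @ P_k.T

# The leading eigenvectors match the leading load vectors up to sign.
assert np.allclose(np.abs(np.sum(eigvecs[:, :k] * P_k, axis=0)), 1.0)

# Error matrix E from Equation (8); its Frobenius norm is set by the
# discarded singular values.
E = A - A_k
assert np.isclose(np.linalg.norm(E), np.sqrt(np.sum(s[k:] ** 2)))
rel_err = np.linalg.norm(E) / np.linalg.norm(A)
```

Here `T_k` is the reduced *k*-dimensional representation of the samples, and `rel_err` is small only because the synthetic noise was chosen to be small relative to the signal.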
