1. Introduction
In practical terms, fault diagnosis is applied chiefly to industrial failures, and roughly 80% of such failures originate in analog circuits. Analog circuit fault diagnosis is therefore a central topic of industrial fault diagnosis. Compared with digital circuits, analog circuits are more complicated and place more stringent requirements on fault diagnosis, so training diagnostic models on analog circuits is especially valuable. With the continuous refinement of data-analysis methods, troubleshooting technology for analog circuits has also improved; common techniques, including PCA [1], grid search [2], particle swarm optimization (PSO) [3], the ant colony algorithm (ACA) [4], simulated annealing (SA) [5], the genetic algorithm (GA) [6], the back-propagation neural network (BP) [6], self-organizing maps (SOM) [7], the extreme learning machine (ELM) [8], decision trees [9], random forests [10] and SVM [11], all achieve good classification results to some extent [12].
For the fault diagnosis of analog circuits [13], the following operations are required. First, an analog circuit model is built, faults are injected into the circuit and the failure data are collected. Second, the fault data are processed, using methods such as feature extraction, dimensionality reduction and linearization. Finally, the processed data are used to build the fault-diagnosis model and to judge its performance [14]. In this article, data from the Sallen–Key circuit are used for model comparison, and the more complex CSTV circuit is used for validation to assess the versatility of the model. Data from both analog circuits are used in horizontal and vertical comparison experiments in order to obtain more practical conclusions. When fault samples are scarce, an auxiliary condition, namely an error penalty term, is added to the objective function of the SCA-SVM [15] classifier to construct a new fault-diagnosis model, TLSCA-SVM. The model not only retains the advantages of the SCA-SVM fault-diagnosis model, i.e., fast fault classification and higher accuracy, but also has better transfer-learning [16] ability.
As a relatively new concept in machine learning [17], transfer learning brings new research directions to the field. This paper uses data from different faults in the analog circuit for fault diagnosis [18], performs a series of data-processing steps, combines the SCA optimization method with SVM to obtain the SCA-SVM classifier and adds an error penalty term to build a new fault-diagnosis model, the TLSCA-SVM model. The method imports the data processed by the wavelet packet and PCA into the improved SCA-SVM classifier for training and prediction, thereby improving the speed and accuracy of diagnosis [19].
The main criticism of the SCA algorithm [20] concerns the local search of the optimization algorithm. It is not difficult to see that a global search can guarantee the optimal solution [21], whereas a local search may become trapped in a local optimum: performance is best at a certain point, and the farther the search moves from that point, the worse the performance. This paper addresses this issue by exploiting the fact that, in the SCA algorithm, local search and global search are performed with equal probability when optimizing parameters; the parameters of the SVM are optimized in this way, yielding the SCA-SVM classifier and a good classification effect.
As machine learning continues to develop, the concept of transfer learning has emerged. Traditional machine learning acquires knowledge autonomously and continuously from data, a mechanism that requires a large amount of data and many iterations to achieve a data-driven effect; when a trained model is applied to a new field, the results are often poor. Transfer learning, an important branch of machine learning, focuses on applying knowledge that has already been learned to new problems, so that trained models can be transferred to new areas. Transfer learning includes zero-shot learning [22], one-shot learning [23] and few-shot learning [24]. Since zero-shot learning is too idealized to occur in practical data processing (it is the limiting case of few-shot learning), this paper uses few-shot learning [25], together with transfer learning, for analog circuit fault diagnosis in order to achieve fault prediction [26].
Transfer learning refers to the transfer of knowledge from one field to another; it is common to apply the experience gained in one learning task to a new one, so that one learning state influences another. The earliest application of transfer learning was in educational psychology, which proposed that transfer must combine old knowledge with new. In machine learning, transfer learning has strong development prospects. Since NIPS launched the "Learning to Learn" workshop in 1995 [27], transfer-learning research has entered a period of rapid development. Transfer learning applies the knowledge extracted from a source task [28] to a target task. In contrast to multi-task learning, the focus of transfer learning is the target task [29], rather than extracting knowledge from all source tasks simultaneously. Of course, in educational psychology the subject of transfer learning is a human, while in machine learning it is the corresponding algorithmic model. Humans have excellent transfer-learning capabilities: in early childhood, when parents provide oral descriptions, children can find the corresponding objects, and once a solution method is mastered, similar problems can be solved. Applying transfer learning in machine learning is not so easy, because, unlike humans, machines cannot think independently and can only achieve the desired results through continuous data-driven optimization. It is worth mentioning that, when analyzing massive data, introducing transfer learning can remove much unnecessary workload, which gives it definite research significance.
At present, many scholars focus on transfer-learning research, and most study algorithms related to it, using different technologies to meet the requirements of transfer learning. Many transfer-learning ideas are applied to image recognition [30]: image features are extracted from a training set, the knowledge of the source task is transferred to the target task, and image recognition on the test set is finally realized.
This article uses transfer learning for fault diagnosis. Obtaining fault data in industrial fault diagnosis requires considerable manpower and material resources, and a fault in actual industrial equipment is a form of damage. The application of transfer learning to fault diagnosis is therefore a typical use case. Using normal data together with a small amount of fault data can effectively diagnose faults and achieve a predictive effect, maintaining equipment safety and ensuring the normal operation of actual industrial equipment. Considering the availability of fault data, this paper analyzes the data generated by analog circuits under different faults. An error penalty term is added to the SCA-SVM classifier to build an innovative fault-diagnosis model, TLSCA-SVM. Comparative experiments show that the model can diagnose faults with less fault data, achieve the effect of transfer learning and finally realize failure prediction.
2. SCA-SVM Algorithm
The SCA is a novel population-based stochastic optimization method proposed by the Australian scholar Mirjalili in 2016 [31]; its essence is to optimize parameters using random probabilities. SCA randomly generates multiple initial candidate solutions and uses sine and cosine functions so that these candidates move toward or away from the optimal solution with equal probability. This approach guarantees both the accuracy of the global search and the speed of the local search, so that the optimal parameters can be found quickly and accurately in the model. Because SCA is in essence a population optimization algorithm, it follows the general two-stage law of such algorithms: exploration and exploitation. In the exploration stage, the algorithm searches randomly with a large probability gradient to cover sufficient search space; in the exploitation stage, the random probability gradient decreases gradually, ensuring that the optimal solution can be located accurately. In SCA [32], the optimization process is divided into global search and local search, two parts that promote and restrain each other: the global search quickly locates the region of the optimal solution, and the local search finds the optimal solution within it. When the two reach a dynamic balance, the global optimum can be found. With only a global search, optimization is slow and demands a high equipment configuration; with only a local search, it is easy to fall into a local optimum. Which mode SCA uses is determined by the value taken by the sine and cosine functions.
The two important parameters of SVM are the kernel parameter [33] and the penalty factor [34]. These two parameters directly affect the performance of SVM [35], so finding their optimal values is the core problem in constructing a classification model, and finding them quickly and effectively is the key task of the optimization algorithm. This article introduces SCA to optimize the SVM parameters. SCA randomly creates multiple initial candidate solutions and uses sine and cosine functions [36] to determine the search direction, ensuring that each candidate moves toward or away from the optimal solution with equal probability. This way of searching ensures both the accuracy of the global search and the speed of the local search, so that the optimal parameters can be found quickly and accurately in the model [37].
2.1. Principles of SCA Optimization Parameters
In this paper, the SCA is used to optimize the two important parameters of SVM. The essence of the SCA is to exploit the characteristics of the trigonometric functions so that each search mode occurs with the same probability, combining the accuracy of global search with the speed of local search. Specifically, the value of the sine or cosine function determines whether a global search is performed: when the distance between the function value and the abscissa exceeds 0.5, i.e., the value lies between 0.5 and 1 or between −0.5 and −1, a global search is performed; when the distance is less than 0.5, i.e., the value lies between −0.5 and 0.5, the SCA performs a local search. SCA randomly creates multiple initial candidate solutions and uses sine and cosine functions to move these candidates toward or away from the best solution with equal probability, which ensures both the accuracy of the global-search optimization parameters and the speed of the local search [11].
In the SCA optimization algorithm, the trigonometric functions determine the position of the next iteration point, which is updated according to the following formulas:

$$X_i^{t+1} = X_i^t + r_1 \sin(r_2)\left|r_3 P_i^t - X_i^t\right| \quad (1)$$

$$X_i^{t+1} = X_i^t + r_1 \cos(r_2)\left|r_3 P_i^t - X_i^t\right| \quad (2)$$

In the equations, $X_i^t$ is the position of the current iteration, $r_1$, $r_2$ and $r_3$ are random numbers redrawn at each iteration that control the trigonometric term, and $P_i^t$ is the best solution selected so far. In the actual optimization process, the two update rules are combined, and the position is updated with either the sine or the cosine function:

$$X_i^{t+1} = \begin{cases} X_i^t + r_1 \sin(r_2)\left|r_3 P_i^t - X_i^t\right|, & r_4 < 0.5 \\ X_i^t + r_1 \cos(r_2)\left|r_3 P_i^t - X_i^t\right|, & r_4 \ge 0.5 \end{cases} \quad (3)$$

In the equation, $r_4$ is a number randomly selected each time in the range $[0, 1]$, ensuring that the two search modes are chosen with the same probability.
In the above formula, the value of the trigonometric term determines the direction of the next iteration. A global search is performed when the distance from this value to the abscissa exceeds 0.5, that is, when the value lies between 0.5 and 1 or between −0.5 and −1, corresponding to the outer ring area shown in Figure 1. When the distance is less than 0.5, that is, when the value lies between −0.5 and 0.5, the search is performed in the inner circle area shown in Figure 1. The parameter $r_2$ is a number randomly selected between 0 and $2\pi$ each time. The parameter $r_3$ randomly allocates an enhanced or weakened influence of the destination, determined by comparing its value with 1. As can be seen from Formula (3), the parameter $r_4$ selects the type of trigonometric function: when $r_4 < 0.5$, the equation containing only the sine function is used for the next iteration; when $r_4 \ge 0.5$, the equation containing only the cosine function is used. Through this iteration scheme, the algorithm searches globally and locally with the same probability, combining the benefits of both search modes to achieve better results.
In order to balance global search and local search with the same probability and make the optimization converge, the SCA algorithm self-adjusts the amplitude $r_1$ using Formula (4):

$$r_1 = a - t\,\frac{a}{T} \quad (4)$$

where $a$ is a constant, $t$ is the current iteration and $T$ is the maximum number of iterations.
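The SCA update rules described above can be sketched compactly in code. The following is a minimal, hypothetical implementation on a toy sphere objective, not the authors' program; the variable names follow the usual $r_1$–$r_4$ notation, and all parameter values (population size, iteration count, bounds) are illustrative assumptions.

```python
import numpy as np

def sca_minimize(f, dim=2, n_agents=10, n_iter=100, a=2.0, seed=0):
    """Minimize f with the sine cosine algorithm (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.0, 5.0, size=(n_agents, dim))
    best = min(pop, key=f).copy()          # best solution found so far (P)
    for t in range(n_iter):
        r1 = a - t * (a / n_iter)          # amplitude decays linearly to 0
        for i in range(n_agents):
            r2 = rng.uniform(0.0, 2.0 * np.pi)  # phase of the sin/cos term
            r3 = rng.uniform(0.0, 2.0)          # random weight on the destination
            r4 = rng.uniform()                  # sine-or-cosine switch
            osc = np.sin(r2) if r4 < 0.5 else np.cos(r2)
            pop[i] = pop[i] + r1 * osc * np.abs(r3 * best - pop[i])
            if f(pop[i]) < f(best):
                best = pop[i].copy()
    return best

sphere = lambda x: float(np.sum(x ** 2))
best = sca_minimize(sphere)
print("best value found:", sphere(best))
```

As the amplitude $r_1$ shrinks, the oscillating step around the destination contracts, which is how the same update rule transitions from exploration to exploitation.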
2.2. The Classification Principle of SVM
SVM was originally used to deal with binary classification problems. To meet more classification problems, slack variables and the maximum-margin hyperplane theory were introduced, allowing nonlinear problems to be classified. Because of this theoretical basis, the SVM classifier can be applied to transfer learning, so this paper uses SVM for transfer-learning-based fault diagnosis.
The most important problem in SVM is the kernel-function mapping: a nonlinearly inseparable problem is mapped into a space where it becomes a simple linear problem and is classified by a hyperplane. The maximum-margin hyperplane satisfies the following conditions:

$$\min_{w,b,\xi} \frac{1}{2}\|w\|^2 \quad \text{s.t. } y_i\left(w^{T}\phi(x_i)+b\right) \ge 1-\xi_i,\ \xi_i \ge 0 \quad (5)$$

where $w$ is the normal vector of the maximum-margin hyperplane, $b$ is its offset, and the slack variable $\xi_i$ ($\xi_i > 0$) relaxes the boundaries. The penalty factor $C$ is applied to weight the slack variables $\xi_i$, and Formula (5) becomes:

$$\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t. } y_i\left(w^{T}\phi(x_i)+b\right) \ge 1-\xi_i,\ \xi_i \ge 0 \quad (6)$$

By introducing the Lagrange multipliers $\alpha_i$, the maximum-margin problem is rewritten as its dual problem:

$$\max_{\alpha} \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i,x_j) \quad \text{s.t. } \sum_{i=1}^{n}\alpha_i y_i = 0,\ 0 \le \alpha_i \le C \quad (7)$$

Solving the dual problem, the decision function of the SVM is:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n}\alpha_i y_i K(x_i,x) + b\right) \quad (8)$$
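The soft-margin SVM and its decision function can be illustrated with scikit-learn's SVC on toy data (not the paper's circuit data); the dataset and parameter values below are illustrative assumptions.

```python
# Minimal soft-margin SVM illustration: decision_function returns the value
# inside the sgn() of the decision function, i.e. sum_i alpha_i y_i K(x_i, x) + b.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)
clf = SVC(C=1.0, kernel="rbf", gamma=0.5).fit(X, y)

scores = clf.decision_function(X)   # signed distance-like value per sample
pred = clf.predict(X)               # the sign of `scores` gives the class
print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```

Here `C` is the penalty factor and `gamma` the kernel parameter, the two quantities that the SCA is used to tune in this paper.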
In this paper, the important SVM parameters, the penalty factor and the kernel parameter, are optimized by the SCA, and a fault classifier with excellent classification performance is obtained.
3. Fault Diagnosis Model of TLSCA-SVM Algorithm
3.1. Principles of Transfer Learning
Transfer learning is a novel branch of machine learning and has a critical influence on data processing in artificial intelligence. In 1995, the NIPS workshop "Learning to Learn" was launched in the field of machine learning [27], and transfer-learning research has since entered a period of rapid development. The main principle of transfer learning is to apply the knowledge of the source task to the target task. In contrast to multi-task learning, the focus of transfer learning is the target task, rather than learning all the source tasks that apply to it.
Transfer learning mainly involves two concepts: the domain $\mathcal{D}$ and the task $\mathcal{T}$. A domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$ is composed of two parts: the feature space $\mathcal{X}$ and the marginal probability distribution $P(X)$, where $X = \{x_1, \ldots, x_n\} \in \mathcal{X}$. For a given domain, a task $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$ also consists of two parts, the label space $\mathcal{Y}$ and the prediction function $f(\cdot)$.

Let $\mathcal{D}_s$ be a labeled source-domain data set and $\mathcal{T}_s$ the corresponding source task, and let $\mathcal{D}_t$ be a target domain with very few labels and $\mathcal{T}_t$ the corresponding learning task. The purpose is to transfer the knowledge of $\mathcal{D}_s$ and $\mathcal{T}_s$ to $\mathcal{D}_t$, so as to improve the performance of the target prediction function $f_t(\cdot)$.
3.2. TLSCA-SVM Algorithm
The SCA optimization algorithm with equality constraints is used to optimize the parameters of the SVM classifier; this method transforms the solution of the optimization problem into the solution of a set of linear equations. For a set of input samples $\{(x_i, y_i)\}_{i=1}^{n}$, SVM maps the nonlinearly inseparable problem into a high-dimensional space where it becomes linearly separable through the kernel mapping, and the discriminant function is:

$$y(x) = w^{T}\phi(x) + b \quad (9)$$
To construct the transfer algorithm, auxiliary data sets similar to the target set are used to enhance the accuracy of the classification model. Let $\{(x_i, y_i)\}_{i=1}^{n}$ be the target sample data and $\{(x_j, y_j)\}_{j=n+1}^{n+m}$ be the similar (auxiliary) sample data. On the basis of the original SCA-SVM optimization problem, the TLSCA-SVM method adds the auxiliary data as an error penalty term to the objective formula to realize the transfer of knowledge. The improved objective function can be expressed as:

$$\min_{w,b,e}\ \frac{1}{2}\|w\|^2 + \frac{C_t}{2}\sum_{i=1}^{n} e_i^2 + \frac{C_s}{2}\sum_{j=n+1}^{n+m} e_j^2 \quad \text{s.t. } y_k = w^{T}\phi(x_k) + b + e_k,\ k = 1, \ldots, n+m \quad (10)$$

In the formula, $C_t$ and $C_s$ are the penalty parameters of the target-domain and source-domain data sets, respectively, and $e_k$ is the prediction error.
When the Lagrange multipliers are introduced, the dual of Equation (10) is derived. Each equality constraint,

$$y_k = w^{T}\phi(x_k) + b + e_k,\quad k = 1,\ldots,n+m \quad (11)$$

is multiplied by its Lagrange multiplier $\alpha_k$ and added to the objective function to establish the Lagrange function:

$$L(w,b,e,\alpha) = \frac{1}{2}\|w\|^2 + \frac{C_t}{2}\sum_{i=1}^{n} e_i^2 + \frac{C_s}{2}\sum_{j=n+1}^{n+m} e_j^2 - \sum_{k=1}^{n+m}\alpha_k\left(w^{T}\phi(x_k) + b + e_k - y_k\right) \quad (12)$$
The necessary condition for an extremum is that the partial derivatives with respect to each variable vanish, which yields Formula (13):

$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{k=1}^{n+m}\alpha_k\phi(x_k),\quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{k=1}^{n+m}\alpha_k = 0,\quad \frac{\partial L}{\partial e_k} = 0 \Rightarrow \alpha_k = C_k e_k,\quad \frac{\partial L}{\partial \alpha_k} = 0 \Rightarrow y_k = w^{T}\phi(x_k) + b + e_k \quad (13)$$

where $C_k = C_t$ for target samples and $C_k = C_s$ for auxiliary samples.
Eliminating the variables $w$ and $e$ gives the matrix equation:

$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + D \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \quad (14)$$

In the formula, $\Omega_{kl} = \phi(x_k)^{T}\phi(x_l) = K(x_k, x_l)$ is the kernel matrix, $D = \operatorname{diag}(1/C_1, \ldots, 1/C_{n+m})$ with $C_k$ as above, $\mathbf{1}$ and $y$ are $(n+m)$-dimensional column vectors, and $\alpha = [\alpha_1, \ldots, \alpha_{n+m}]^{T}$.
The expression of the TLSCA-SVM decision function is obtained from Formula (14):

$$f(x) = \operatorname{sgn}\left(\sum_{k=1}^{n+m}\alpha_k K(x_k, x) + b\right) \quad (15)$$
The Gaussian kernel function has an excellent anti-interference ability against noise in the data. In this paper, the Gaussian kernel is used as the kernel function of TLSCA-SVM. Its expression is:

$$K(x_k, x) = \exp\left(-\frac{\|x - x_k\|^2}{2\sigma^2}\right) \quad (16)$$

where $\sigma$ is the width parameter of the Gaussian kernel.
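The least-squares-style solve described above reduces training to one linear system. The following is a hedged sketch, not the authors' implementation: it fits a plain least-squares SVM with a Gaussian kernel on synthetic data; a transfer variant would stack source and target samples and place their separate penalties $C_s$, $C_t$ on the diagonal block. All names and data are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # K(a, b) = exp(-||a - b||^2 / (2 sigma^2)), computed pairwise
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2.0 * sigma ** 2))

def lssvm_fit(X, y, C=10.0, sigma=1.0):
    n = len(y)
    K = gaussian_kernel(X, X, sigma)
    # Block system [[0, 1^T], [1, K + I/C]] @ [b; alpha] = [0; y]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]          # offset b, multipliers alpha

def lssvm_predict(Xnew, Xtr, b, alpha, sigma=1.0):
    return np.sign(gaussian_kernel(Xnew, Xtr, sigma) @ alpha + b)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (20, 2)), rng.normal(1, 0.3, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])
b, alpha = lssvm_fit(X, y)
acc = (lssvm_predict(X, X, b, alpha) == y).mean()
print("training accuracy:", acc)
```

Because every sample contributes a squared-error term, the multipliers come from one `solve` call rather than a quadratic program, which is what makes adding an auxiliary-data penalty term straightforward.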
Finally, combined with the previous SCA-SVM algorithm, the decision function of the TLSCA-SVM classifier is obtained by the improved algorithm.
3.3. Fault Diagnosis Process of TLSCA-SVM Algorithm
In this paper, the TLSCA-SVM algorithm is applied to analog-circuit fault analysis through four main steps (Steps 1–4): building the analog circuit model, injecting faults and collecting fault data; extracting data features with wavelets and reducing dimensionality with PCA; using the TLSCA-SVM classifier for fault diagnosis; and comparing the actual fault data with the diagnosis results to obtain the fault-diagnosis rate.
The programming in this article is based on the mutual control of multiple modules. The data processing uses a five-layer wavelet packet and PCA. The classifier uses SCA to optimize the parameters of the SVM to achieve fast and accurate classification. When auxiliary conditions are added to the classification algorithm, it gains the ability of transfer learning. The specific fault-handling process is shown in
Figure 2.
In this paper, the steps of fault diagnosis are shown below. The entire diagnostic process is unchanged when comparing algorithms; only the classifier and its optimized parameters differ. This ensures that the external conditions are consistent and only the diagnostic effect of the classifier itself is considered; in particular, the same fault training-set data are used with each classifier.
Step 1. The circuit model is established to collect and process data from different faults in the analog circuit. The input data undergo a series of processing steps such as wavelet-packet feature extraction and PCA dimensionality reduction.
Step 2. After the above data processing is completed, the data are put into the pre-built classification model to classify the fault data. In this paper, the SVM is implemented with LIBSVM. The penalty factor and kernel parameter of the SVM are optimized by the SCA method, in which multiple initial candidate solutions are randomly created and moved toward or away from the best solution with equal probability; when a relatively good solution is found, it is kept for the next iteration.
Step 3. An auxiliary condition, namely an error penalty term, is added to the objective function of the already-tuned SCA-SVM classifier, constructing an innovative fault-diagnosis model, TLSCA-SVM. The knowledge of the source domain is transferred to train the target domain, and the TLSCA-SVM classifier performs fault diagnosis. A smaller amount of fault data is used as the training set and fed into the classifier.
Step 4. The trained TLSCA-SVM classifier model is used to classify faults on the test set, output the diagnosis results and judge the accuracy of the fault diagnosis results.
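The end-to-end shape of Steps 1–4 can be sketched as follows. This is an illustrative pipeline only: the placeholder feature matrix stands in for the wavelet-packet features of Step 1, and a standard RBF-SVM stands in for the authors' TLSCA-SVM classifier; the class-offset trick that makes the toy data learnable is an assumption for the demo.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(900, 32))      # stand-in for wavelet-packet features
y = np.repeat(np.arange(9), 100)    # 9 circuit states, 100 samples each
X += y[:, None] * 0.5               # inject class structure so the demo learns

# 600 training / 300 test samples, as in the Sallen-Key experiment
Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=600, random_state=0)

model = make_pipeline(StandardScaler(), PCA(n_components=5),
                      SVC(C=10.0, gamma="scale"))
model.fit(Xtr, ytr)
acc = model.score(Xte, yte)         # Step 4: judge the diagnosis accuracy
print("diagnosis accuracy:", acc)
```

The pipeline keeps the comparison fair in the sense the text describes: swapping the final classifier changes nothing upstream of it.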
It is worth mentioning that this article considers the problem of single-fault diagnosis in analog circuits. A single fault means changing the parameters of only one component in the circuit while the parameters of all other components remain unchanged; changing the parameters of two or more components at the same time constitutes the multiple-fault diagnosis problem. The data-processing method for multi-fault diagnosis is similar to that for single-fault diagnosis. Single faults are the basis of analog-circuit fault diagnosis: in actual analog circuits, the occurrence probability of a single fault is above 80%, while multiple faults are relatively rare. Moreover, a multi-fault problem can be regarded as several single-fault problems occurring at the same time, and can thus be decomposed into multiple single-fault problems. Considering all of these aspects, this paper does not analyze multi-fault diagnosis.
4. Acquisition and Process Fault Samples of Analog Circuits
This article takes the Sallen–Key circuit [38] and the CSTV circuit [39] as diagnostic examples. Both are typical circuits that are often used for analog-circuit fault diagnosis. In the simulation experiments, considering the selectivity of the simulated circuit data and for greater rigor, the faults of the Sallen–Key circuit and the CSTV circuit were set according to the literature [40,41,42].
The Sallen–Key circuit is used because it is a relatively simple second-order circuit with a public dataset, and it is widely used in the fault diagnosis of analog circuits, so it serves to verify the validity of the algorithm. The CSTV circuit is a fourth-order circuit, more complex than the Sallen–Key circuit; applying the algorithm to it shows that the algorithm has universal applicability.
4.1. Data Processing of Sallen-Key Band-Pass Filter Circuit with Injected Fault
In this paper, data were obtained from the Sallen–Key circuit in its different modes. The circuit structure and the types and values of the individual components are shown in Figure 3. A faulty parameter is set to deviate from its nominal value by 50%, while the tolerance of the capacitors is 10% and that of the resistors is 5%.
The excitation-signal parameters of the analog circuit are shown in Figure 3. After sensitivity analysis, R2, R3, C1 and C2 are selected as the faulty components; the fault types and parameters are listed in Table 1. The Sallen–Key data contain eight failure modes, R2+, R2−, R3+, R3−, C1+, C1− C2+ and C2−, where + and − denote a parameter above or below its nominal value, in addition to the normal state (no failure, NF).
In this paper, the output-signal waveform data of the analog circuit are collected for nine modes: the normal state and the eight fault states. Each mode contributes 100 sample points, giving 900 state sample points in total. Of these, 600 randomly selected sample points were used as the training set and the remaining 300 as the test set.
4.2. Data Processing of the CSTV Filter Circuit with Injected Faults
To demonstrate the generality of the SCA-SVM classifier for fault diagnosis, a relatively complex circuit, the CSTV filter, was selected as the verification circuit. The circuit element names and values are shown in Figure 4, and the state types are listed in Table 2. A total of 1800 samples were collected; 900 were used as the training set and the other 900 as the test set.
The specific procedure is to compare the classification algorithms and the parameter-optimization methods. The common second-order Sallen–Key filter circuit is used for the comparison experiments, and, to verify the versatility of the conclusions, the multi-stage CSTV filter is used for validation, so that a more general result can be obtained.
Because the experimental simulation environment and data differ between studies, experimental results also differ. Therefore, when simulating the soft faults of the circuit, a relatively general experimental environment and data set were selected. With the above parameter settings, each of the nine modes is analyzed with 100 Monte Carlo (MC) runs, the start frequency is set to 100 Hz, and the cutoff frequency is set to 10 kHz. A voltage probe is placed at the output node of the circuit to collect the output voltage. This ensures that the experimental results of this paper are broadly applicable and reliable.
4.3. Feature Processing of Analog Circuit Fault Data
In this paper, the processing of analog circuit fault data is divided into three steps: extraction of the fault signals, wavelet-packet feature extraction, and PCA dimensionality reduction.
4.3.1. Extract Fault Signal
Multiple fault control groups were established and the original data set was simulated: a voltage probe was placed on the output node of the circuit, Monte Carlo analysis was performed and the output voltage was collected. The output voltage of each mode was used as the original data set. To visually compare the circuit-failure characteristics in different modes, the simulation results of the two circuits were selected, and the signal data of the eight failure modes and the normal mode were displayed.
Figure 5 and Figure 6 present the circuit-simulation results of the different fault modes in the Sallen–Key circuit and the CSTV circuit; as can be seen, the output signals of the circuit differ between fault modes. Because the fault data are generated by changing the values of different circuit elements, the circuit signals corresponding to the faults retain a certain similarity: in Figure 5 and Figure 6, the subplots resemble one another, but each differs slightly because of the different component values.
4.3.2. Feature Extraction and Dimensionality Reduction of Fault Signals
In this paper, the wavelet packet was used for feature extraction. Feature extraction selects representative data from a large amount of data, so the original signal is similar, but not identical, to the signal after feature extraction: in Figure 7, it can be seen that Figure 7a is similar to, but not the same as, Figure 7d. The main function of this step is to obtain highly representative data from the overall data, that is, to remove redundant data; Figure 7c corresponds to Figure 7b with the redundant signal data removed.
After the analog circuit is built, the relevant parameters are set: all failure modes are analyzed with 100 Monte Carlo runs, the start frequency is 100 Hz, the cutoff frequency is 10 kHz, and a voltage probe at the output node collects the output voltage. The output voltage in each mode forms the original data set; features are extracted with the wavelet packet, and PCA dimensionality reduction is then performed, yielding the feature-processed fault data used by the fault-diagnosis classifier.
The wavelet packet is mainly used to extract the eigenvalues. Derived from the wavelet, the wavelet-packet transform is a more complete decomposition method: compared with the wavelet transform, it can analyze the signal across all frequency bands, further decomposing the high-frequency part, which the wavelet transform leaves unsubdivided, into evenly spaced bands. This article uses a five-layer wavelet packet, which effectively decomposes the input signal to extract the characteristic values.
Figure 7 shows the signal before and after the wavelet packet is used to extract the characteristic values.
Regarding the PCA technique, its main purpose in this paper is dimensionality reduction. The number of selected principal components is five; Figure 8 reflects the changes in the data before and after the PCA analysis. When using PCA, the threshold is set to 85%: once the cumulative contribution rate reaches 85%, no further principal components are selected. After dimensionality reduction of all the original variables, the principal components are obtained as linear combinations of the original variables.
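The feature step can be sketched as follows: a five-level wavelet-packet decomposition whose terminal-node energies form the feature vector, followed by PCA reduced to five components. This is a hedged sketch with placeholder signals and an assumed `db3` mother wavelet, not the authors' processing code.

```python
import numpy as np
import pywt
from sklearn.decomposition import PCA

def wp_energy_features(signal, wavelet="db3", level=5):
    """Energy of each terminal node of a `level`-layer wavelet packet."""
    wp = pywt.WaveletPacket(signal, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")   # 2**level = 32 frequency bands
    return np.array([np.sum(n.data ** 2) for n in nodes])

rng = np.random.default_rng(0)
signals = rng.normal(size=(60, 256))            # 60 placeholder waveforms

feats = np.array([wp_energy_features(s) for s in signals])   # (60, 32)
pca = PCA(n_components=5)
reduced = pca.fit_transform(feats)              # (60, 5) feature matrix
print("reduced shape:", reduced.shape)
print("cumulative variance:", pca.explained_variance_ratio_.cumsum()[-1])
```

With real circuit responses, `n_components` could instead be chosen as the smallest number whose cumulative explained-variance ratio reaches the 85% threshold used in the paper.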
Figure 8, which reflects the changes in the data before and after the PCA analysis. When using PCA to reduce the dimensionality, the threshold is set to 85%. When the cumulative contribution rate reaches 85%, the pivot is no longer selected. After all the original variables are processed by dimensionality reduction, the pivot can be obtained. The pivot is the linearity of the original variable combination.