The purpose of FL-based fault diagnosis is to federate multiple clients to train a powerful global model. However, when the local clients differ in data conditions and model performance, the information uploaded to the server contains both useful and useless components. The traditional FedAvg method ignores these differences between clients and assigns the same weight to every client, which inevitably degrades the fault diagnosis performance of all clients participating in the federation. Therefore, this section designs a multiscale recursive attention gate federated fault diagnosis method, which uses attention gates to pay more attention to the useful information uploaded by the clients, thus improving the accuracy of each client's multiple-working-conditions fault diagnosis.
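For contrast, the following minimal sketch (PyTorch, with illustrative names) shows the uniform parameter averaging of FedAvg, in which every client's update receives the same weight 1/K regardless of how informative it is:

```python
import torch

def fedavg(client_states: list[dict]) -> dict:
    """Uniform FedAvg aggregation: each of the K client models
    contributes with the same weight 1/K, regardless of its local
    data quality or model performance."""
    avg_state = {}
    for name in client_states[0]:
        stacked = torch.stack([state[name] for state in client_states])
        avg_state[name] = stacked.mean(dim=0)  # equal weight for every client
    return avg_state
```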
3.1. Multiscale Recursive Attention Gate Federation Method
Neural networks abstract features to higher scales through layer-by-layer feature representation, where the features at different layers represent different scales. The feature representation of the same set of signals differs from layer to layer, and so does the distinguishability of the features.
On the other hand, it is difficult for a local client to obtain data from multiple working conditions, so it is especially important to develop a multiple-working-conditions fault diagnosis method jointly across multiple clients. The simplest way to jointly develop multi-client fault diagnosis models is to share data, but sharing data exposes the clients' private information. In the FL approach, the clients first perform preliminary feature mining on their private data locally and then share the mined information, thus enabling multiple clients to jointly develop fault diagnosis models for multiple working conditions. However, existing FL approaches do not take into account the differences in the information uploaded by clients that arise from differences in local data and model performance. Therefore, a multiscale recursive attention gate FL model (MAGFL) is designed in this section to improve the accuracy of multiple-working-conditions fault diagnosis for each client in the federation. The algorithm steps are as follows:
Remark 2. Shallow-scale information is more comprehensive but coarser and less distinguishable. Deep-scale features are more accurate, but some information is lost. Therefore, the comprehensive use of multiscale features can improve the fault diagnosis accuracy of rolling bearings.
- Step 1: Designing a multiscale recursive FL framework between clients.
The traditional FL method gives the same weight to the information uploaded by each client, which wastes useful information and thus affects the effectiveness of FL-based fault diagnosis. Therefore, a multiscale recursive FL framework among clients is designed, as shown in Figure 2, which enables the FL model to focus more on the useful information provided by the clients.
For multiple clients $C_1, C_2, \ldots, C_K$, $D_1$ denotes the local data of $C_1$, $D_2$ denotes the local data of $C_2$, and $D_K$ denotes the local data of $C_K$. In fact, each client may run in multiple working conditions, and the working conditions of the clients may not be exactly the same. The server initializes the global model parameters $\theta^0 = \{(W_1, b_1), (W_2, b_2), \ldots, (W_n, b_n)\}$ and sends them down to the clients, where $\theta$ denotes the network parameters of the model and $n$ denotes the number of network layers. $(W_j, b_j)$ denotes the weight and bias of the $j$-th layer. Each client $C_k$ starts local training using the model parameters inherited from the server as initial values.
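A minimal sketch of this initialization-and-broadcast step, assuming every client holds a model with the same architecture as the server's (the function and argument names are hypothetical):

```python
import torch.nn as nn

def broadcast_initial_model(global_model: nn.Module,
                            client_models: list[nn.Module]) -> None:
    """The server initializes theta^0 = {(W_1, b_1), ..., (W_n, b_n)} and
    sends it down; every client starts local training from these values."""
    theta0 = global_model.state_dict()  # {(W_j, b_j)} for the n layers
    for model in client_models:
        model.load_state_dict(theta0)   # inherit server parameters as initial values
```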
$C_k$ uses an AutoEncoder (AE) for layer-by-layer feature extraction. First, the first-scale feature $F_k^1$ is extracted by the first AE, and then the first-scale feature $F_k^1$ of each client is uploaded to the server for first-scale feature federation to obtain the server-aggregated feature $\bar{F}^1$, as shown in Equations (1) and (2):

$$\bar{F}^1 = AG\left(F^1\right) = \alpha \otimes F^1, \tag{1}$$

$$\alpha = \sigma\left(W_{AG} F^1 + b_{AG}\right), \tag{2}$$

where $AG(\cdot)$ denotes the operator function of the attention gate, $F^1 = [F_1^1; F_2^1; \ldots; F_K^1]$ denotes the splicing feature of the clients' uploaded features (splicing means concatenating the features by rows, turning multiple features into one feature), and $(W_{AG}, b_{AG})$ denotes the aggregation parameters of the server's attention gate. $\sigma$ denotes the Sigmoid activation function of the neural network, and $\alpha$ is the weight-assignment mechanism of the attention gate, obtained through a single layer of the neural network. $\otimes$ denotes element-wise multiplication of the corresponding positions in the tensors. By using attention gates, the server's aggregation can focus on the useful information in the features provided by each client, rather than giving equal weight to information from different clients. Attention gates pay more attention to useful information by assigning different weights to the corresponding neurons: the larger the output of the attention gate, the larger the weight given to the information carried by the corresponding neuron. The usefulness of the information depends on the contribution, positive or negative, of the neuron's output to the fault diagnosis result.
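A minimal sketch of the server-side attention gate of Equations (1) and (2), assuming each client uploads a feature matrix with the same number of columns (all class and variable names are illustrative):

```python
import torch
import torch.nn as nn

class ServerAttentionGate(nn.Module):
    """Server-side attention gate: a single linear layer (W_AG, b_AG)
    followed by a Sigmoid produces the weight map alpha, which is
    multiplied element-wise with the spliced client features."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.gate = nn.Linear(feat_dim, feat_dim)  # (W_AG, b_AG)

    def forward(self, client_feats: list[torch.Tensor]) -> torch.Tensor:
        # Splice by rows: K client feature matrices -> one feature matrix F^1.
        spliced = torch.cat(client_feats, dim=0)
        alpha = torch.sigmoid(self.gate(spliced))  # Eq. (2): weight assignment
        return alpha * spliced                     # Eq. (1): alpha ⊗ F^1
```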
Then, $\bar{F}^1$ is distributed to each client for local multiscale feature fusion to obtain the client fusion feature $\tilde{F}_k^1$, as shown in Equation (3):

$$\tilde{F}_k^1 = \Phi\left(F_k^1, \bar{F}^1\right), \tag{3}$$

where $\Phi(\cdot)$ denotes the client's local multiscale feature fusion strategy, which will be described in detail in Step 2. Then, unsupervised feature extraction is performed on $\tilde{F}_k^1$ using AE to obtain the second-scale feature $F_k^2$. The $F_k^2$ is uploaded to the server for second-scale federal aggregation to obtain $\bar{F}^2$, as shown in Equation (4):

$$\bar{F}^2 = AG\left(F^2\right) = \sigma\left(W_{AG}^2 F^2 + b_{AG}^2\right) \otimes F^2. \tag{4}$$

$\bar{F}^2$ is then distributed to each client for local multiscale feature fusion. In such a way, the aggregated features $\bar{F}^n$ at the $n$-th scale are obtained, and $\bar{F}^n$ is sent down to the clients for local multiscale feature fusion to obtain the fused features $\tilde{F}_k^n$, as shown in Equation (5):

$$\tilde{F}_k^n = \Phi\left(F_k^n, \bar{F}^n\right). \tag{5}$$

$\tilde{F}_k^n$ is then fed to the client's Softmax classifier for local multiple-working-conditions fault diagnosis.
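Putting Step 1 together, the following is a schematic sketch of one federation round; `encode`, `fuse`, and `classify` are hypothetical client-side methods standing in for the per-scale AE, the Step-2 fusion gate, and the local Softmax classifier:

```python
def magfl_round(clients, server_gates, n_scales: int):
    """One round of multiscale recursive federation (Step 1).
    clients      -- objects exposing local_data, encode(scale, x),
                    fuse(local, agg), and classify(x) (hypothetical interface)
    server_gates -- one ServerAttentionGate per scale"""
    inputs = [c.local_data for c in clients]
    for s in range(n_scales):
        # Local AE extraction of the s-th scale features F_k^s.
        local = [c.encode(s, x) for c, x in zip(clients, inputs)]
        # Server-side attention-gate federation -> aggregated feature.
        aggregated = server_gates[s](local)
        # Local multiscale recursive fusion (Step 2) -> fused features.
        inputs = [c.fuse(f, aggregated) for c, f in zip(clients, local)]
    # The top-level fused features feed each client's Softmax classifier.
    return [c.classify(x) for c, x in zip(clients, inputs)]
```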
- Step 2: Multiscale recursive fusion within the client.
The local layer-by-layer recursive use of the global features provided by the server can make the global features better serve the current client. The information flow relationship of the proposed method is shown in Figure 3.
Remark 3. The features are essentially the outputs of the neurons in the hidden layers of the neural network. The mapping of information from data space to feature space is in fact the input-to-output transformation performed by the neural network. The reason for this mapping is that fault information is not highly distinguishable in data space; transforming it to feature space by training the parameters of the neural network makes the fault information more distinguishable in feature space.
Remark 4. The client uses private data to train a local neural network model, without requiring the sequential order of the time series. In the FL process, if the working conditions of all clients are the same, then even the traditional FL method can obtain good fault diagnosis results. However, when facing the problem of multiple working conditions, the fault diagnosis performance of the traditional FL method is not guaranteed. This paper focuses on fusing the client's multiscale features using a multiscale recursive federation approach, with attention gates used to focus on the information useful to each client, thus solving the multiple-working-conditions problem.
For the $t$-th round of federation, the first-scale features $F_k^1$ are first extracted locally at the client using AE, and then $F_k^1$ is uploaded to the server for aggregation to obtain $\bar{F}^1$. The server sends $\bar{F}^1$ down to the clients for local multiscale recursive fusion. Client $C_k$ uses local multiscale recursive fusion of the inherited $\bar{F}^1$ to obtain the fused feature $\tilde{F}_k^1$. The fusion strategy uses the attention-gate approach, as shown in Equations (6) and (7):

$$\tilde{F}_k^1 = \beta_k \otimes \left[F_k^1; \bar{F}^1\right], \tag{6}$$

$$\beta_k = \sigma\left(W_k^1 \left[F_k^1; \bar{F}^1\right] + b_k^1\right), \tag{7}$$

where $[F_k^1; \bar{F}^1]$ denotes the splicing of the local feature and the inherited server feature, $(W_k^1, b_k^1)$ denotes the parameters of client $C_k$'s local attention gate, and $\beta_k$ is the corresponding weight-assignment mechanism.
By local multiscale recursive fusion of $F_k^1$ and $\bar{F}^1$, the aggregated features can better serve the local client's feature extraction, and the information from other clients can be used to optimize the effectiveness of multiple-working-conditions fault diagnosis for the current client.
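A minimal sketch of the client-side fusion gate of Equations (6) and (7); the shapes are illustrative assumptions, with the local and inherited server features spliced along the feature dimension:

```python
import torch
import torch.nn as nn

class ClientFusionGate(nn.Module):
    """Client-side attention gate: the local feature F_k and the inherited
    server feature are spliced, and a Sigmoid layer (W_k, b_k) produces the
    weight map beta_k applied element-wise to the spliced feature."""

    def __init__(self, spliced_dim: int):
        super().__init__()
        self.gate = nn.Linear(spliced_dim, spliced_dim)  # (W_k, b_k)

    def forward(self, f_local: torch.Tensor, f_server: torch.Tensor) -> torch.Tensor:
        spliced = torch.cat([f_local, f_server], dim=-1)  # [F_k; \bar F]
        beta = torch.sigmoid(self.gate(spliced))          # Eq. (7)
        return beta * spliced                             # Eq. (6)
```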
Then, $\tilde{F}_k^1$ is mapped to a higher feature scale by AE to obtain $F_k^2$. The $F_k^2$ is uploaded to the server for federal aggregation at the second scale to obtain the aggregated feature $\bar{F}^2$. $\bar{F}^2$ contains useful information about all clients participating in the federation, which can be used to optimize the client's local feature extraction, so the client performs local recursive feature fusion of the inherited server feature $\bar{F}^2$, as shown in Equation (8):

$$\tilde{F}_k^2 = \sigma\left(W_k^2 \left[F_k^2; \bar{F}^2\right] + b_k^2\right) \otimes \left[F_k^2; \bar{F}^2\right]. \tag{8}$$
In such a way, the client locally performs $n$ rounds of multiscale recursive fusion to obtain the top-level fused feature $\tilde{F}_k^n$. $\tilde{F}_k^n$ contains the features of all clients participating in the federation, and local multiscale recursive fusion makes the multiscale recursive federation work better, so that the information from other clients can be used to improve the accuracy of local multiple-working-conditions fault diagnosis.
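Finally, a sketch of the local classification head that receives the top-level fused feature; the fused dimension and the number of fault classes are placeholders:

```python
import torch
import torch.nn as nn

class ClientClassifier(nn.Module):
    """Softmax head for local multiple-working-conditions fault diagnosis
    on the top-level fused feature."""

    def __init__(self, fused_dim: int, n_fault_classes: int):
        super().__init__()
        self.fc = nn.Linear(fused_dim, n_fault_classes)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # Class-probability output of the Softmax classifier.
        return torch.softmax(self.fc(fused), dim=-1)
```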
In fact, different types of faults may occur when a client works under different working conditions. The role of the attention gate is to selectively utilize the multiple-working-conditions information according to the local needs of the client. When different clients have not learned the same fault type information, the local client's attention gate selects the information that is useful for its own fault diagnosis. Model performance also differs from client to client when the length of the training set varies, which requires more attention to the useful information; attention gates are therefore crucial to ensuring the effectiveness of federated learning.