**3. Short Text Data Model of Secondary Equipment Faults in Power Systems Based on LDA Topic Model and Convolutional Neural Network**

#### *3.1. Improved Text Classification Process*

The quality of text representation directly affects the final classification result. Transforming Chinese text into a structured form that the computer can recognize is the process of feature extraction and semantic abstraction of Chinese text. The traditional LDA model uses an external corpus or merges short texts to enrich the semantic information between words, but the word vector captured by the topic model follows the bag-of-words assumption; that is, the two phrases "A before B" and "B before A" are characterized as the same word vector after the topic model extracts their features. Moreover, most of the original data in this paper were manual records made by operation and maintenance personnel, and it is difficult for different people to follow a standardized recording method. For short text feature mining with poor context dependence, such as fault data, the classification result obtained by directly using the LDA model is therefore poor.

In this paper, the RLDA model was used to extract global features to construct the topic word vector, and the word2vec model was used to extract local features to construct the latent feature vector. The two features were combined, absorbing their respective advantages, as the input of the convolutional neural network.

#### 3.1.1. Text Preprocessing

Consulting the published work [28], the collected short text data can be labeled as serious, critical, and general defects of the secondary equipment. The obtained short text messages were divided into a training set, a validation set, and a test set in a ratio of 7:2:1. The top 30 terms by frequency without text preprocessing are shown in Figure 2.

**Figure 2.** The top 30 words without text pre-processing.
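As a sketch of the 7:2:1 split described above (the labeled records are assumed to be held as parallel lists of texts and labels; the variable names are illustrative, since the corpus itself is the authors' private data):

```python
from sklearn.model_selection import train_test_split

def split_7_2_1(texts, labels, seed=42):
    """Split labeled defect texts into 70% train / 20% validation / 10% test."""
    # First carve off the 10% test set, then split the remaining 90% as 7:2.
    x_rest, x_test, y_rest, y_test = train_test_split(
        texts, labels, test_size=0.1, stratify=labels, random_state=seed)
    x_train, x_val, y_train, y_val = train_test_split(
        x_rest, y_rest, test_size=2 / 9, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```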

Analyzing and summarizing the natural language characteristics of the defective text data, the secondary equipment defect text data cleaning was based on the following steps:


Word segmentation is the basic work of text mining in various professional fields. The quality and quantity of the words included in the professional dictionary determine the accuracy of word segmentation and part-of-speech tagging in text preprocessing. Due to the large number and miscellaneous types of electrical secondary equipment, the vocabulary related to this field is very large, and there are thousands of terms describing the equipment itself, such as transformer station names, proper terms of equipment protection, and so on.
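A minimal preprocessing sketch, assuming jieba is used for segmentation (the paper does not name its segmentation tool) and that the professional dictionary and stop-word dictionary are plain-text files with one term per line; the file names are hypothetical:

```python
import jieba

# Hypothetical file names; the actual dictionaries were built by the authors.
jieba.load_userdict("secondary_equipment_terms.txt")  # protection terms, station names, ...

with open("stopwords.txt", encoding="utf-8") as f:
    stopwords = {line.strip() for line in f if line.strip()}

def clean_and_segment(text: str) -> list[str]:
    """Segment a raw defect record, dropping stop words and single characters."""
    tokens = jieba.lcut(text)
    return [t for t in tokens if t not in stopwords and len(t) > 1]
```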

#### 3.1.2. Text Classification Model by Using LDA

The features of the LDA topic model, based on the short text data of secondary equipment, are explained as follows:


$$P(w\_i|d) = \frac{f\_d(w\_i)}{Len(d)}\tag{1}$$

where *f*<sub>*d*</sub>(*w<sub>i</sub>*) represents the frequency of the word *w<sub>i</sub>* in the document *d*, and *Len*(*d*) stands for the length of the short text *d*.

Inspired by [26], the expectation of the topic distribution for document-generating words can be regarded as the distribution of document-generating topics:

$$P(z|d) = \sum\_{w\_i \in \mathcal{W}\_d} P(z|w\_i)P(w\_i|d) \tag{2}$$

where *P*(*z*|*d*) is the probability of the document *d* generating the topic *z*, *W<sub>d</sub>* is the set of words in the short text *d*, and *P*(*z*|*w<sub>i</sub>*) is the probability of the word *w<sub>i</sub>* generating the topic *z*.

After the LDA topic generation model was established, we implemented Gibbs sampling to estimate the corresponding model parameters for a given number of iterations. Finally, the topic distribution matrix of any text in the corpus could be obtained after completing the model training.
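As a sketch of Equations (1) and (2), assuming the word-topic probabilities *P*(*z*|*w*) have already been estimated (e.g., by Gibbs sampling) and are stored as a words × topics matrix; the names are illustrative:

```python
import numpy as np
from collections import Counter

def doc_topic_distribution(doc_tokens, p_z_given_w, vocab_index):
    """Eqs. (1)-(2): P(z|d) = sum_w P(z|w) * P(w|d), with P(w|d) = f_d(w) / Len(d)."""
    counts = Counter(doc_tokens)
    length = len(doc_tokens)                      # Len(d)
    p_z = np.zeros(p_z_given_w.shape[1])
    for word, freq in counts.items():
        if word in vocab_index:                   # skip out-of-vocabulary words
            p_z += p_z_given_w[vocab_index[word]] * (freq / length)
    return p_z                                    # topic distribution of document d
```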

#### 3.1.3. Improved LDA Topic Analysis Model Based on Relevance Formula

In this paper, the LDA topic model was improved by introducing a weighting coefficient *λ* in the topic correction layer to realize the model's potential topic extraction and topic correction function for secondary equipment fault text information. The proposed model is shown in Figure 3. The Relevance formula is as follows:

$$r(w, k | \lambda) = \lambda \cdot \log(\phi\_{k,w}) + (1 - \lambda) \cdot \log\left(\frac{\phi\_{k,w}}{p\_w}\right) \tag{3}$$

where *r*(*w*, *k*|*λ*) represents the degree of relevance of word *w* and topic *k* under the set weight coefficient *λ*, whose value range is 0 ≤ *λ* ≤ 1; *φ*<sub>*k*,*w*</sub> is the probability of the word *w* under the topic *k* in the topic-term matrix *φ*; and *p<sub>w</sub>* is the marginal probability of the word *w* under the topic-term matrix *φ*.


**Figure 3.** RLDA model structure diagram.


From Equation (3), we can dynamically adjust the relationship between words and topics by setting the weight coefficient. When the weight coefficient *λ* is close to 1, the more frequently a word appears, the higher its contribution to the document theme; that is, the more frequent words in the document are assumed to be more relevant to the topic. When the weight coefficient *λ* is close to 0, the improved model emphasizes words that appear frequently in the selected topic but rarely in other topics; that is, words and topics that generally appear concomitantly.
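A minimal sketch of Equation (3), assuming *φ* is the K × V topic-term matrix and *p<sub>w</sub>* is obtained as the topic-weighted marginal over *φ*, as in the standard LDAvis relevance definition:

```python
import numpy as np

def relevance(phi, topic_weights, lam=0.52):
    """Eq. (3): r(w, k | lambda) = lam*log(phi_kw) + (1-lam)*log(phi_kw / p_w).

    phi:           K x V topic-term probability matrix
    topic_weights: length-K marginal topic probabilities, used to form p_w
    lam:           weighting coefficient lambda in [0, 1]
    """
    p_w = topic_weights @ phi                      # marginal word probabilities (length V)
    log_phi = np.log(phi + 1e-12)                  # small epsilon guards log(0)
    lift = np.log((phi + 1e-12) / (p_w + 1e-12))   # log "lift" of each word per topic
    return lam * log_phi + (1.0 - lam) * lift      # K x V relevance matrix
```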


#### *3.2. Fusion of Word2vec Model and RLDA Model*


In order to increase the interpretability of the text feature vector for text representation, the improved LDA topic model based on the Relevance formula was used to extract global features to construct the topic word vector, while the latent feature vector was extracted using the word2vec algorithm. By combining the two features, the new text feature representation is given by

$$v\_m' = \begin{bmatrix} z\_m^T, \theta\_m^T \end{bmatrix}^T \tag{4}$$

where *z<sub>m</sub>* is the latent semantic vector representation of the document, *θ<sub>m</sub>* is the latent text-topic vector of the text extracted based on the improved topic model with the Relevance formula, *v*′*<sub>m</sub>* is the combined semantic feature representation vector, and *T* denotes the transpose operation on the matrix.

The topic vector and the latent semantic vector differ in the dimension representation of the word vector. In order to eliminate the influence of the difference in magnitude generated by the fusion of the two vectors on the final classification result, this paper normalizes the two vectors *z<sub>m</sub>* and *θ<sub>m</sub>*. In a one-way combination, the processing method is as follows:

$$v\_m = \left[\frac{z\_m^T}{||z\_m||}, \frac{\theta\_m^T}{||\theta\_m||}\right]^T \tag{5}$$

The vectors combined by normalization not only regularize the length and eliminate the gap in magnitude between the two vectors, but the new vector generated by the fusion also carries both topic features and latent semantic features.
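A sketch of the fusion of Equations (4) and (5) in numpy, assuming *z<sub>m</sub>* and *θ<sub>m</sub>* are 1-D vectors per document:

```python
import numpy as np

def fuse_features(z_m: np.ndarray, theta_m: np.ndarray) -> np.ndarray:
    """Eq. (5): L2-normalize each vector, then concatenate.

    z_m:     word2vec latent semantic vector of the document
    theta_m: RLDA document-topic vector
    """
    z_norm = z_m / (np.linalg.norm(z_m) + 1e-12)       # epsilon guards zero vectors
    t_norm = theta_m / (np.linalg.norm(theta_m) + 1e-12)
    return np.concatenate([z_norm, t_norm])            # v_m, one input row for the CNN
```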

In the following, the text classification model is constructed based on a convolutional neural network. By means of the convolutional neural network, a four-layer model was developed, as shown in Figure 4.

**Figure 4.** Text convolutional neural network model.

The detailed design is presented in the following four parts:

(1) The first layer


The first layer could be defined as the input layer. In this layer, a length of text data was selected, and the vectorization of the text data was implemented with the help of step C. Employing the matrix *I* ∈ R<sup>*m*×*n*</sup> as the input and defining the number of words as *m*, *m* represents the number of rows of the input layer. Similarly, we defined the dimension of the text vector as *n*, which represents the number of columns of the input layer. Then, all word data could be divided into word vectors of equal dimension; namely, the number of columns is the same across the input layer. Accordingly, the matrix *I* ∈ R<sup>*m*×*n*</sup> was constructed. During the training process, we employed the stochastic gradient descent method to adjust the word vectors.

(2) The second layer

The second layer was named the convolution layer. Each scale includes two convolution kernels, with the scales of 3 × *n*, 4 × *n*, and 5 × *n*. Then, for the input matrix *I* ∈ R<sup>*m*×*n*</sup> of the input layer, we implemented the convolution operation and acquired the matrix features of the input layer. The corresponding result vectors *c<sub>i</sub>* (*i* = 1, 2, 3, 4, 5, 6) could be attained and input to the pooling layer for data compression. After each convolution operation, one convolution result will be obtained:

$$r\_i = W \cdot I\_{i:i+h-1} \tag{6}$$

where *i* = 1, 2, · · · , *s* − *h* + 1, with *h* being the size of the convolution kernel; *I*<sub>*i*:*i*+*h*−1</sub> represents the *i*-th *h* × *n* matrix block, taken from top to bottom, when the matrix *I* is operated on in sequence; and "·" means that the elements at the corresponding positions of the two matrix blocks are multiplied first and then added. Meanwhile, the activation function ReLU was used to activate the convolution result. Nonlinear processing was carried out for each convolution result *r<sub>i</sub>*, and the result *c<sub>i</sub>* was obtained after each operation. The formula is as follows:

$$c\_i = \text{ReLU}(r\_i + b) \tag{7}$$

where *b* is the offset coefficient. Each such operation produces a nonlinear result *c<sub>i</sub>*. Because *i* = 1, 2, · · · , *s* − *h* + 1, after *s* − *h* + 1 convolution operations on the input matrix from top to bottom, we arrange the results in order and obtain the vector of the convolution layer *c* ∈ R<sup>*s*−*h*+1</sup>, which is shown as:

$$c = [c\_1, c\_2, \dots, c\_{s-h+1}] \tag{8}$$

(3) The third layer

We defined the third layer as the pooling layer and employed the maximum pooling method. For each convolution result vector *c<sub>i</sub>*, the largest element was chosen as the feature value, which is defined as *p<sub>j</sub>* (*j* = 1, 2, 3, 4, 5, 6). Then, the values *p<sub>j</sub>* were injected in succession into the vector *p* ∈ R<sup>6×1</sup>, which was input to the output layer of the next layer. Vector *p* stands for the global features of the text data, and it reduces the dimensionality of the features and enhances the efficiency of classification.

(4) The fourth layer

The fourth layer was the output layer, to which the pooling layer was fully connected. The vector *p* from the pooling layer was selected as the input and classified with the help of a SoftMax classifier. Then, the final classification result was output. The probability was computed using SoftMax classification, which is as follows:

$$L(p\_j) = \frac{e^{p\_j}}{\sum\_{k=1}^{6} e^{p\_k}}\tag{9}$$

where Formula (9) gives the probability that the sample belongs to each secondary equipment defect category.

The fault level was then output for the secondary equipment. The traditional convolutional neural network uses a single-size convolution kernel to extract features; when faced with documents of different lengths, the classification results are not ideal. On the basis of the original convolution model, convolution kernels of multiple sizes were utilized to mine text features in depth, enhancing the ability to extract locally sensitive information so that more feature information can be represented. To make a clear statement, the overall flow chart of the proposed model in this paper is shown in Figure 5.

**Figure 5.** The flow chart of the proposed text classification algorithm.
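A PyTorch sketch of the four-layer model described above, with two kernels at each of the heights 3, 4, and 5, giving the six pooled features *p*<sub>1</sub>..*p*<sub>6</sub>. Mapping the pooled vector to the three defect classes through a linear layer before the SoftMax is an assumption, since Equation (9) is written directly over the six pooled features:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, embed_dim: int, num_classes: int = 3):
        super().__init__()
        # Two kernels per height h in {3, 4, 5}: six feature maps in total.
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, 2, kernel_size=(h, embed_dim)) for h in (3, 4, 5)])
        self.fc = nn.Linear(6, num_classes)   # assumed mapping to the 3 defect levels

    def forward(self, x):                     # x: (batch, seq_len m, embed_dim n)
        x = x.unsqueeze(1)                    # -> (batch, 1, m, n), single channel
        feats = []
        for conv in self.convs:
            c = F.relu(conv(x)).squeeze(3)    # Eqs. (6)-(7): (batch, 2, m-h+1)
            p = F.max_pool1d(c, c.size(2))    # max pooling over the s-h+1 positions
            feats.append(p.squeeze(2))        # (batch, 2)
        p = torch.cat(feats, dim=1)           # (batch, 6) — the vector p
        return F.softmax(self.fc(p), dim=1)   # Eq. (9)
```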

#### **4. Case Study**

#### *4.1. RLDA Model Experiment*

In order to compare the advantages and disadvantages of the LDA model and the improved LDA model based on the Relevance formula in terms of prediction ability and generalization ability, this experiment used the topic consistency (coherence score) indicator. Generally, the larger the value, the stronger the predictive ability and generalization ability of the model, indicating that the model is more practical. According to the characteristics of the experimental data set, the main parameter values set in this paper are shown in Table 2, where K represents the number of topics.

**Table 2.** Parameter setting of the RLDA model.


In this paper, the comparison experiment was carried out by changing the value of the number of topics K. Under different values of the number of topics, the corresponding coherence score of the improved LDA model based on the Relevance formula was calculated according to the topic consistency formula. The experimental comparison results are shown in Figures 6–9. As shown in Figure 6, as the number of topics continued to increase, the coherence score first increased, then decreased, and then slowly leveled off. The score is highest when the number of topics is about seven to eight.
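A sketch of this K sweep using gensim. Note that gensim's `LdaModel` estimates parameters with online variational Bayes rather than Gibbs sampling, so it is a stand-in for the authors' trainer, and the `"c_v"` coherence measure is an assumption, since the paper does not name one:

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.coherencemodel import CoherenceModel

def coherence_for_k(token_lists, k):
    """Train an LDA model with k topics and return its coherence score."""
    dictionary = Dictionary(token_lists)
    corpus = [dictionary.doc2bow(tokens) for tokens in token_lists]
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                   passes=10, random_state=0)
    cm = CoherenceModel(model=lda, texts=token_lists,
                        dictionary=dictionary, coherence="c_v")
    return cm.get_coherence()

# scores = {k: coherence_for_k(docs, k) for k in range(2, 16)}  # cf. Figure 6
```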

**Figure 6.** The relationship between the coherence score and number of topics.

**Figure 7.** Theme map of the theme model with eight topics (**left**) and seven topics (**right**).


**Figure 8.** Diagram of the relationship between *λ* and the consistency score.

**Figure 9.** Top 30 words of topic distribution and relativity of topic 1 when *λ* was 0.52.

With the help of the LDAvis toolkit, the model topics for topic numbers seven and eight were reduced to a two-dimensional plane for visual display. The results are shown in Figure 7. The left half is the topic model with eight topics, and the right half is the topic model with seven topics. The greater the degree of topic intersection, the greater the difficulty of distinguishing the topics. The degree of intersection between the topics of the model with eight topics was much greater than that of the model with seven topics. Therefore, in pursuit of model generalization ability, this article adopted the model with seven topics.

When the weighting factor *λ* was close to 1, a high frequency of occurrence indicated a high contribution to the document topic; we can conclude that the more frequent words in the document were more relevant to the topic. When the weight coefficient *λ* was close to 0, the improved model indicated that the word appeared more frequently in the selected topic, but less frequently in other topics; that is, the words and topics generally appeared concomitantly. Considering the influence of both relevance and concomitance, the consistency score of the model was repeatedly calculated, and the result was found to be the best when *λ* was 0.52. The relationship between the weight coefficient and the consistency score is shown in Figure 8. When *λ* was 0.52, the relationship between the theme of topic 1 and the words is shown in Figure 9.

#### *4.2. Results and Analysis of Evaluation Index of Classification Effect*

Text classification effect evaluation is an important module of text classification. It usually takes the confusion matrix, also known as the error matrix, as its basis, which is usually expressed as a two-dimensional table. The classification results can be visually analyzed through the confusion matrix [29,30]. The confusion matrix is shown in Table 3.

**Table 3.** Confusion matrix of classification results.


For the classification results, internationally recognized evaluation indicators were used: precision P, recall R, and the *F*1 value. The calculation formulas are as follows:

$$\mathbf{P} = \frac{TP}{TP + FP} \tag{10}$$

$$\mathbf{R} = \frac{TP}{TP + FN} \tag{11}$$

$$F1 = \frac{\mathbf{2} \times \mathbf{P} \times \mathbf{R}}{\mathbf{P} + \mathbf{R}} \tag{12}$$

where *TP* indicates the number of samples of a certain class of text that are correctly identified as that class, *FP* indicates the number of samples of other classes that are incorrectly identified as that class, and *FN* indicates the number of samples of that class that are incorrectly identified as other classes. In order to verify the effectiveness of the improved input feature matrix, the CNN text classification method based on word2vec was compared with the method in this paper in terms of precision P, recall R, and *F*1 values.
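Equations (10)–(12) correspond to sklearn's per-class metrics; a sketch follows, where macro averaging across the three defect classes is an assumption, since the paper reports single P/R/*F*1 values:

```python
from sklearn.metrics import precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Eqs. (10)-(12): precision, recall, and F1 over the defect classes."""
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"P": p, "R": r, "F1": f1}
```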

We then tested the superiority of the presented method by comparing it with traditional machine learning methods such as SVM, LR, and KNN to find the accuracy of each algorithm model on the same data set. The experimental results are shown in Table 4.

**Table 4.** Comparison of the experimental results with machine learning methods.


Compared with the traditional machine learning methods LR, SVM, and KNN, due to the large number of short texts in this corpus, the *F*1 values of the results were basically around 50%, and the accuracy of the best-performing SVM model was only 54.53%; the accuracy of the typical CNN model was only 55.36%. The effect of machine learning classification was thus not ideal. The traditional LDA topic model extracts features but lacks contextual semantic information, which makes it difficult to achieve ideal results in short text classification; the *F*1 value of that experiment was only 63.00%. The *F*1 value of the WORD2VEC + TEXTCNN model was 14.91% higher than that of WORD2VEC + CNN. This paper improved the traditional LDA topic model, using the weight coefficient *λ* to adjust the relationship between words and topics. Finally, the *F*1 of the WORD2VEC + RLDA + TEXTCNN model was the highest, up to 81.69%; whether compared with the traditional machine learning algorithms or with the traditional convolutional neural network, the *F*1 results were significantly improved. Therefore, the generalization ability and practicability of the model constructed in this paper satisfy the requirements of practical application.

#### **5. Discussion**

Aiming at the problem of multi-type and complex secondary equipment in power systems and the low accuracy of word segmentation results, in this paper, a stop-word dictionary and a professional dictionary in the field of secondary equipment in power systems were constructed. An improved LDA topic analysis model based on the Relevance formula was proposed. By setting different weight coefficients, the feature-similar words in texts with different defect categories were separated to solve the problem of feature sparseness. An improved algorithm was proposed by integrating the improved LDA topic model with word2vec, where the global features were mined by using the topic model and the contextual semantic features were mined by using the latent semantic word vector model, which can better extract short text features. The multi-scale convolution kernel was used to extract features, enhancing the ability to extract locally sensitive information and further conducting in-depth mining of text semantic information.

There are also some remaining problems. For example, a large number of professional dictionaries in the field of secondary equipment were constructed during preprocessing, which improves the specialization of this model to some extent; however, the direct application of this model to other fields is likely to lead to poor generalization ability. All of these issues are left for future and ongoing research.

#### **6. Conclusions**

In this paper, for the problem of short text information of secondary equipment faults in the power system and the high repetition of words between different defect categories, an LDA topic model based on the Relevance formula was built to dynamically adjust the correlation between topics and words. In addition, considering that the topic model itself has insufficient ability to extract short text features, the word2vec latent semantic feature vectors were fused to compensate for contextual semantic information. Considering that some fault text data were short, the traditional convolutional neural network had insufficient feature extraction, and multiple sizes of convolution kernels were used to extract features from short text data. Finally, using the fault text data generated by the actual operation of a power system company in a northwestern province to verify the method in this paper, the results showed that the algorithm has a certain practicality.

**Author Contributions:** J.L. created the models, developed the methodology, wrote the initial draft, and designed the computer programs; H.M. supervised, took lead responsibility for the research activity planning, and performed the critical review; X.X. and J.C. conducted the research and investigation process and edited the initial draft; J.L. and H.M. reviewed the manuscript and synthesized the study data. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Natural Science Foundation of China (51577050).

**Conflicts of Interest:** The authors declare no conflict of interest.
