3.3.2. TransH Model
The TransH model, introduced by Wang et al. [10], builds upon the TransE [11] framework to better handle complex relationships in KGs. Instead of embedding entities and relationships in a shared vector space, TransH utilizes relation-specific hyperplanes, defined by their normal vectors, enabling entities to possess unique embeddings for each relation. This concept is illustrated in Figure 3. The result is a more flexible representation in which an entity can assume different roles across various relationships.
In the TransH model, each relation $r$ is represented by a hyperplane defined by a normal vector $\mathbf{w}_r$. Entities connected by a relation are projected onto this hyperplane to distinguish their roles in different relational contexts. Specifically, given a triple $(h, r, t)$, where $h$ and $t$ are the head and tail entities, the projections onto the hyperplane are calculated as shown in Equations (1) and (2):

$\mathbf{h}_{\perp} = \mathbf{h} - \mathbf{w}_r^{\top}\mathbf{h}\,\mathbf{w}_r$ (1)

$\mathbf{t}_{\perp} = \mathbf{t} - \mathbf{w}_r^{\top}\mathbf{t}\,\mathbf{w}_r$ (2)

Here, $\mathbf{h}_{\perp}$ and $\mathbf{t}_{\perp}$ represent the projections of $\mathbf{h}$ and $\mathbf{t}$ on the hyperplane associated with $r$, and $\mathbf{w}_r$ is constrained to unit length.
The scoring function in TransH is designed to measure the plausibility of triples by evaluating the distance between the projected head and tail entities in a relation-specific space. This scoring function is expressed as Equation (3):

$f_r(h, t) = \left\| \mathbf{h}_{\perp} + \mathbf{d}_r - \mathbf{t}_{\perp} \right\|_2^2$ (3)

where $\mathbf{d}_r$ is the embedding of the relation $r$ on the hyperplane. The model is trained to minimize this score for correct triples and maximize it for incorrect ones, thus enhancing its discriminative power.
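As a concrete illustration, the projection and scoring steps of Equations (1)–(3) can be sketched in a few lines of NumPy (function and variable names are ours, not from the original implementation):

```python
import numpy as np

def transh_score(h, t, d_r, w_r):
    """TransH plausibility score (Eqs. (1)-(3)): project the head and
    tail embeddings onto the hyperplane with unit normal w_r, then
    measure the squared L2 distance of h_perp + d_r from t_perp.
    Lower scores indicate more plausible triples."""
    w_r = w_r / np.linalg.norm(w_r)          # enforce the unit-norm constraint
    h_perp = h - np.dot(w_r, h) * w_r        # Eq. (1)
    t_perp = t - np.dot(w_r, t) * w_r        # Eq. (2)
    return np.linalg.norm(h_perp + d_r - t_perp) ** 2  # Eq. (3)
```

Because the projection removes the component along $\mathbf{w}_r$, two entities that differ only along the normal direction receive identical projections, which is exactly how TransH lets one entity play different roles under different relations.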
TransH improves upon its predecessor TransE by addressing its limitations in handling one-to-many, many-to-one, and many-to-many relationships. This ability makes it particularly advantageous for complex KGs such as those used in railway operational equipment fault diagnosis.
In the domain of Chinese railway operational equipment fault diagnosis, the TransH model’s strengths align closely with the inherent challenges of modeling interdependent and hierarchical relationships. Railway systems often exhibit intricate fault propagation patterns. For example, a failure in the signaling system can simultaneously influence track circuits, power supplies, and communication subsystems. TransH’s projection mechanism effectively separates these overlapping relationships by projecting entities onto distinct hyperplanes for each relation, preserving relational semantics and reducing interference.
In the context of fault diagnosis for Chinese railway operational equipment, the TransH model provides a robust framework for capturing the nuances of equipment relationships and their operational dynamics. This forms a critical component of the integrated TH-RotatE model, contributing to its overall effectiveness in anticipating equipment failures and optimizing maintenance strategies.
3.3.3. RotatE Model
The RotatE model, introduced by Sun et al. [12], represents entities and relations in a complex vector space, enabling it to capture intricate relational patterns within KGs. The key innovation of RotatE lies in its ability to model various relation properties, such as symmetry, anti-symmetry, inversion, and composition, by representing relations as rotations in the complex plane. This capability allows RotatE to differentiate between complex relational patterns, outperforming traditional models in tasks like link prediction and KG completion.
In RotatE, each entity $e$ is represented as a complex vector $\mathbf{e} \in \mathbb{C}^k$, and each relation $r$ is represented as a phase vector $\mathbf{r} \in \mathbb{C}^k$ with $|r_i| = 1$, where $k$ denotes the embedding dimension. Given a triple $(h, r, t)$, the model predicts the tail entity $\mathbf{t}$ by applying a rotation to the head entity $\mathbf{h}$ using the relation $\mathbf{r}$, as Equation (4):

$\mathbf{t} = \mathbf{h} \circ \mathbf{r}$ (4)

Here, $\circ$ denotes the Hadamard (element-wise) product in the complex space. The real and imaginary parts of the entities and relations are handled separately to facilitate this rotation.
The scoring function in RotatE measures the plausibility of a given triple by calculating the distance between the rotated head entity and the tail entity using either the $L_1$ or $L_2$ norm. It is defined as Equation (5):

$d_r(\mathbf{h}, \mathbf{t}) = \left\| \mathbf{h} \circ \mathbf{r} - \mathbf{t} \right\|$ (5)

where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the embeddings of the head entity, relation, and tail entity, respectively. The model minimizes this distance for true triples while maximizing it for corrupted triples, thus ensuring an accurate representation of relationships in the KG.
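The rotation and distance of Equations (4) and (5) can likewise be sketched in NumPy, storing each relation as a vector of phase angles (names are ours; the $L_1$ norm is assumed here):

```python
import numpy as np

def rotate_score(h, t, phase_r):
    """RotatE distance (Eqs. (4)-(5)): rotate the complex head embedding
    element-wise by unit-modulus factors exp(i * phase), then take the
    L1 distance to the tail. Lower scores indicate plausible triples."""
    r = np.exp(1j * phase_r)                 # |r_i| = 1 by construction
    return np.sum(np.abs(h * r - t))         # Hadamard rotation + L1 norm
```

For instance, a relation whose phases are all $\pi$ composes with itself to the identity rotation, which is how RotatE represents symmetric relations.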
Figure 4 illustrates RotatE using a 1-dimensional embedding, showcasing its ability to model symmetric relations. RotatE’s capability to model and infer complex relational patterns makes it highly suitable for fault diagnosis in railway operational equipment. By leveraging RotatE, our approach can effectively identify and predict potential faults by analyzing the intricate relationships between different equipment components, operational conditions, and historical fault data. This enhances the accuracy and reliability of fault diagnosis, contributing to improved maintenance strategies and operational efficiency.
In the context of fault diagnosis for railway operational equipment, the RotatE model enables a more detailed understanding of the dependencies among various system components, operational conditions, and historical fault records. For instance, relationships such as “fault causes another fault” (causality) or “equipment belongs to subsystem” (hierarchical structure) are inherently complex and benefit from RotatE’s ability to represent such relations geometrically.
By embedding entities such as “fault location”, “fault type”, and “maintenance action” into a complex vector space, RotatE provides a robust mechanism for analyzing interdependencies. When applied to our CROEFKG dataset, which consists of triples describing equipment faults, operational scenarios, and corrective measures, RotatE excels at identifying patterns that link specific fault phenomena to their underlying causes and potential remedies.
3.3.4. Integration of TransH and RotatE Models for Fault Diagnosis
The TH-RotatE model is a hybrid knowledge graph embedding framework that integrates the strengths of both TransH and RotatE to enhance fault diagnosis in Chinese railway operational equipment. This integration is motivated by the need to capture both the geometric structure of entity relations (via TransH) and the semantic–relational patterns including symmetry and hierarchy (via RotatE), as illustrated in Figure 5. Such capability is crucial for modeling the multifaceted fault dependencies and propagation characteristics in railway systems.
(1) Unified Embedding and Fusion Strategy
In TH-RotatE, each input triple is processed in parallel through two distinct embedding pipelines:
TransH pipeline: projects the head and tail entities onto a relation-specific hyperplane using Equations (1) and (2), and computes the relational plausibility score using the TransH scoring function defined in Equation (3).
RotatE pipeline: represents each entity as a complex-valued vector and each relation as a phase vector, applying an element-wise rotation to the head entity to obtain the predicted tail entity $\mathbf{t}$, as defined in Equation (4). The plausibility of the triple is then evaluated using the RotatE scoring function (Equation (5)).
To integrate these two distinct representations, TH-RotatE adopts a score-level fusion strategy, combining the outputs of both scoring functions via a weighted additive formulation, as Equation (6):

$f(h, r, t) = \lambda_1 f_{\mathrm{TransH}}(h, r, t) + \lambda_2 f_{\mathrm{RotatE}}(h, r, t)$ (6)

Here, $\lambda_1$ and $\lambda_2$ are non-negative trainable scalar parameters that govern the contribution of each component model during training. Rather than assigning these weights manually, they are optimized end-to-end via backpropagation, enabling the model to adaptively prioritize either structural (TransH) or semantic (RotatE) signals depending on the complexity and characteristics of different relational contexts.
To prevent degenerate behavior—such as over-reliance on a single component—we initialize the two fusion weights to equal values and impose an $L_1$ normalization constraint ($\lambda_1 + \lambda_2 = 1$) throughout training. This constraint encourages balanced learning and improves interpretability by making the relative influence of each model explicitly quantifiable.
TH-RotatE deliberately adopts score-level fusion instead of embedding-level integration due to the heterogeneous nature of the component models. TransH operates in a real-valued space $\mathbb{R}^k$, while RotatE employs complex-valued vectors with phase-based relational transformations. Directly merging these embeddings—such as via concatenation or projection—would result in geometric inconsistencies and may distort the relational semantics that each model captures.
By integrating at the scoring function level, TH-RotatE preserves the semantic integrity of both embedding spaces and enables a modular yet coherent combination of their respective relational inductive biases: TransH excels at modeling hierarchical and type-constrained structures, while RotatE captures symmetric, anti-symmetric, and compositional relations. This fusion strategy enhances flexibility, robustness, and overall model expressiveness. For instance, in a scenario where a signal failure is caused by either a disconnection in the cable or a malfunction in the control module, TransH captures the hierarchical dependencies between components (e.g., signal → control circuit → power supply), while RotatE models the compositional or symmetric relationships between fault causes and their recurring patterns across different locations.
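A minimal NumPy sketch of this weighted score-level fusion, with the $L_1$ constraint enforced here by renormalizing the weights (function and variable names are ours, and the renormalization is one possible way to realize the constraint):

```python
import numpy as np

def fused_score(score_transh, score_rotate, lam):
    """Weighted additive fusion of the TransH and RotatE distances.
    The two trainable weights in `lam` are kept non-negative and
    renormalized to sum to one, so the result is a convex combination."""
    lam = np.abs(lam)
    lam = lam / lam.sum()                    # L1 normalization constraint
    return lam[0] * score_transh + lam[1] * score_rotate
```

In a full implementation the weights would be free parameters updated by backpropagation; the renormalization above stands in for the normalization constraint applied during training.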
(2) Embedding Dimensions
To ensure architectural consistency and compatibility across the dual embedding pipelines in TH-RotatE, we set a single shared embedding dimension $k$ for both the TransH and RotatE components. This unified dimensionality facilitates a coherent fusion of the respective scoring functions, as it ensures that all embeddings possess equivalent representational capacity and are directly comparable in scale.
The choice of $k$ represents a well-considered trade-off between expressive power and computational efficiency. This configuration was selected based on a grid search over candidate values, where performance was evaluated on a held-out validation set. As detailed in Section 4.3.3 (3) Effect of Embedding Dimension, the selected setting consistently outperformed alternatives in terms of link prediction accuracy (MRR, Hit@K) while maintaining favorable training convergence speed and memory usage.
Moreover, maintaining identical dimensionality across both embedding streams simplifies model design and mitigates the risk of score imbalance during the weighted fusion process. In particular, it avoids complications arising from mismatched vector norms or heterogeneous feature distributions, thereby supporting stable gradient updates and preserving the semantic and structural complementarities encoded by the TransH and RotatE modules. This embedding strategy ultimately enhances the robustness and generalization capacity of the TH-RotatE framework.
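The selection procedure described above amounts to a simple validation-set sweep; the following is a hypothetical sketch (the `train_and_eval` callback and candidate list are illustrative, not the paper's actual values):

```python
def select_embedding_dim(candidate_dims, train_and_eval):
    """Pick the embedding dimension with the best validation MRR.
    `train_and_eval(k)` is assumed to train a model with dimension k
    and return its MRR on the held-out validation set."""
    best_k, best_mrr = None, float("-inf")
    for k in candidate_dims:
        mrr = train_and_eval(k)
        if mrr > best_mrr:
            best_k, best_mrr = k, mrr
    return best_k, best_mrr
```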
(3) Self-adversarial Negative Sampling Strategy
To enhance the quality of negative samples during training, we incorporate a self-adversarial negative sampling strategy into the TH-RotatE framework. Unlike uniform sampling, which treats all negative samples equally, this method assigns higher weights to harder negatives—i.e., those that the model mistakenly deems plausible—thereby sharpening the model’s discrimination capability.
Given a positive triple $(h, r, t)$, a set of corrupted negative triples $\{(h'_i, r, t'_i)\}$ is generated by replacing either the head or tail entity. The model computes the fused plausibility score $f(h'_i, r, t'_i)$ for each negative triple. The sampling probability $p_i$ of the $i$-th negative sample is then calculated using a softmax function over the scores, as Equation (7):

$p(h'_i, r, t'_i) = \dfrac{\exp\left(-\alpha f(h'_i, r, t'_i)\right)}{\sum_{j=1}^{n} \exp\left(-\alpha f(h'_j, r, t'_j)\right)}$ (7)

where $\alpha$ is a temperature parameter that controls the sharpness of the distribution, and $n$ is the number of negative samples. Since $f$ is a distance-based score, the negative sign assigns larger probabilities to negatives that the model scores as more plausible. These probabilities are used to weight the negative log-likelihood loss, enabling the model to focus more on informative negative examples.
This strategy aligns the training process with the true optimization goal—minimizing ranking errors—while improving the training stability and convergence speed. It is particularly beneficial for modeling complex and ambiguous relationships in the fault knowledge graph.
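The softmax weighting can be sketched directly (a NumPy sketch with our own names; scores are treated as distances, so smaller means harder):

```python
import numpy as np

def self_adversarial_weights(neg_scores, alpha=1.0):
    """Softmax sampling weights over a set of negative triples.
    `neg_scores` are fused distance scores, so a *smaller* distance marks
    a harder negative (one the model wrongly finds plausible) and receives
    a larger weight; `alpha` is the temperature."""
    logits = -alpha * np.asarray(neg_scores, dtype=float)
    logits -= logits.max()                   # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()
```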
(4) Loss Function
Building upon the self-adversarial negative sampling strategy described earlier, the training objective of TH-RotatE is defined to explicitly reward plausible triples while penalizing misleading yet informative negatives. To this end, we adopt a log-likelihood loss function that combines both positive and adversarially weighted negative components, as formulated in Equation (8):

$\mathcal{L} = -\log \sigma\left(\gamma - f(h, r, t)\right) - \sum_{i=1}^{n} p(h'_i, r, t'_i) \log \sigma\left(f(h'_i, r, t'_i) - \gamma\right)$ (8)

Here, $f(\cdot)$ denotes the score of a triple computed by the fused scoring function (a distance, so lower values indicate greater plausibility), $\sigma$ is the sigmoid function, and $\gamma$ is a fixed margin parameter controlling the separation between positive and negative examples. The negative sampling probabilities $p(h'_i, r, t'_i)$ are computed using the self-adversarial sampling mechanism introduced in Equation (7), ensuring that more plausible negatives receive higher weights during optimization.
Intuitively, the first term of the loss function encourages the model to assign higher scores to valid triples, thus reinforcing their plausibility in the knowledge graph. The second term, on the other hand, penalizes negative triples in proportion to their predicted likelihood—placing more emphasis on hard negatives that the model is prone to misclassify. This contrastive learning approach helps the model sharpen its decision boundaries and reduces false positives.
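A sketch of this objective for a single positive triple and its weighted negatives (NumPy, our names; `gamma` is a placeholder value, not the paper's tuned margin):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_adversarial_loss(pos_score, neg_scores, neg_weights, gamma=6.0):
    """Self-adversarial log-likelihood loss.  All scores are fused
    distances (lower = more plausible).  The first term rewards a positive
    triple whose distance falls below the margin; the second penalizes
    each negative in proportion to its adversarial weight."""
    neg_scores = np.asarray(neg_scores, dtype=float)
    pos_term = -np.log(sigmoid(gamma - pos_score))
    neg_term = -np.sum(neg_weights * np.log(sigmoid(neg_scores - gamma)))
    return pos_term + neg_term
```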
Following the corruption strategy in [10], negative triples $(h', r, t')$ are generated by replacing either the head or the tail entity in a ground truth triple $(h, r, t)$, using a Bernoulli distribution-based sampling scheme. This introduces diverse and representative negatives for each relation type.
By training under this self-adversarial loss framework, the TH-RotatE model learns to robustly differentiate between true and false relational patterns. This capability is particularly critical in the CROEFKG dataset, where ambiguous fault dependencies and overlapping semantics frequently occur among entities such as fault causes, locations, categories, and handling measures.
(5) Training Algorithm of TH-RotatE
The end-to-end learning procedure of the TH-RotatE model is summarized in Algorithm 1. It jointly optimizes the real-valued and complex-valued embeddings using a weighted fusion mechanism and a self-adversarial loss function.
Algorithm 1: TH-RotatE Training Algorithm
Input: training triples T, number of epochs E, negatives per triple k, margin γ, temperature α
Output: entity embeddings E, relation embeddings R
1:  initialize the real-valued (TransH) and complex-valued (RotatE) entity and relation embeddings
2:  normalize the relation hyperplane normals (TransH) and constrain the relation phases to unit modulus (RotatE)
3:  for epoch = 1 to E do
4:      sample a minibatch B of triples from T
5:      generate k negative triples for each triple in B by Bernoulli corruption
6:      compute the TransH scores (Equation (3)) and the RotatE scores (Equation (5))
7:      fuse the two scores with the trainable weights (Equation (6))
8:      for i = 1 to k do
9:          compute the self-adversarial weight of the i-th negative (Equation (7))
10:     end for
11:     compute the loss L (Equation (8))
12:     update the embeddings and fusion weights by gradient descent
13: end for
14: return E, R
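The per-triple computation inside the training loop can be condensed into one function (a NumPy sketch under our naming; gradients and parameter updates, which a real implementation would obtain via automatic differentiation, are omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def triple_loss(h, t, d_r, w_r, h_c, t_c, phase_r, negs,
                lam=(0.5, 0.5), gamma=6.0, alpha=1.0):
    """Loss for one positive triple and its corrupted negatives.
    Real vectors (h, t, d_r, w_r) feed the TransH branch; complex vectors
    (h_c, t_c) and phases feed the RotatE branch.  `negs` is a list of
    (h, t, h_c, t_c) tuples for the Bernoulli-corrupted triples."""
    def fused(hr, tr, hc, tc):
        w = w_r / np.linalg.norm(w_r)                        # unit normal
        th = np.linalg.norm((hr - w @ hr * w) + d_r - (tr - w @ tr * w)) ** 2
        ro = np.sum(np.abs(hc * np.exp(1j * phase_r) - tc))  # rotation + L1
        return lam[0] * th + lam[1] * ro                     # weighted fusion

    pos = fused(h, t, h_c, t_c)
    neg = np.array([fused(*n) for n in negs])
    p = np.exp(-alpha * (neg - neg.min()))                   # self-adversarial
    p /= p.sum()                                             # weights
    return -np.log(sigmoid(gamma - pos)) \
           - np.sum(p * np.log(sigmoid(neg - gamma)))
```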