Article

Graph-to-Text Generation with Bidirectional Dual Cross-Attention and Concatenation

by Elias Lemuye Jimale 1,2, Wenyu Chen 1,*, Mugahed A. Al-antari 3,*, Yeong Hyeon Gu 3,*, Victor Kwaku Agbesi 1, Wasif Feroze 1, Feidu Akmel 4, Juhar Mohammed Assefa 1 and Ali Shahzad 1

1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 College of Electrical Engineering and Computing, Adama Science and Technology University, Adama 1888, Ethiopia
3 Department of Artificial Intelligence and Data Science, College of AI Convergence, Sejong University, Seoul 05006, Republic of Korea
4 School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
* Authors to whom correspondence should be addressed.
Mathematics 2025, 13(6), 935; https://doi.org/10.3390/math13060935
Submission received: 14 February 2025 / Revised: 6 March 2025 / Accepted: 7 March 2025 / Published: 11 March 2025
(This article belongs to the Section E1: Mathematics and Computer Science)

Abstract

Graph-to-text generation (G2T) involves converting structured graph data into natural language text, a task made challenging by the need for encoders to capture the entities and their relationships within the graph effectively. While transformer-based encoders have advanced natural language processing, their reliance on linearized data often obscures the complex interrelationships in graph structures, leading to structural loss. Conversely, graph attention networks excel at capturing graph structures but lack the pre-training advantages of transformers. To leverage the strengths of both modalities and bridge this gap, we propose a novel bidirectional dual cross-attention and concatenation (BDCC) mechanism that integrates outputs from a transformer-based encoder and a graph attention encoder. The bidirectional dual cross-attention computes attention scores bidirectionally, allowing graph features to attend to transformer features and vice versa, effectively capturing inter-modal relationships. The concatenation is applied to fuse the attended outputs, enabling robust feature fusion across modalities. We empirically validate BDCC on PathQuestions and WebNLG benchmark datasets, achieving BLEU scores of 67.41% and 66.58% and METEOR scores of 49.63% and 47.44%, respectively. The results outperform the baseline models and demonstrate that BDCC significantly improves G2T tasks by leveraging the synergistic benefits of graph attention and transformer encoders, addressing the limitations of existing approaches and showcasing the potential for future research in this area.

1. Introduction

Knowledge graphs provide structured representations that reveal hidden connections within data, enabling the generation of natural language text from complex datasets [1] and facilitating enhanced decision-making and innovation [2]. However, effectively converting the structured graph data into coherent text, known as graph-to-text generation (G2T), presents significant challenges. While transformer-based pre-trained language models (PLMs), such as BART [3], have advanced natural language processing [4], criticisms regarding their effectiveness for G2T tasks highlight notable limitations that deserve attention. Critics argue that the architectural constraints inherent in these models, particularly within their encoder components, necessitate the linearization of structured data. This requirement can hinder their capacity to capture the complex interrelationships fundamental to knowledge graphs, which results in the loss of critical structural information, adversely affecting performance in handling complex data structures [5,6,7]. In response to this challenge, researchers have proposed alternative solutions, including additional pre-training to enhance model performance on structured data [8,9,10]. While this approach in some scenarios shows promise [11,12], it raises concerns regarding the required computational resources [13,14,15]. Moreover, Ref. [8] introduced a structure-aware semantic aggregation module to capture the input graph’s structure at each transformer layer, explicitly learning graph-text alignments rather than simply fine-tuning text-to-text PLMs on G2T datasets. Alternatively, some researchers proposed dual encoders that integrate graph neural networks [16,17] to enhance structural awareness within PLMs. However, the performance benefits of the integrated graph encoder are still limited compared to the capabilities of PLMs alone.
Despite these challenges, PLMs exhibit a remarkable ability to transform structured data into relevant textual descriptions, highlighting their versatility across various applications. Recent studies have highlighted their ability to generate meaningful text from structured inputs, indicating a promising direction for G2T tasks [11,18,19,20]. In this context, Ref. [21] demonstrates that PLM-based approaches, such as BART and T5, achieve state-of-the-art results in generating fluent text from graph-based data. Their findings reveal that PLMs can effectively generate text even when the graph input is simplified. Such results challenge the assumption that the explicit encoding of graph structure is necessary for optimal performance, highlighting the robustness of PLMs in handling linearized graph representations. The research provides valuable insights into how PLMs leverage pre-training and fine-tuning to excel in G2T, even without detailed structural information.
The ongoing discourse regarding the efficacy of PLMs for generating text from structured data reveals a spectrum of polarized perspectives that underscore notable concerns and notable advantages. Concurrently, innovative embedding techniques, such as graph attention networks [22,23], exhibit substantial potential in accurately representing the intricate structures of graphs but lack pre-training [24,25,26,27,28]. Given the divergent perspectives on the effectiveness of transformer encoders and the recognized benefits of graph attention mechanisms in enhancing G2T, this study investigates the potential synergy between transformer encoders, particularly their pre-training advantage as evidenced by [29,30], and graph attention mechanisms in G2T generation.
Motivated by the strength of attention mechanisms [31,32], we propose a novel bidirectional dual cross-attention and concatenation (BDCC), which integrates outputs from a transformer-based encoder and a graph attention encoder through a cross-attention mechanism to leverage the strengths of both modalities. This mechanism computes attention scores in both directions, where the graph features attend to transformer features and vice versa, effectively capturing inter-modal relationships. Moreover, we apply concatenation to fuse the attended outputs from the two cross-attention mechanisms, enabling effective feature fusion across modalities. We demonstrate that the proposed method improves G2T tasks across benchmark datasets, indicating that integrating graph attention and transformer encoders yields synergistic benefits.
  • We propose a dual cross-attention that leverages bidirectional interaction, enabling the graph attention to extract sequential context from the transformer encoder while allowing the transformer encoder to access structural information from the graph.
  • We concatenate the attended outputs from the two cross-attention mechanisms, enabling effective feature fusion across modalities.
  • We show through our ablation study that graph attention performs better than the standard transformer encoder for G2T generation.
  • We empirically validate the proposed approach using the PathQuestions and WebNLG benchmark datasets, demonstrating its effectiveness in generating coherent and contextually relevant text from structured data.

2. Related Work

The domain of data-to-text generation (D2T), particularly in G2T tasks, has seen a surge of interest due to the advancements in PLMs. These models have revolutionized the transformation of structured data into coherent textual representations. We can broadly categorize the approaches that emerged in this domain as basic Seq2Seq learning and structure-aware Seq2Seq learning. The former relies on linearized structured data, while the latter emphasizes maintaining the inherent structure through different mechanisms. This section reviews the key contributions and methodologies in these areas, highlighting the approaches used to enhance the quality and accuracy of generated text.

2.1. Data-to-Text as Seq2Seq Tasks

Most previous studies have adapted the transformer model—designed for Seq2Seq tasks, such as machine translation—to cast structured data into text generation tasks. This adaptation is particularly relevant in Seq2Seq learning with PLMs, which requires linearized structured data to generate textual representations. Recent studies have demonstrated the potential of leveraging PLMs for D2T generation across various benchmarks, including task-oriented dialogue, table-to-text, and G2T, through a pre-training followed by a fine-tuning approach [18]. In their study, Ref. [21] evaluated the effectiveness of PLMs, such as BART and T5, for G2T tasks. The models receive a linearized representation of graphs, and the findings indicate substantial improvements attributable to pre-training. The study emphasizes that PLMs can perform well in G2T tasks, even when the underlying graph structures are simplified. In addition, a study that developed the FactSpotter metric used a similar learning approach to evaluate the factual accuracy of generated texts by verifying the representation of specific triples from input graphs [33]. The DecoStrat framework proposed by [19] enhanced D2T generation by integrating PLMs that take linearized input with various alternative decoding methods. It addresses existing limitations by providing a modular approach that optimizes the decoding process, leading to improved output diversity and accuracy in D2T tasks. While Seq2Seq learning has made significant strides in D2T, studies in this area have primarily focused on leveraging PLMs without adequately incorporating structural awareness into their methodologies. This oversight can limit the potential for capturing the inherent complexities of the data, as the reliance on linearized representations simplifies the rich structural information that could enhance the quality and coherence of the generated text. Therefore, while the adaptation of transformer models for D2T tasks has shown promise, there remains a critical need to explore methods that better integrate structural information to fully leverage the capabilities of PLMs in generating high-quality text from structured data.

2.2. Data-to-Text as Structure-Aware Seq2Seq Tasks

Structure-aware Seq2Seq approaches emphasize the necessity of integrating detailed graph structures into the text generation process. In contrast to the straightforward Seq2Seq methods, which often overlook these complexities, structure-aware techniques aim to address the limitations of linearized representations. This approach can be broadly categorized into structure awareness through further pre-training and incorporating graph structure into PLMs.

2.2.1. Further Pre-Training

Recent studies have proposed further pre-training strategies to enhance the PLMs [2,12], particularly in converting knowledge graphs (KG) into natural language text [8]. One prominent approach is knowledge-grounded pre-training (KGPT), which leverages pre-training and transfer learning to generate text enriched with external knowledge. This method includes a generation model that produces text based on a large corpus of knowledge-grounded data collected from the web, allowing the model to learn from vast amounts of unlabeled data. The effectiveness of KGPT is validated across fully-supervised, zero-shot, and few-shot settings, achieving notable performance gains [9]. Another significant strategy is the plan-and-pre-train approach, which addresses the challenge of converting structured knowledge inputs into a coherent text. This method employs a relational graph convolutional network planner to organize knowledge graph triplets into a linear sequence, which is then transformed into text using a pre-trained Seq2Seq model. Additionally, it incorporates a rule-based interface for formatting plans and canonicalization rules to handle special characters [34]. To further enhance the performance of pre-trained language models in G2T tasks, researchers have proposed methods that improve structure awareness by organizing input data before feeding it into the model. This includes multitask learning to determine optimal ordering and conducting second-phase G2T pre-training on similar datasets, which helps bridge the domain gap between text-to-text and G2T tasks. These methods have shown significant improvements in performance metrics, such as BLEU scores and slot error rates, when evaluated on the WebNLG dataset [10]. Moreover, the challenge of effectively encoding knowledge graphs while preserving their structural information has led to graph-text joint representation learning models. These models utilize a structure-aware semantic aggregation module and introduce pre-training tasks that reconstruct text from corrupted input and align graph and text representations using optimal transport [8]. In addition, innovative self-supervised graph masking pre-training strategies have been proposed to address the limitations of traditional linearization methods, which often overlook structural information. These strategies, including triple and relation predictions, enable the model to understand the structural relationships within graphs, leading to state-of-the-art results on benchmark datasets [35]. Lastly, advancements in transformer architectures for graph representation learning have been explored, including the development of Graphormer, which incorporates structural encoding methods to capture the unique characteristics of graph-structured data. This model enhances the standard transformer by integrating centrality, spatial encoding, and edge feature representations, demonstrating improved performance across various graph representation tasks [5].

2.2.2. Graph Representation

The integration of graph representation techniques into Seq2Seq PLMs has also become another focal point for enhancing G2T performance. Graph representation learning is vital for generating coherent and contextually accurate text in G2T tasks, as it effectively captures the relationships and dependencies within graph-structured data. Graph neural networks (GNNs) can aggregate information from neighboring nodes, which enhances the understanding of graph structures. GNNs employ message-passing through feature transformation and neighborhood aggregation. Feature transformation updates node features, while neighborhood aggregation allows nodes to share and combine features from their neighbors. Each layer assigns weights for adjacent nodes, enabling connections to an additional layer of neighbors [28,36,37]. Additionally, incorporating attention mechanisms into these models allows for a dynamic focus on relevant nodes and edges, improving the contextual relevance of the generated text [16]. Attention-based GNNs learn to aggregate information by evaluating the significance of relationships between node pairs. Each source node collects features from its neighbors based on this inferred importance [28]. Graph attention mechanisms compute relational importance, with graph attention networks (GATs) [22] and their variants [23] employing edge attention to assess the weight of each neighbor relative to a source node. Several studies have explored methods for effectively incorporating graph structures into these models. For instance, [17] introduced a dual encoding model that utilizes a graph convolutional network (GCN) for graph encoding and planning content organization, effectively linking graph structures with linear text descriptions through an LSTM-based encoder–decoder framework. Ref. [16] proposed a dual-path encoder G2T model that integrates a graph structure encoder and a text encoder to capture structural and textual information. The authors also introduced an alignment module to connect the encoded graph and text information effectively. In addition, a guidance module enhances the fluency and accuracy of the output by preventing errors in entity generation. Another significant contribution is the graph-guided self-attention proposed by [7], which enhances the integration of graph structures into PLMs. This mechanism effectively bridges the modality gap between text and graph data by incorporating token-level structural information without necessitating additional alignment or concatenation. Replacing standard self-attention layers in transformer architectures facilitates dynamic interactions between textual and structural inputs, significantly improving G2T generation capabilities while minimizing the number of trainable parameters. Furthermore, a distillation model presented by [38] enhances G2T through a cross-structure attention mechanism. This model employs a teacher–student architecture, where the teacher retains graph structure during encoding while leveraging textual representation. The cross-structure attention mechanism fosters effective interaction between linearized structured data and generated text, enhancing contextual understanding and overall text generation performance across G2T datasets.
Table 1 briefly summarizes the main points regarding the strengths and limitations of the basic Seq2Seq and the structure-aware Seq2Seq learning approaches. Generally, previous works show that integrating graph structure into transformer encoders or further pre-training on structured data has improved performance. While integrating graph representations into PLMs has shown promise for enhancing G2T performance, fully leveraging the potential collaboration between PLMs and graph attention remains challenging. Although approaches like graph neural networks and attention mechanisms strengthen the understanding of graph-structured data, there remains a gap in systematically incorporating these techniques to capture the rich relationships and dependencies inherent in the graph data. Addressing these limitations could lead to more coherent and contextually accurate text generation in G2T tasks.

3. Methodology

This section defines the G2T generation task and introduces the proposed method designed to achieve this task. We illustrate the entire process in Figure 1 and then provide a detailed explanation of each component constituting the framework.

3.1. Problem Formulation

The task of G2T involves generating coherent and contextually relevant text $Y = (y_1, y_2, \ldots, y_m)$ from a given knowledge graph $G$. This process requires a thorough understanding of the entities and their relationships in the graph. The knowledge graph can be formally defined as $G = (V, E)$, where $V = \{e_1, e_2, \ldots, e_{|V|}\}$ denotes the set of entities and $E$ is the set of relations $r_{ij}$ that connect these entities. The primary challenge lies in effectively converting the structural information of the graph into a coherent and relevant textual description.
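For concreteness, the following minimal Python sketch represents such a knowledge graph as a set of (head, relation, tail) triples; the single triple shown is taken from the example in Figure 2, and the variable names are illustrative only.

# A toy knowledge graph G = (V, E) expressed as (head, relation, tail) triples.
# The triple below comes from the example in Figure 2; names are for illustration.
triples = [
    ("Philbert I, Duke of Savoy", "parents", "Amadeus IX, Duke of Savoy"),
]

# V: the set of entities; E: the set of relations connecting them.
V = {h for h, _, _ in triples} | {t for _, _, t in triples}
E = {(h, r, t) for h, r, t in triples}

# The G2T task: generate a text Y = (y_1, ..., y_m) that describes G.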

3.2. Proposed Framework

As illustrated in Figure 1, our proposed model is based on the transformer encoder–decoder framework and the graph attention mechanism. The LGE (linearized graph encoder) and the GE (graph encoder) receive sequential and graph data as inputs, respectively. The bidirectional dual cross-attention and concatenation (BDCC) module then fuses the outputs from the LGE and GE to generate attended outputs, and the concatenation operation integrates the two attended outputs, enabling effective feature fusion across modalities. The decoder then leverages this contextual representation with the associated textual descriptions for predictions. We employ the BART PLM for G2T tasks. During the training phase, we initialize the parameters of the PLM and fine-tune them using the dataset for G2T generation. The decoder generates predictions auto-regressively, resulting in the text output $Y$. In the subsequent subsections, we describe the processes of each component of the framework in detail.
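To make the data flow concrete, the following PyTorch-style sketch outlines the forward pass described above; the module and argument names (lge, ge, bdcc, decoder, linearized_ids, node_feats, adj) are illustrative placeholders rather than the actual implementation.

import torch.nn as nn

class G2TModel(nn.Module):
    # High-level sketch of the proposed pipeline; module names are illustrative.
    def __init__(self, lge, ge, bdcc, decoder):
        super().__init__()
        self.lge = lge          # linearized graph encoder (BART encoder)
        self.ge = ge            # graph attention encoder
        self.bdcc = bdcc        # bidirectional dual cross-attention and concatenation
        self.decoder = decoder  # BART decoder

    def forward(self, linearized_ids, node_feats, adj, decoder_input_ids):
        h_t = self.lge(linearized_ids)          # sequential (token-level) features
        h_g = self.ge(node_feats, adj)          # structural (node-level) features
        f = self.bdcc(h_g, h_t)                 # fused contextual representation
        return self.decoder(decoder_input_ids, encoder_hidden_states=f)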

3.3. Linearized Graph Encoder

The linearized graph encoder (LGE) adapts the BART encoder as employed by [8] and applies linearization to convert graph data into a sequential format suitable for transformer-based models.

Graph Linearization

To prepare the input for the model, we follow established techniques for graph linearization as presented in works such as [8,9,16]. The initial step in transforming the graph $G = (V, E)$ is to extract relevant entities and their relationships, which are structured into triples of the form $(e_i, r_{ij}, e_j)$. The linearization output can be mathematically defined as follows:
$L_G = \mathrm{Linearize}(G) = (e_i, r_{ij}, e_j, \ldots),$
This formulation elucidates how $L_G$ is derived from the graph. To illustrate the linearization process, consider the sample graph structure $G$ depicted in Figure 2. The conversion begins with identifying relevant entities and their interrelations from the knowledge base (KB). For each entity $e_i$, we identify associated relationships and form triples of the form $(e_i, r_{ij}, e_j)$. Each entity and relationship is then tokenized, with special tokens denoting their roles. Entities are prefixed with [head] or [tail] to indicate their position in the relationship. For example, the entity “Philbert I, Duke of Savoy” is represented as follows: [head] Philbert I, Duke of Savoy. Similarly, the relationship “parents” to “Amadeus IX, Duke of Savoy” is denoted as follows: [relation] parents [tail] Amadeus IX, Duke of Savoy. This tokenization process is applied uniformly across all entities and relationships, ensuring clarity in their representation. The final linearized output, which concatenates these tokenized elements, is shown at the bottom of Figure 2. This linearized sequence $L_G$ serves as the input for the subsequent encoding process in the BART model, facilitating the generation of contextually relevant text.
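A minimal Python sketch of this linearization step is shown below; the linearize helper and its exact special-token strings are illustrative and simply follow the [head]/[relation]/[tail] convention described above.

def linearize(triples):
    # Convert (head, relation, tail) triples into the linearized sequence L_G,
    # prefixing each element with its role token.
    parts = []
    for head, relation, tail in triples:
        parts += ["[head]", head, "[relation]", relation, "[tail]", tail]
    return " ".join(parts)

print(linearize([("Philbert I, Duke of Savoy", "parents", "Amadeus IX, Duke of Savoy")]))
# -> [head] Philbert I, Duke of Savoy [relation] parents [tail] Amadeus IX, Duke of Savoy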
The linearized graph encoder (LGE) then processes an input sequence $L_G \in \mathbb{R}^{n \times d}$. The encoding process begins with the input sequence $L_G$, which consists of $n$ tokens, each represented in a $d$-dimensional space. The first step is to convert these tokens into embeddings and add positional information:
$Z = \mathrm{LayerNorm}(E + P),$
where $E = \mathrm{Embedding}(L_G)$ is the embedded input and $P$ is the positional encoding matrix, which provides information about the position of each token in the sequence. Then the encoded representation $Z$ is passed through $L$ identical encoder blocks:
$H^{(l)} = \mathrm{EncoderBlock}(H^{(l-1)}), \quad l = 1, 2, \ldots, L,$
where $H^{(0)} = Z$, and each encoder block consists of multi-head self-attention and a feedforward network. The multi-head attention mechanism allows the model to focus on different parts of the input sequence simultaneously, which can be defined as follows:
$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(h_1, \ldots, h_h) W^{o},$
where $Q$, $K$, and $V$ are the query, key, and value matrices, respectively, derived from the input $Z$. Each head $h_i$ is computed as follows:
$h_i = \mathrm{Attention}(Q W_i^{q}, K W_i^{k}, V W_i^{v}),$
where $W_i^{q}$, $W_i^{k}$, and $W_i^{v}$ are learned projection matrices for queries, keys, and values, respectively, and $W^{o}$ is the output projection matrix that combines the outputs of all heads. The final output of the encoder after processing through all blocks can be defined as follows:
$h_t = \mathrm{LGE}(L_G),$
where the output $h_t$ encapsulates the contextualized representation of the linearized graph, or simply the input sequence, which is used as an input for the subsequent G2T operation.
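As an illustration, the LGE can be realized with the Hugging Face BART encoder roughly as follows; the checkpoint name facebook/bart-base and the handling of the role tokens are assumptions of this sketch (in practice the special tokens would be added to the tokenizer vocabulary).

import torch
from transformers import BartTokenizer, BartModel

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
bart = BartModel.from_pretrained("facebook/bart-base")

linearized = "[head] Philbert I, Duke of Savoy [relation] parents [tail] Amadeus IX, Duke of Savoy"
inputs = tokenizer(linearized, return_tensors="pt", truncation=True)

with torch.no_grad():
    # h_t: contextualized representation of the linearized graph, shape (1, n, d).
    h_t = bart.get_encoder()(**inputs).last_hidden_state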

3.4. Graph Encoder

A graph encoder processes the graph structure to generate embeddings for each entity. These embeddings are designed to capture the knowledge encoded through graph edges, ensuring that similar nodes are positioned close together in vector space. Specifically, connected nodes are expected to exhibit higher similarity scores, while unconnected nodes will show lower scores based on the adjacency tensor [39]. This characteristic of embeddings significantly enhances the representation of relationships within the graph data [40]. Our model employs a graph attention mechanism, following [23], which transforms the input feature vectors $h^{g}$ into output feature vectors $h^{g+1}$ through an adaptive weighting of neighboring node features. The output of each layer in the graph attention (GAT) mechanism can be expressed as follows:
$h^{g+1} = \mathrm{GAT}(h^{g}, A),$
In this expression, $h^{g+1} = \{h_1, h_2, \ldots, h_m\}$ denotes the output feature vectors at layer $g+1$ of the graph attention mechanism, which processes the input feature vectors $h^{g} = \{h_1, h_2, \ldots, h_m\}$ from layer $g$ using the adjacency matrix $A \in \mathbb{R}^{m \times m}$. The adjacency matrix represents the graph structure, where $A_{ij} = 1$ if there is an edge from node $i$ to node $j$, and $A_{ij} = 0$ otherwise. For each graph attention layer $g$, we compute the attention scores $e_{ij}$ between nodes $i$ and $j$ as follows:
$e_{ij} = a^{\top}\,\mathrm{LeakyReLU}\!\left(W_l h_i + W_r h_j\right),$
where $W_l$ and $W_r$ are learnable weight matrices that transform the feature vectors of the left and right nodes, respectively. The resulting score is combined with a learnable parameter vector $a$ to produce the final attention score. Next, we mask $e_{ij}$ based on the adjacency matrix. Specifically, $e_{ij}$ is set to $-\infty$ if there is no edge from $i$ to $j$:
$e_{ij} = \begin{cases} e_{ij} & \text{if } A_{ij} = 1 \\ -\infty & \text{if } A_{ij} = 0 \end{cases}$
The attention coefficients $\alpha_{ij}$ are derived from the attention scores $e_{ij}$ using the softmax function:
$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})},$
where $\mathcal{N}_i$ is the set of nodes connected to node $i$.
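A simplified single-head PyTorch sketch of this attention layer (roughly following the GATv2-style scoring above) is given below; multi-head aggregation, dropout, and the final nonlinearity are omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    # Single-head sketch of the attention computation described above.
    def __init__(self, in_dim, out_dim, negative_slope=0.2):
        super().__init__()
        self.W_l = nn.Linear(in_dim, out_dim, bias=False)  # transform for source nodes
        self.W_r = nn.Linear(in_dim, out_dim, bias=False)  # transform for neighbor nodes
        self.a = nn.Linear(out_dim, 1, bias=False)         # attention vector a
        self.leaky_relu = nn.LeakyReLU(negative_slope)

    def forward(self, h, adj):
        # h: (num_nodes, in_dim); adj: (num_nodes, num_nodes) with 0/1 entries.
        hl, hr = self.W_l(h), self.W_r(h)
        # e_ij = a^T LeakyReLU(W_l h_i + W_r h_j), computed for all pairs by broadcasting.
        e = self.a(self.leaky_relu(hl.unsqueeze(1) + hr.unsqueeze(0))).squeeze(-1)
        # Mask non-edges so they receive zero weight after the softmax.
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = F.softmax(e, dim=-1)   # attention coefficients alpha_ij
        return alpha @ hr              # aggregated neighbor features for the next layer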

3.5. Bidirectional Dual Cross-Attention and Concatenation

This section describes the details of the proposed bidirectional dual cross-attention and concatenation (BDCC) module, shown in Figure 3.
The BDCC module effectively fuses the output $h_g \in \mathbb{R}^{b \times m \times n}$ of the graph encoder (GE) and the output $h_t \in \mathbb{R}^{b \times m \times n}$ of the standard transformer encoder (LGE), where $b$ represents the batch size, $m$ is the token length or number of nodes, and $n$ is the embedding dimension. This approach leverages the strengths of both architectures, allowing for a richer representation of the input data (we use the GE output $h^{g+1}$ as $h_g$ for simplicity in the subsequent expressions). Accordingly, we first project the outputs from the two sources into a common embedding space using linear transformations to facilitate the fusion of these two representations:
$h_g = h_g W_g + b_g, \quad h_t = h_t W_t + b_t,$
In these equations, $W_g$ and $W_t$ are the learned projection weights, while $b_g$ and $b_t$ are the corresponding bias terms. After obtaining the projected outputs $h_g$ and $h_t$, we prepare them for multi-head attention by reshaping them from $\mathbb{R}^{b \times m \times n}$ to $\mathbb{R}^{m \times b \times n}$. This transformation enables us to utilize the attention mechanisms effectively.
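The projection and reshaping steps can be sketched in PyTorch as follows; the tensor sizes are illustrative, and the (m, b, n) layout matches the default expectation of nn.MultiheadAttention.

import torch
import torch.nn as nn

b, m, n = 4, 32, 768  # batch size, token/node length, embedding dimension (illustrative)
h_g, h_t = torch.randn(b, m, n), torch.randn(b, m, n)  # GE and LGE outputs

proj_g = nn.Linear(n, n)   # W_g, b_g: project graph features into the common space
proj_t = nn.Linear(n, n)   # W_t, b_t: project transformer features into the common space
h_g, h_t = proj_g(h_g), proj_t(h_t)

# nn.MultiheadAttention (batch_first=False) expects (seq_len, batch, embed_dim),
# hence the reshape from (b, m, n) to (m, b, n).
h_g, h_t = h_g.transpose(0, 1), h_t.transpose(0, 1)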

3.6. Bidirectional Dual Cross-Attentions

The first cross-attention involves the GE output querying the LGE output, indicating that the model learns how the features represented in the GE output relate to the features in the LGE output. We compute the queries $Q_1$, keys $K_1$, and values $V_1$ as follows:
$Q_1 = h_g W_{Q_1}, \quad K_1 = h_t W_{K_1}, \quad V_1 = h_t W_{V_1},$
The attention output, denoting the information from the LGE output that is most relevant to the GE output, is then calculated using the softmax function as follows:
$A_1 = \mathrm{softmax}\!\left(\frac{Q_1 K_1^{\top}}{\sqrt{d}}\right) V_1,$
After obtaining the attention output $A_1$, we reshape it back to $\mathbb{R}^{b \times m \times n}$ and apply a residual connection followed by layer normalization to add the previous input and stabilize the training:
$A_1 = \mathrm{LayerNorm}(A_1 + h_g),$
The second cross-attention reverses the roles, with the LGE output querying the GE output. The corresponding queries, keys, and values are computed as follows:
$Q_2 = h_t W_{Q_2}, \quad K_2 = h_g W_{K_2}, \quad V_2 = h_g W_{V_2},$
The attention output, i.e., the information from the GE output that is most relevant to the LGE output, is similarly computed as follows:
$A_2 = \mathrm{softmax}\!\left(\frac{Q_2 K_2^{\top}}{\sqrt{d}}\right) V_2,$
Again, we reshape $A_2$ to $\mathbb{R}^{b \times m \times n}$ and apply a residual connection and layer normalization:
$A_2 = \mathrm{LayerNorm}(A_2 + h_t),$
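The two cross-attention directions with their residual connections and layer normalization can be sketched as follows; the head count and the use of nn.MultiheadAttention are assumptions of this illustration.

import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    # Sketch of the dual cross-attention with residual connections and LayerNorm.
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn_g2t = nn.MultiheadAttention(dim, num_heads)  # graph queries transformer
        self.attn_t2g = nn.MultiheadAttention(dim, num_heads)  # transformer queries graph
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, h_g, h_t):
        # Inputs are shaped (m, b, n) as prepared in the projection step above.
        a1, _ = self.attn_g2t(query=h_g, key=h_t, value=h_t)  # A_1: graph attends to text
        a1 = self.norm1(a1 + h_g)                             # residual + LayerNorm
        a2, _ = self.attn_t2g(query=h_t, key=h_g, value=h_g)  # A_2: text attends to graph
        a2 = self.norm2(a2 + h_t)
        # Reshape back to (b, m, n) for the subsequent concatenation stage.
        return a1.transpose(0, 1), a2.transpose(0, 1)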

3.7. Concatenation

In the second fusion stage, the outputs from both cross-attention mechanisms, $A_1$ and $A_2$, are concatenated to form a combined, richer representation that incorporates information from both modalities:
$A_{\mathrm{concat}} = A_1 \oplus A_2 \in \mathbb{R}^{b \times m \times 2n},$
where $\oplus$ denotes concatenation along the feature dimension.

3.8. Feedforward Network

We apply a feedforward network (FFN) to refine the concatenated representation, followed by layer normalization:
$A_{\mathrm{ffn}} = \mathrm{LayerNorm}(\mathrm{FFN}(A_{\mathrm{concat}})),$
Finally, a linear transformation is performed to produce the output $f \in \mathbb{R}^{b \times m \times n}$ that encapsulates the fused representations from the two encoders, enabling enhanced performance in the G2T task.
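A sketch of the concatenation, feedforward, and output-projection stages is given below; the hidden width of the FFN and the use of a single linear output layer are assumptions of this illustration.

import torch
import torch.nn as nn

class FusionHead(nn.Module):
    # Sketch of the concatenation, FFN, and final linear projection.
    def __init__(self, dim, ffn_dim=2048):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(2 * dim, ffn_dim), nn.ReLU(), nn.Linear(ffn_dim, 2 * dim)
        )
        self.norm = nn.LayerNorm(2 * dim)
        self.out_proj = nn.Linear(2 * dim, dim)

    def forward(self, a1, a2):
        a_concat = torch.cat([a1, a2], dim=-1)  # A_concat, shape (b, m, 2n)
        a_ffn = self.norm(self.ffn(a_concat))   # LayerNorm(FFN(A_concat))
        return self.out_proj(a_ffn)             # fused representation f, shape (b, m, n)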
The proposed BDCC framework can effectively address the potential structural loss that arises from linearizing graph inputs for transformer encoders by integrating the strengths of both graph attention networks and transformer architectures. It incorporates a bidirectional dual cross-attention mechanism that enables the GE to query the LGE and vice versa instead of merely linearizing graph data. This interaction allows for preserving the relational structures inherent in the graph data, reducing the risk of losing critical structural information during linearization. Specifically, the multi-head attention in the BDCC framework allows for nuanced learning of the relationships between graph features and token representations from the transformer. The attention scores computed during the cross-attention phases help the model to focus on relevant features from both modalities, dynamically capturing the interdependencies that a simple linearization might overlook. Subsequently, the framework involves concatenating the outputs from the attention mechanisms, creating a richer representation that encapsulates information from the graph and transformer encoders. This concatenation enhances the representation and maintains the complex interrelationships within the graph data, further mitigating structural loss. Overall, the BDCC framework effectively addresses potential structural loss by fostering a dynamic interplay between graph and transformer representations, enhancing G2T generation tasks, and demonstrating a significant advancement over traditional methods that rely solely on linearization.

3.9. Decoder

The decoder consists of stacked residual attention blocks similar to the encoder. Moreover, it adds an extra layer of cross-attention that enables it to focus on the contextual representations $X$ produced by the encoders, where $X$ is set to the output $f$ produced by the proposed model. The decoder uses the hidden states to generate the output sequence $Y$. The decoding process plays a crucial role in sequence generation, as the output at each time step depends on the previous tokens and the hidden states. We can express this relationship as follows:
$y_t = \mathrm{Decoder}(y_{<t}, X),$
In this equation, the input consists of the previously generated tokens $y_{<t}$ and the encoder's hidden states $X$, which provide essential contextual information. The output $y_t$ represents the token generated at the current time step $t$, illustrating the sequential nature of the decoding process. This autoregressive characteristic emphasizes that each token is predicted based on both the preceding tokens and the contextual cues from the encoder. The probability of generating the entire output sequence is mathematically represented as follows:
$P(Y \mid X) = \prod_{t=1}^{m} P(y_t \mid y_{<t}, X),$
Here, the generated output sequence is denoted as $Y = (y_1, y_2, \ldots, y_m)$. This equation illustrates that the joint probability of the output sequence can be decomposed into a product of conditional probabilities for each token, given all previously generated tokens and the encoder's hidden states. This probability calculation directly ties into the training objective, where the model aims to maximize the likelihood of the correct output sequence. During training, the BART-base model is initialized with pre-trained weights and tailored to the specific G2T task. The training loop spans multiple epochs and iterates through mini-batches of data. For each batch, the forward pass computes the loss $\mathcal{L}$ to be minimized:
$\mathcal{L} = -\sum_{t} \log P(y_t \mid y_{<t}, X),$
where $y_t$ represents the target token at position $t$, $y_{<t}$ includes the previous tokens, and $X$ signifies the input features. The model's performance is periodically evaluated on a validation dataset, with the best-performing model being saved for future inference if the validation score exceeds previous metrics. During inference, the model generates output sequences using a beam search strategy. This involves exploring multiple possible output sequences and selecting the most likely one based on the model's predictions. The goal is to produce coherent and fluent output sequences that accurately capture the semantic relationships in the input data. The generated output sequences are then evaluated using automatic metrics.
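For illustration, the teacher-forced loss and the beam-search decoding can be sketched with the Hugging Face BART model as follows; the reference sentence and the generation limits are invented for this example, and the stock model stands in for the full BDCC architecture.

from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

source = "[head] Philbert I, Duke of Savoy [relation] parents [tail] Amadeus IX, Duke of Savoy"
target = "Philbert I, Duke of Savoy is the child of Amadeus IX, Duke of Savoy."  # invented reference

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# Teacher-forced training step: the returned loss is the token-level cross-entropy
# -sum_t log P(y_t | y_<t, X), computed internally from the shifted labels.
loss = model(**inputs, labels=labels).loss
loss.backward()

# Inference with beam search (beam size 5, as listed in Table A3).
generated = model.generate(**inputs, num_beams=5, max_length=64)
print(tokenizer.decode(generated[0], skip_special_tokens=True))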

4. Experiment

4.1. Experimental Setup

Model training and evaluation were conducted on a Linux server with NVIDIA GeForce RTX 3090 GPUs, each featuring 24 GB of memory. We used the PyTorch framework and the Hugging Face Transformers library [41]. The model training process was built upon the work of [8]. We initialized the parameters from the pre-trained BART model checkpoints. The proposed model leverages the pre-trained model with the integrated graph attention and uses established training techniques to generate text from the input data. The detailed model specifications and training parameters used in our experiments are summarized in Table A1, Table A2 and Table A3 in Appendix A.

4.2. Experimental Dataset

This section presents the datasets used for our experiments in the proposed G2T models. We selected two prominent datasets—PathQuestions and WebNLG—both designed to facilitate the generation of natural language text based on structured knowledge graphs.
PathQuestions: The PathQuestions (PQ) dataset is a benchmark for question generation over knowledge bases, designed to generate natural language questions about corresponding knowledge graphs [42]. The authors constructed the dataset from two subsets of Freebase as a knowledge base for the multi-relational question-and-answer task. The dataset features 2-hop or 3-hop paths between two entities, with each question containing two or three triples per question set and their corresponding textual description. The dataset was developed by extracting paths between two entities and generating natural language questions using templates, paraphrasing templates, and synonyms for relations [43].
WebNLG: WebNLG is a crowd-sourced benchmark dataset specifically designed for G2T tasks. Annotators crafted the dataset, which includes graphs from DBpedia [44] and features up to seven triples paired with one or more reference texts. This study utilizes the version 2.0 release of WebNLG, as referenced in previous studies [8,9,13], to ensure consistency and comparability in our evaluations.
Table 2 summarizes the key statistics of the datasets, including the total number of entities and relations, the average number of triples representing structured data in the form of subject–predicate–object relationships, the average length of texts, as well as the distribution of training, validation, and test sets. These statistics highlight the scale and complexity of the datasets, which are essential for training G2T models.

4.3. Automatic Evaluation

To evaluate the proposed G2T models for text generation, we employed standard metrics that assess lexical and semantic similarities between model outputs and annotated ground truth. The BLEU [45] score measures the alignment of the generated text with the reference text, providing insight into surface-level overlap. BLEU was among the first metrics to demonstrate a strong correlation with human quality assessments and continues to be one of the most widely used and cost-effective automated metrics. ROUGE [46] evaluates the overlap of n-grams, while the ROUGE-L variant considers word order through the longest common subsequence. The METEOR [47] score assesses the quality of our model’s output by comparing it to a reference text; it surpasses simple word matching by incorporating linguistic factors, yielding a more refined, human-like evaluation. These metrics comprehensively assess the ability of the proposed models to generate coherent and contextually appropriate texts.
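For reference, these metrics can be computed with the Hugging Face evaluate library roughly as shown below; the texts are placeholders, and the exact metric implementations and tokenization used for the reported scores may differ.

import evaluate  # Hugging Face evaluate library (assumed to be installed)

predictions = ["philbert i duke of savoy is the child of amadeus ix duke of savoy"]
references = ["philbert i duke of savoy is the child of amadeus ix duke of savoy"]

# BLEU expects one list of reference texts per prediction.
bleu = evaluate.load("bleu").compute(predictions=predictions, references=[[r] for r in references])
meteor = evaluate.load("meteor").compute(predictions=predictions, references=references)
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)

print(bleu["bleu"], meteor["meteor"], rouge["rougeL"])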

4.4. Baseline Models

This section presents the baseline models used for comparison with the proposed model on benchmark datasets, utilizing automatic evaluation metrics. The baseline approaches encompass a variety of methods for text generation from knowledge graphs, as detailed in the related work section.
  • KGPT [9]: Leverages pre-training and transfer learning to generate text enriched with external knowledge.
  • CSAD [38]: Presents a distillation model for G2T generation that uses cross-structure attention to enhance interactions between structured data and text while training a student to mimic a teacher model.
  • Dual-path encoder [16]: Integrates attention-based graph encoder into the linearized graph encoder to enhance representation, complemented by an alignment and guidance module that utilizes a pointer network to improve generation accuracy.
  • G2S [48]: Generates texts from the graph, using a bidirectional graph to sequence model to encode the graph and a node-level copying mechanism to enhance the decoder.
  • JointGT [8]: Applies further pre-training of T5 and BART transformer models on structured data, followed by fine-tuning on downstream G2T generation.
  • BART [3]: An application of the standard pre-trained BART-base model for linearized G2T generation as implemented by [8].
  • T5 [49]: An application of the standard pre-trained T5-base model for linearized G2T generation as implemented by [8].
  • GAP [13]: Integrates graph-aware elements into PLMs through a mask structure that captures neighborhood information and a type encoder that biases graph-attention weights based on connection types.
  • GCN [50]: Employs a graph convolutional neural encoder to process structured data for G2T generation.

5. Discussion of the Results

The results presented in Table 3 reveal that the proposed BDCC method significantly enhances the G2T generation tasks, particularly within the PathQuestions dataset. The performance metrics indicate that BDCC surpasses several baseline models, including state-of-the-art approaches. Specifically, BDCC achieves a BLEU score of 67.41% and a METEOR score of 49.63%, marking the highest results among the listed models for the PathQuestions dataset. This suggests the effectiveness of the proposed approach in capturing the complexities of the relationships within the graph data, leading to more coherent and contextually relevant text generation. Compared to graph neural network models like GCN and G2S, BDCC demonstrates the advancements that can be achieved through modern architectures such as transformers. For instance, both T5 and BART can be competitive but do not reach the performance levels of BDCC, underscoring the significant added value of the proposed fusion strategy. The specific method employed by BDCC to leverage the strengths of graph attention and transformer encoders simultaneously allows for a richer representation, enhancing its performance. Moreover, the results on the WebNLG dataset still reflect the robustness of BDCC. With a BLEU score of 66.58% and a METEOR score of 47.44%, BDCC maintains a competitive performance against other baseline models. This indicates that the proposed approach is versatile and can adapt to different datasets, further validating its effectiveness in G2T tasks.
The results of the CSAD [38] model indicate enhancements in G2T generation. However, it also has limitations that may lead to structural loss, because it relies on linear representations for cross-attention with textual features without fully exploiting the underlying graph structure dynamics in its attention mechanism, and it adds complexity by training a student model that depends on the teacher model. The dual-path encoder [16] also improved G2T further, but the graph information it incorporates contributes only 10% to its optimal performance, indicating that its integration of the graph encoder output with the transformer encoder may not be effective. Moreover, although the alignment module, which uses transformer-based encoder layers, slightly enhances performance, it also significantly increases the model size. In contrast, our BDCC approach applies bidirectional interactions that effectively integrate outputs from a transformer-based encoder and a graph attention encoder, enhancing overall performance in G2T generation and paving the way for future research to explore alternative integration strategies.

6. Ablation Study

In this section, we evaluate the impact of our proposed method on the performance of the G2T generation task. The PathQuestions dataset serves as the basis for this analysis, and we created model variants by isolating the integrated components as follows:
  • BDCC: The proposed bidirectional dual cross-attention and concatenation for G2T. This approach allows for a more nuanced interaction between the graph and transformer components, facilitating better contextual information exchange.
  • Concatenation: G2T with simple concatenation of graph and transformer-based encoders.
  • Graph encoder (GE): The G2T model relies solely on the graph attention (GATv2) mechanism, demonstrating its individual contribution to the model performance.
  • Linearized graph encoder (LGE): G2T based on the linearization of graph data using a transformer-based encoder model.
  • Unidirectional graph attention (UGA): This variant uses only the graph output to attend to the transformer output without allowing feedback from the transformer to the graph.
  • Unidirectional transformer attention (UTA): This variant uses only the transformer output to attend to the graph output, limiting feedback from the graph.
The results of the ablation study, as summarized in Table 4, reveal insightful trends regarding the performance of each variant on the PathQuestions dataset. The proposed BDCC model achieved the highest performance with a BLEU score of 67.41% and a METEOR score of 49.63%. This approach underscores the effectiveness of the BDCC mechanism in maximizing the interaction between the graph and transformer outputs. The model using simple concatenation performed reasonably well, indicating that while this method is less sophisticated, it still retains a significant level of performance. Moreover, the GE model demonstrates that the graph attention mechanism alone is effective but does not match the performance of models that leverage more complex interactions, as it lacks the additional context provided by the transformer encoder. Similarly, the LGE model produced a BLEU score of 65.74%, suggesting that while linearization can be beneficial, it does not fully capture the advantages of the bidirectional approach. UGA and UTA are also competitive, but their performance is lower than the proposed BDCC, indicating that feedback between the graph and transformer components is crucial for optimal performance.
In addition to the commonly used surface-level evaluation, we employed BERT-based metrics to assess the semantic quality of the generated text. Accordingly, compared with the other variants, the BDCC model achieved slightly higher precision (0.901), recall (0.8962), and F1 score (0.8972), demonstrating its effectiveness in generating relevant outputs. The other variants also show competitive performance, with scores slightly lower than those of BDCC. The BERT embeddings used in this evaluation allow for semantic understanding, such as handling synonyms and linguistic variations. However, this approach depends on the quality of the pre-trained BERT model, potentially affecting its applicability in domain-specific contexts, and is also computationally intensive.
The proposed model, illustrated in Figure 4, takes slightly longer to converge than the LGE model. However, its elapsed time is comparable to the simple concatenation method. The ablation result analysis demonstrates the effectiveness of the BDCC approach in generating more accurate text outputs. Interestingly, the simple concatenation and the individual graph encoder models outperform standard transformer models such as T5 and BART, suggesting that our model effectively integrates graph-based information to enhance overall performance. Although the concatenation model is less effective than BDCC, it still outperforms the baseline models, underscoring the importance of integrating the graph encoder. Our transformer-based BDCC model effectively leverages parallelization to converge. We trained the model for 2.26 h on two NVIDIA GeForce RTX 3090 GPUs, each with 24 GB of memory, and it may not even require the full memory capacity of both GPUs. We trained all other model variants on two GPUs as well. Compared to the various model variants and standard transformer models, our model consistently demonstrates superior performance while requiring only a minimal increase in training time and model parameters. Notably, our model has fewer parameters than the T5 model.

7. Conclusions

This study proposes a novel bidirectional dual cross-attention and concatenation (BDCC) that integrates outputs from a transformer-based encoder and a graph attention encoder through cross-attention mechanisms and concatenation, effectively capturing inter-modal relationships and fusion across modalities. The results of BDCC highlight its effectiveness in G2T generation tasks, outperforming several baseline models. Specifically, it achieved BLEU and METEOR scores of 67.41% and 49.63%, respectively, on the PathQuestions dataset, and scores of 66.58% (BLEU) and 47.44% (METEOR), respectively, on the WebNLG dataset. The proposed strategy facilitates a rich representation of the input data by fostering a dynamic interaction between graph and sequential data, enabling a more nuanced understanding of the relationships inherent in the input. This innovative approach ultimately leads to more coherent and contextually relevant text generation, underscoring the synergetic benefits of our BDCC architecture in addressing the complexities of G2T tasks. Future work may focus on further optimizing the interactions between modalities and exploring additional architectural variations to push the limits of the G2T generation task. While the integration of graph attention within the transformer encoder framework using our method has yielded promising results, we also acknowledge that this is not the only pathway to address existing limitations; alternative solutions may include developing linearization algorithms that effectively translate the structural information of knowledge graphs into a linear format suitable for text generation or designing transformer encoders capable of directly processing graph inputs without the need for linearization. Overall, this study demonstrates that integrating transformer-based encoders and graph attention mechanisms through our proposed BDCC method significantly enhances the quality of the generated text, paving the way for further advancements in G2T generation methodologies.

Author Contributions

Conceptualization, E.L.J., M.A.A.-a. and Y.H.G.; Methodology, E.L.J., M.A.A.-a., W.F. and F.A.; Formal analysis, E.L.J.; Investigation, E.L.J. and Y.H.G.; Resources, W.C.; Data curation, V.K.A., W.F., J.M.A. and A.S.; Writing — original draft, E.L.J.; Writing — review and editing, E.L.J., W.C., M.A.A.-a., Y.H.G., V.K.A., W.F., F.A., J.M.A. and A.S.; Visualization, E.L.J., V.K.A. and F.A.; Supervision, W.C.; Funding acquisition, M.A.A.-a. and Y.H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Technology Innovation Program (2410002644, Development of robotic manipulation task learning based on Foundation model to understand and reason about task situations) funded by the Ministry of Trade, Industry & Energy (MoTIE, Korea).

Data Availability Statement

The datasets used in this study are publicly available and freely accessible at https://gitlab.com/shimorina/webnlg-dataset and https://github.com/hugochan/Graph2Seq-for-KGQG, both accessed on 13 February 2025.

Acknowledgments

This work was supported by the Technology Innovation Program (2410002644, Development of robotic manipulation task learning based on Foundation model to understand and reason about task situations) funded by the Ministry of Trade, Industry & Energy (MoTIE, Korea). This work was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. RS-2022-00166402 and RS-2023-00256517).

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this study.

Abbreviations

The following abbreviations are used in this manuscript:
BART: bidirectional and auto-regressive transformer
BDCC: bidirectional dual cross-attention and concatenation
D2T: data-to-text generation
GAT: graph attention network
GATv2: graph attention network version 2
G2T: graph-to-text generation
GNN: graph neural network
PLMs: pre-trained language models
Seq2Seq: sequence-to-sequence
T5: text-to-text transfer transformer
UGA: unidirectional graph attention
UTA: unidirectional transformer attention

Appendix A. Model Specifications and Training Parameters

Table A1. Model specifications and training parameters used for the experiments.

BART-Base Model              Value
Layers                       6 + 6
Heads                        12
Parameters                   140 M

Table A2. Specifications of the integrated graph attention model.

GATv2 Model                  Value
Heads                        4
Layers (1 hidden, 1 output)  2
Dropout rate                 0.2
Leaky ReLU negative slope    0.2
Dropout rate (in the layer)  0.3

Table A3. Training parameters used for the experiments.

Param                        Value
Optimizer                    AdamW
Beam size                    5
Warm-up steps                1100/1600
Batch size                   32/32
Learning rate                5 × 10⁻⁵ / 2 × 10⁻⁵
Epochs                       60/120

The parameter values separated by “/” in Table A3 are for the PathQuestions and WebNLG datasets, respectively.
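As an illustration, the optimizer and warm-up settings from Table A3 (PathQuestions configuration) could be instantiated as follows; the placeholder model, the total step count, and the choice of a linear warm-up schedule are assumptions of this sketch.

import torch
import torch.nn as nn
from transformers import get_linear_schedule_with_warmup

model = nn.Linear(768, 768)   # placeholder for the actual G2T model parameters
total_steps = 60 * 1000       # epochs x steps per epoch (illustrative value)

# AdamW with warm-up, following the PathQuestions column of Table A3.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=1100, num_training_steps=total_steps
)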

References

  1. Gupta, R.; Srinivasa, S. Workshop on Enterprise Knowledge Graphs using Large Language Models. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM ’23), Birmingham, UK, 21–25 October 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 5271–5272. [Google Scholar] [CrossRef]
  2. Li, S.; Li, L.; Geng, R.; Yang, M.; Li, B.; Yuan, G.; He, W.; Yuan, S.; Ma, C.; Huang, F.; et al. Unifying Structured Data as Graph for Data-to-Text Pre-Training. Trans. Assoc. Comput. Linguist. 2024, 12, 210–228. [Google Scholar] [CrossRef]
  3. Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 6–8 July 2020; Jurafsky, D., Chai, J., Schluter, N., Tetreault, J., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 7871–7880. [Google Scholar] [CrossRef]
  4. Hamilton, W.L.; Ying, R.; Leskovec, J. Representation Learning on Graphs: Methods and Applications. IEEE Data Eng. Bull. 2017, 40, 52–74. [Google Scholar]
  5. Ying, C.; Cai, T.; Luo, S.; Zheng, S.; Ke, G.; He, D.; Shen, Y.; Liu, T.Y. Do Transformers Really Perform Badly for Graph Representation? In Proceedings of the 35th International Conference on Neural Information Processing Systems (NIPS ’21), Vancouver, BC, Canada, 7–10 December 2021; Curran Associates Inc.: Red Hook, NY, USA, 2021. [Google Scholar]
  6. Wang, T.; Shen, B.; Zhang, J.; Zhong, Y. Improving PLMs for Graph-to-Text Generation by Relational Orientation Attention. Neural Process. Lett. 2023, 55, 7967–7983. [Google Scholar] [CrossRef]
  7. Yuan, S.; Färber, M. GraSAME: Injecting Token-Level Structural Information to Pretrained Language Models via Graph-guided Self-Attention Mechanism. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, Mexico City, Mexico, 16–21 June 2024; Duh, K., Gomez, H., Bethard, S., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 920–933. [Google Scholar] [CrossRef]
  8. Ke, P.; Ji, H.; Ran, Y.; Cui, X.; Wang, L.; Song, L.; Zhu, X.; Huang, M. JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Bangkok, Thailand, 1–6 August 2021; Zong, C., Xia, F., Li, W., Navigli, R., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 2526–2538. [Google Scholar] [CrossRef]
  9. Chen, W.; Su, Y.; Yan, X.; Wang, W.Y. KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; Webber, B., Cohn, T., He, Y., Liu, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 8635–8648. [Google Scholar] [CrossRef]
  10. Yang, Z.; Einolghozati, A.; Inan, H.; Diedrick, K.; Fan, A.; Donmez, P.; Gupta, S. Improving Text-to-Text Pre-trained Models for the Graph-to-Text Task. In Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+), Dublin, Ireland, 18 December 2020; Castro Ferreira, T., Gardent, C., Ilinykh, N., van der Lee, C., Mille, S., Moussallem, D., Shimorina, A., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 107–116. [Google Scholar]
  11. Yin, X.; Wan, X. How Do Seq2Seq Models Perform on End-to-End Data-to-Text Generation? In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; Muresan, S., Nakov, P., Villavicencio, A., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 7701–7710. [Google Scholar] [CrossRef]
  12. Xie, Y.; Aggarwal, K.; Ahmad, A. Efficient Continual Pre-training for Building Domain Specific Large Language Models. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 10184–10201. [Google Scholar] [CrossRef]
  13. Colas, A.; Alvandipour, M.; Wang, D.Z. GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; Calzolari, N., Huang, C.R., Kim, H., Pustejovsky, J., Wanner, L., Choi, K.S., Ryu, P.M., Chen, H.H., Donatelli, L., Ji, H., et al., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 5755–5769. [Google Scholar]
  14. Liu, W.; Zhou, P.; Zhao, Z.; Wang, Z.; Ju, Q.; Deng, H.; Wang, P. K-BERT: Enabling Language Representation with Knowledge Graph. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 2901–2908. [Google Scholar] [CrossRef]
  15. Shi, X.; Xia, Z.; Li, Y.; Wang, X.; Niu, Y. Fusing graph structural information with pre-trained generative model for knowledge graph-to-text generation. Knowl. Inf. Syst. 2024, 67, 2619–2640. [Google Scholar] [CrossRef]
  16. Zhao, T.; Liu, Y.; Su, X.; Li, J.; Gao, G. Exploring the Synergy of Dual-path Encoder and Alignment Module for Better Graph-to-Text Generation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 20–25 May 2024; Calzolari, N., Kan, M.Y., Hoste, V., Lenci, A., Sakti, S., Xue, N., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 6980–6991. [Google Scholar]
  17. Zhao, C.; Walker, M.; Chaturvedi, S. Bridging the Structural Gap Between Encoding and Decoding for Data-To-Text Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Jurafsky, D., Chai, J., Schluter, N., Tetreault, J., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 2481–2491. [Google Scholar] [CrossRef]
  18. Kale, M.; Rastogi, A. Text-to-Text Pre-Training for Data-to-Text Tasks. In Proceedings of the 13th International Conference on Natural Language Generation, Dublin, Ireland, 15–18 December 2020; Davis, B., Graham, Y., Kelleher, J., Sripada, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 97–102. [Google Scholar] [CrossRef]
  19. Jimale, E.L.; Wenyu, C.; Al-antari, M.A.; Gu, Y.H.; Agbesi, V.K.; Feroze, W. DecoStrat: Leveraging the Capabilities of Language Models in D2T Generation via Decoding Framework. Mathematics 2024, 12, 3596. [Google Scholar] [CrossRef]
  20. Kim, J.; Nguyen, T.D.; Min, S.; Cho, S.; Lee, M.; Lee, H.; Hong, S. Pure transformers are powerful graph learners. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS ’22), New Orleans, LA, USA, 28 November–9 December 2022; Curran Associates Inc.: Red Hook, NY, USA, 2022. [Google Scholar]
  21. Ribeiro, L.F.R.; Schmitt, M.; Schütze, H.; Gurevych, I. Investigating Pretrained Language Models for Graph-to-Text Generation. In Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI, Online, 10 November 2021; Papangelis, A., Budzianowski, P., Liu, B., Nouri, E., Rastogi, A., Chen, Y.N., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 211–227. [Google Scholar] [CrossRef]
  22. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  23. Brody, S.; Alon, U.; Yahav, E. How Attentive are Graph Attention Networks? In Proceedings of the International Conference on Learning Representations, Online, 25–29 April 2022. [Google Scholar]
  24. Chen, F.; Wang, Y.C.; Wang, B.; Kuo, C.C.J. Graph representation learning: A survey. APSIPA Trans. Signal Inf. Process. 2020, 9, e15. [Google Scholar] [CrossRef]
  25. Kanatsoulis, C.I.; Ribeiro, A. Graph Neural Networks are More Powerful than We Think. In Proceedings of the ICASSP 2024—2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 7550–7554. [Google Scholar] [CrossRef]
  26. Wang, Q.; Mao, Z.; Wang, B.; Guo, L. Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE Trans. Knowl. Data Eng. 2017, 29, 2724–2743. [Google Scholar] [CrossRef]
27. Chen, Z.; Mao, H.; Li, H.; Jin, W.; Wen, H.; Wei, X.; Wang, S.; Yin, D.; Fan, W.; Liu, H.; et al. Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs. ACM SIGKDD Explor. Newsl. 2024, 25, 42–61. [Google Scholar] [CrossRef]
  28. Lee, S.Y.; Bu, F.; Yoo, J.; Shin, K. Towards deep attention in graph neural networks: Problems and remedies. In Proceedings of the 40th International Conference on Machine Learning (ICML’23), Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
  29. Erhan, D.; Bengio, Y.; Courville, A.; Manzagol, P.A.; Vincent, P.; Bengio, S. Why Does Unsupervised Pre-training Help Deep Learning? J. Mach. Learn. Res. 2010, 11, 625–660. [Google Scholar]
  30. Ye, H.; Chen, X.; Wang, L.; Du, S.S. On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J., Eds.; JMLR.org: Norfolk, MA, USA, 2023; Volume 202, pp. 39770–39800. [Google Scholar]
31. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  32. Gheini, M.; Ren, X.; May, J. Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online, 7–11 November 2021; Moens, M.F., Huang, X., Specia, L., Yih, S.W.T., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 1754–1765. [Google Scholar] [CrossRef]
  33. Zhang, K.; Balalau, O.; Manolescu, I. FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 10025–10042. [Google Scholar] [CrossRef]
  34. Guo, Q.; Jin, Z.; Dai, N.; Qiu, X.; Xue, X.; Wipf, D.; Zhang, Z. P2: A Plan-and-Pretrain Approach for Knowledge Graph-to-Text Generation. In Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+), Dublin, Ireland, 18 December 2020; Castro Ferreira, T., Gardent, C., Ilinykh, N., van der Lee, C., Mille, S., Moussallem, D., Shimorina, A., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 100–106. [Google Scholar]
  35. Han, J.; Shareghi, E. Self-supervised Graph Masking Pre-training for Graph-to-Text Generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; Goldberg, Y., Kozareva, Z., Zhang, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 4845–4853. [Google Scholar] [CrossRef]
  36. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for Quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning (ICML’17), Sydney, Australia, 6–11 August 2017; Volume 70, pp. 1263–1272. [Google Scholar]
  37. Ye, Z.; Kumar, Y.J.; Sing, G.O.; Song, F.; Wang, J. A Comprehensive Survey of Graph Neural Networks for Knowledge Graphs. IEEE Access 2022, 10, 75729–75741. [Google Scholar] [CrossRef]
  38. Shi, X.; Xia, Z.; Cheng, P.; Li, Y. Enhancing text generation from knowledge graphs with cross-structure attention distillation. Eng. Appl. Artif. Intell. 2024, 136, 108971. [Google Scholar] [CrossRef]
  39. Nickel, M.; Murphy, K.; Tresp, V.; Gabrilovich, E. A Review of Relational Machine Learning for Knowledge Graphs. Proc. IEEE 2016, 104, 11–33. [Google Scholar] [CrossRef]
  40. Bordes, A.; Weston, J.; Collobert, R.; Bengio, Y. Learning structured embeddings of knowledge bases. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI’11), San Francisco, CA, USA, 7–11 August 2011; pp. 301–306. [Google Scholar]
  41. Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16–20 November 2020; Liu, Q., Schlangen, D., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 38–45. [Google Scholar] [CrossRef]
  42. Serban, I.V.; García-Durán, A.; Gulcehre, C.; Ahn, S.; Chandar, S.; Courville, A.; Bengio, Y. Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; Erk, K., Smith, N.A., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2016; pp. 588–598. [Google Scholar] [CrossRef]
43. Zhou, M.; Huang, M.; Zhu, X. An Interpretable Reasoning Network for Multi-Relation Question Answering. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–26 August 2018; Bender, E.M., Derczynski, L., Isabelle, P., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 2010–2022. [Google Scholar]
44. Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. DBpedia: A Nucleus for a Web of Open Data. In Proceedings of The Semantic Web: 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Republic of Korea, 11–15 November 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 722–735. [Google Scholar] [CrossRef]
45. Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. Bleu: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 6–12 July 2002; Isabelle, P., Charniak, E., Lin, D., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2002; pp. 311–318. [Google Scholar] [CrossRef]
  46. Lin, C.Y. ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the Text Summarization Branches Out, Barcelona, Spain, 25–26 July 2004; pp. 74–81. [Google Scholar]
47. Banerjee, S.; Lavie, A. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, MI, USA, 29 June 2005; Goldstein, J., Lavie, A., Lin, C.Y., Voss, C., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2005; pp. 65–72. [Google Scholar]
  48. Chen, Y.; Wu, L.; Zaki, M.J. Toward Subgraph-Guided Knowledge Graph Question Generation With Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 12706–12717. [Google Scholar] [CrossRef] [PubMed]
  49. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
  50. Marcheggiani, D.; Perez-Beltrachini, L. Deep Graph Convolutional Encoders for Structured Data to Text Generation. In Proceedings of the 11th International Conference on Natural Language Generation, Tilburg, The Netherlands, 5–8 November 2018; Krahmer, E., Gatt, A., Goudbeek, M., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 1–9. [Google Scholar] [CrossRef]
Figure 1. Proposed framework built upon transformer and graph attention architectures.
Figure 2. Sample input graph data [left], its adjacency-matrix representation [right], the linearized graph [bottom], and the corresponding target text derived from the data [middle].
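To make the two input views in Figure 2 concrete, the snippet below builds a linearized sequence and an adjacency matrix from a small set of triples. It is a minimal sketch: the triples, the <H>/<R>/<T> token scheme, and the entity ordering are illustrative assumptions, not the exact preprocessing used in this work.

```python
# Illustrative preprocessing for the two views in Figure 2 (triples and the
# <H>/<R>/<T> token scheme are hypothetical, not taken from the datasets).
import numpy as np

triples = [
    ("Alan_Bean", "birthPlace", "Wheeler_Texas"),
    ("Alan_Bean", "occupation", "Test_pilot"),
]

# Linearized graph for the transformer-based encoder.
linearized = " ".join(
    f"<H> {h.replace('_', ' ')} <R> {r} <T> {t.replace('_', ' ')}"
    for h, r, t in triples
)
print(linearized)

# Adjacency matrix over the entities for the graph attention encoder.
entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
index = {e: i for i, e in enumerate(entities)}
adjacency = np.zeros((len(entities), len(entities)), dtype=int)
for h, _, t in triples:
    adjacency[index[h]][index[t]] = 1
print(entities)
print(adjacency)
```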
Figure 3. Bidirectional Dual Cross-attention and Concatenation (BDCC).
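Figure 3 summarizes the fusion mechanism: graph features attend to transformer features, transformer features attend to graph features, and the two attended outputs are concatenated. The PyTorch sketch below illustrates that idea under assumed tensor shapes; the class name, head count, and the choice to concatenate along the sequence dimension are assumptions for illustration, not the exact implementation.

```python
# Minimal sketch of the fusion idea in Figure 3, assuming both encoders emit
# features of the same hidden size. Not the authors' exact implementation.
import torch
import torch.nn as nn

class BDCCFusionSketch(nn.Module):  # hypothetical name
    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        # Graph features (queries) attend to transformer features (keys/values) ...
        self.graph_to_text = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # ... and transformer features (queries) attend to graph features.
        self.text_to_graph = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, h_text: torch.Tensor, h_graph: torch.Tensor) -> torch.Tensor:
        # h_text:  (batch, seq_len,   d_model) from the transformer encoder
        # h_graph: (batch, num_nodes, d_model) from the graph attention encoder
        g_att, _ = self.graph_to_text(h_graph, h_text, h_text)   # graph attends to text
        t_att, _ = self.text_to_graph(h_text, h_graph, h_graph)  # text attends to graph
        # Concatenate the attended outputs; concatenating along the sequence
        # dimension is one plausible choice for feeding a decoder.
        return torch.cat([t_att, g_att], dim=1)  # (batch, seq_len + num_nodes, d_model)

fusion = BDCCFusionSketch()
fused = fusion(torch.randn(2, 32, 768), torch.randn(2, 10, 768))
print(fused.shape)  # torch.Size([2, 42, 768])
```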
Figure 4. Training dynamics and total elapsed time of the proposed BDCC compared with simple concatenation and LGE.
Table 1. Strengths and limitations of basic Seq2Seq and structure-aware Seq2Seq approaches.

Basic Seq2Seq
Strengths:
  • Straightforward adaptation of transformer models for D2T tasks such as dialogue generation and table-to-text tasks.
  • Directly leverages PLMs for improved performance in D2T generation.
Limitations:
  • Relies on linearized structured data, which may simplify complex structures and struggle to capture the inherent complexities of graph-structured data.

Structure-Aware Seq2Seq
Strengths:
  • Incorporates graph structures either by utilizing GNNs or by further pre-training the models on structured data for improved performance in D2T generation.
  • Utilizes attention mechanisms for better representation and improves performance in G2T tasks by preserving structural information.
Limitations:
  • May require additional pre-training and complex integration strategies, potentially increasing computational costs.
Table 2. Training data statistics.

Dataset   Entities   Relations   Train    Valid   Test   Triples   Length
PQ        7250       378         9793     1000    1000   2.7       14.0
WebNLG    3114       373         34,352   4316    4224   2.9       22.7
Table 3. Evaluation results of the proposed approach compared with baselines on the G2T generation datasets.

                           PathQuestions                 WebNLG
Model                      BLEU    METEOR   ROUGE        BLEU    METEOR   ROUGE
GCN [50]                   -       -        -            60.80   42.76    71.13
G2S [48]                   61.48   44.57    77.72        -       -        -
T5 [49]                    58.95   44.72    76.58        64.42   46.58    74.77
BART [3]                   63.74   47.23    77.76        64.55   46.51    75.13
KGPT [9]                   -       -        -            64.11   46.30    74.57
JointGT (BART) [8]         65.89   48.25    78.87        65.92   47.15    76.10
JointGT (T5) [8]           60.45   45.38    77.59        66.14   47.25    75.91
Dual-path encoder [16]     67.20   48.56    79.62        66.41   47.38    76.18
CSAD [38]                  66.61   49.12    77.04        -       -        -
GAP [13]                   -       -        -            66.20   46.77    76.36
BDCC (Ours)                67.41   49.63    76.29        66.58   47.44    76.32
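The BLEU, METEOR, and ROUGE figures in Table 3 are standard corpus-level metrics [45,46,47]. Since the tooling used to compute them is not specified here, the snippet below shows one common way to obtain a corpus BLEU score with NLTK on hypothetical tokenized outputs and references; it is purely illustrative.

```python
# Illustrative corpus-level BLEU computation (tooling and sentences are assumptions).
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

hypotheses = [
    "alan bean was born in wheeler texas".split(),
    "aarhus airport has a runway length of 2702".split(),
]
references = [  # each hypothesis may have several references
    ["alan bean was born in wheeler , texas".split()],
    ["the runway length of aarhus airport is 2702.0".split()],
]

score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method3)
print(f"Corpus BLEU: {100 * score:.2f}")
```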
Table 4. Ablation study using the PathQuestions G2T dataset.

Model            #Param   BLEU      METEOR    ROUGE   Precision   Recall   F1-Score
T5 [49]          220 M    58.95     44.72     76.58   -           -        -
BART [3]         140 M    63.74     47.23     77.76   -           -        -
BDCC             166 M    67.41 *   49.63 *   76.29   0.901       0.8962   0.8972
Concatenation    164 M    65.86     48.67     76.58   0.8965      0.8914   0.8938
GE               -        65.37     48.49     76.10   0.8945      0.8907   0.8924
LGE              160 M    65.74     48.63     75.27   0.8981      0.8937   0.8957
UGA              165 M    65.91     48.82     75.62   0.8978      0.8940   0.8956
UTA              165 M    65.97     48.93     75.96   0.8987      0.8944   0.8936
* The proposed BDCC model significantly outperforms the standard T5 and BART transformer models on PathQuestions (t-test, p < 0.05).
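The asterisks in Table 4 refer to a t-test at p < 0.05. A minimal sketch of such a test is shown below, assuming paired per-example scores for BDCC and a baseline; the arrays are placeholders, not the actual per-example results.

```python
# Hypothetical paired significance test over per-example scores (placeholder data).
from scipy import stats

bdcc_scores = [0.71, 0.64, 0.69, 0.73, 0.66]   # e.g., per-example BLEU for BDCC
bart_scores = [0.66, 0.61, 0.65, 0.70, 0.62]   # e.g., per-example BLEU for BART

t_stat, p_value = stats.ttest_rel(bdcc_scores, bart_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # difference is significant if p < 0.05
```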