A Knowledge and Semantic Fusion Method for Automatic Geometry Problem Understanding

Wang, Ying; Zhou, Wei; Rao, Yongsheng; Guan, Hao

doi:10.3390/app15073857


¹ School of Cyberspace Security, Software Engineering Institute of Guangzhou, Guangzhou 510006, China

² Institute of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(7), 3857; https://doi.org/10.3390/app15073857

Submission received: 10 February 2025 / Revised: 17 March 2025 / Accepted: 26 March 2025 / Published: 1 April 2025

(This article belongs to the Special Issue Knowledge and Data Engineering)


Abstract:

Geometry problem understanding (GPU) is a fundamental task in machine intelligence for problem-solving, requiring more accurate and complete information extraction than general natural language understanding tasks. This paper proposes a knowledge and semantic fusion method to achieve high-quality, interpretable, and scalable GPU. It extracts text-level and knowledge-level entities and relationships from problem texts and transforms them into a semantic knowledge graph. First, a dual-layer semantic-enhanced knowledge ontology model (SGKO) tailored for the geometry domain is constructed. By separating the ontology and data layers and combining the strengths of both the knowledge system type ontology and the semantic network type ontology, it enables bidirectional association between conceptual-level knowledge and object-level textual data. Second, a dynamically generated modular relationship matching template is introduced, which is decomposed into reusable atomic components and dynamically assembled through knowledge base queries, significantly reducing template quantity while enhancing adaptability to complex text structures. Additionally, a state-machine-based semantic information extraction model (IDIM-T) is designed that achieves efficient and interpretable semantic extraction through categorized relationship description types. This is combined with a rule-based method (IDIM-K) to complete knowledge-level entity relationship extraction. To validate the method, a dataset was constructed from authoritative sources, including past middle school exam questions, textbooks, and exercise books, covering unary, binary, and ternary relationships, as well as single-clause, cross-clause, and multi-relationship conjunction expressions. Experiments on 230 problems with complex relational descriptions showed that the proposed method achieved fully accurate two-level relationship parsing for 91.87% of the problems. 
Compared with four baseline methods (sentence template-based, Bi-LSTM-based, Transformer-based, and $S^{2}$-based), the method achieved the highest F1 score (0.974) for 1832 relationships, outperforming the highest F1 score (0.900) of the baselines.

Keywords:

geometry problem understanding; semantic information extraction; knowledge graph; ontology; knowledge-guided entity and relationship extraction

1. Introduction

In the field of geometry problem understanding (GPU), researchers aim to transform geometry problems expressed in natural language into computer-processable representations, which is essential for tasks such as automated geometry reasoning, problem solving, and diagram drawing, serving as the foundation and key step for geometry mechanization [1,2,3]. However, most research focuses on automated reasoning, with less attention to GPU automation [4]. Furthermore, despite recent advancements, existing studies exhibit limitations in task interpretation and methodological applicability.

Current approaches often treat GPU as an automatic formalization task, identifying geometry elements and relationships and converting them into formal representations. However, these methods perform “knowledge representation translation” rather than “text semantic understanding” and miss the semantic layers of the problem text. They reduce entities to geometry elements and relationships to geometry predicates, ignoring crucial components such as numerical values and expressions as well as broader semantic connections. Thus, although these methods achieve the goals of GPU to some extent, they oversimplify the task and lack a thorough analysis of its requirements.

GPU uniquely intersects geometry domain knowledge and semantic information extraction (SIE). Unlike conventional SIE tasks, GPU deals with dual information levels: text-level (directly mentioned entities and relationships) and knowledge-level (implicit semantic relationships necessary for holistic semantic understanding). For example, recognizing that the “D” in “D is a point on BC” and the “D” in “E is the midpoint of AD” refer to the same entity is key knowledge-level information that is often overlooked. Figure 1 illustrates three cases where knowledge-level information is overlooked. Ignoring such knowledge-level information results in fragmented and incoherent data. To address this, this paper advocates introducing knowledge graph technology and redefines GPU as a knowledge extraction task oriented toward knowledge graphs. The goal is to transform the semantic information of a problem from unstructured text into a structured knowledge graph, exploiting its strong capabilities in knowledge organization, semantic representation, and downstream applications.

However, knowledge graphs specific to GPU face limitations due to the duality of GPU information, leading to granularity differences in data construction. This paper distinguishes between two types of knowledge graph models: knowledge systems and semantic networks. The former, essentially a knowledge model, represents information as a knowledge category structure (e.g., [5,6,7,8]), meeting GPU’s concept-level knowledge needs but not the object-level granularity required for semantic knowledge graphs. The latter, semantic-network-type knowledge graphs (e.g., [9,10,11,12]), focus on relationships between pieces of information, effectively representing object-level granularity at the text level but failing to link to knowledge-level information. Both types have limitations and cannot simultaneously meet text and knowledge-level needs, posing challenges for knowledge-graph-oriented GPU. Additionally, the integration of geometry and knowledge engineering is insufficient. Unlike fields such as biology and medicine [13,14], geography [15,16], and cultural heritage [17], which have relatively complete, publicly available knowledge engineering ontologies, the lack of such frameworks in geometry significantly hinders the application of knowledge graph technology in GPU and the integration of knowledge with intelligent research methods. To overcome these challenges, this paper introduces three improvements: First, decoupling the ontology and data layers in the knowledge graph and separating the ontology layer into a reusable ontology model, while storing the specific problem’s semantic data in the semantic layer. Second, constructing the ontology model as a two-layer architecture, where the upper layer adopts a knowledge-system-type ontology for building a geometry domain knowledge base, and the lower layer employs a semantic-network-type ontology as a bridge between conceptual-level knowledge and object-level text data. 
Third, the knowledge base is structured around the domain knowledge system, with adjustments for semantic extraction needs to enable knowledge-powered entity relationship extraction.

Current GPU methodologies fall into two categories: machine learning (ML)-based and rule/template-based. While ML methods excel in general natural language processing (NLP) tasks, their need for large datasets and computational resources makes them unsuitable for lightweight applications. ML models also suffer from inherent biases, limited interpretability, and difficulties in handling dual-level (text and knowledge) information in resource-constrained scenarios, thus failing to meet GPU’s high accuracy requirements. Traditional rule/template methods mainly focus on sentence-level templates, leading to a large number and complexity of templates, with limited semantic coverage and poor flexibility.

This paper proposes a knowledge and semantic fusion method for Chinese middle-school plane geometry proof problems. The model mimics human problem-understanding cognition by enabling knowledge-guided dual-level information extraction. Specifically, this paper constructs a specialized knowledge base called SGKO, integrating geometry ontology with semantic considerations using knowledge graph technology. With SGKO, the model processes the problem text by first performing text-level entity and relation extraction (ERE) with IDIM-T, a template-matching-based state machine model, which converts unstructured text into structured entities and relationships. The rule-based method IDIM-K then conducts knowledge-level ERE, ultimately generating the problem’s semantic knowledge graph (abbreviated as “problem semantic KG”).

The key contributions of this study include the following.

Globally shared dual-layer semantic-enhanced knowledge ontology model (SGKO): The ontology layer is decoupled from the data layer and constructed as a dual-layer knowledge model for the geometry domain, consisting of an upper geometry domain knowledge layer and a lower semantic enhancement layer, and it serves as the common ontology for all problem semantic graphs. This resolves the incompatible data granularity of single-layer ontologies for GPU knowledge graphs, bridges conceptual- and object-level information so that problem semantic knowledge graphs can be integrated with geometry knowledge, and establishes a geometry domain knowledge model with excellent scalability and maintainability.
Dynamically generated modular relationship matching templates and template matching strategies: Starting from atomic relationship units, this paper modularizes and componentizes semantic roles, forming a semantic role framework that records the relative positional information between roles, which consists of argument slots and predicate slots. During relationship extraction, the semantic role framework dynamically assembles into a suitable template framework by querying the knowledge base and, after knowledge injection, is instantiated as usable matching templates. Additionally, by summarizing and categorizing relationship descriptions in problem texts, reusable template matching strategies are designed for different relationship expression types, enabling automatic and dynamic adaptation based on text structure. This method effectively overcomes the shortcomings of machine learning methods, such as strong data dependency, poor interpretability, and limited scalability, as well as the drawbacks of traditional rule/template-based methods, including the overwhelming number and complexity of templates with persistently low recall rates and difficulty handling complex text structures, reducing the complexity from exponential levels to linear or logarithmic levels.
Knowledge-guided GPU: During entity relationship extraction, dynamic interaction with SGKO is performed, simulating the human understanding process and achieving problem parsing through structured knowledge integration.

The remainder of this paper is organized as follows: Section 2 reviews related work and introduces the innovations of this paper. Section 3 presents the overall methodology. Section 4 details SGKO. Section 5 describes text- and knowledge-level relationship extraction. Section 6 concludes the paper.

2. Related Work

Unlike conventional natural language processing tasks that focus on open-domain semantic generalization and multi-scenario adaptability, geometry problem parsing prioritizes higher accuracy as its core objective. The essence of text-oriented geometry problem understanding lies in mapping key semantic elements from unstructured problem texts to computable semantic representations. This understanding serves as a fundamental step for geometry problem-solving and reasoning. Additionally, some studies have explored the multimodal processing of diagrams and text. This paper focuses on the text-parsing aspects of these studies, reviewing related work from both machine-learning-based and rule/template-based approaches.

2.1. Machine-Learning-Based Methods

In recent years, machine learning methods have been widely applied in geometry intelligence and can be further categorized into large-scale deep learning methods and lightweight machine learning methods.

With advancements in computational power and massive data accumulation, large language models (LLMs), BERT [18], and Transformer [19] have demonstrated strong capabilities in various text-parsing tasks, and these models are gradually being adopted in the geometry domain. AlphaGeometry employs a Transformer-based language model for text processing [20], while the upgraded AlphaGeometry2 combines a Gemini-architecture neural language model with symbolic reasoning engines [21,22]. FGeo-Parser utilizes a T5 model to convert problem texts into text CDL and target CDL, optimized via enhanced training sets and cross-entropy loss [23,24]. UniGeo constructs a multimodal encoder based on the VL-T5 model [25,26]. PGPSNet uses a masked language model (Masked LM) task structure and semantic pretraining method to handle the text modality [27], while Jian et al. proposed an end-to-end deep learning method that uses feature learning to extract the features of geometry problem text [28]. Hu et al. utilized the BERT model and domain knowledge graphs to convert entities and relationships in problem text into equivalent equation representations [29]. MagicGeo leverages the zero-shot capabilities of LLMs for geometry text formalization [30]. TongGeometry builds an Olympiad-level problem solver using two fine-tuned LLMs [31]. These studies employ highly complex deep learning models with millions to billions of parameters. These models heavily depend on data, requiring training data ranging from millions to billions of samples. The computational resource demands during training and inference are high, limiting the applicability and promotion of lightweight methods.

In contrast, lightweight machine learning methods typically use fewer parameters and smaller datasets, enabling effective learning under resource-constrained conditions. For example, GeoQA constructs a text encoder using a single-layer unidirectional LSTM [32,33], while upgraded GeoQA+ combines RoBERTa and Bi-LSTM to address varying text lengths [34,35]. Zhou et al. employed BiLSTM-CRF with triple-level normalization and template mapping to convert directly mentioned information in problem text into knowledge graph representations [36,37]. Chung et al. proposed a knowledge-guided Sequence-to-General Tree (S2G) model, which builds an encoder using GRU [38,39]. Iordan proposed a stacked long short-term memory network (Stacked LSTM) method to parse 3D analytic geometry problem texts [40,41]. Xiao et al. adopted LSTM with RoBERTa as a text encoder [42]. Gan et al. proposed a supervised learning method based on feature extraction and an SVM classifier to extract geometry relationships from problem texts [43]. Seo et al. used a discriminative model to identify relationships in texts [2]. Lightweight machine learning methods face performance bottlenecks, including the strong dependence on manual feature engineering, leading to coverage gaps and long-tail issues, limited contextual awareness, and poor performance in handling scenarios with nested, hidden, and large gaps in information.

From the perspective of technology transfer, works such as asGeoRE and MLLMs show that models performing well on general-domain datasets suffer a significant performance drop when applied to the geometry domain, and their experiments also indicate the challenges of adapting machine learning models to the geometry field [44,45]. Moreover, both large-scale deep learning and lightweight models face common challenges in GPU. First, the inherent bias of neural network probabilistic outputs conflicts with the requirement for precise semantic parsing in geometry problem understanding. Second, heavy reliance on data and computational resources hinders lightweight deployment. Third, they lack interpretability and scale poorly, since new entities or relationships require full model retraining. Fourth, they handle dual information layers poorly, making it difficult to model both text-level explicit relationships and knowledge-level implicit logic simultaneously.

Moreover, although end-to-end deep learning models based on neural networks perform well in single tasks (such as direct problem-solving or direct relationship extraction), when tasks are interdependent, for example, when intermediate results (such as entity relationships) need to drive downstream tasks (such as intelligent drawing/problem-solving), the system faces the dual dilemma of module separation and optimization conflicts. Developers are forced to choose between two suboptimal options: dealing with error propagation and system redundancy caused by independent module chains or facing the implicit coupling of intermediate layers in end-to-end black-box models and optimization conflicts across tasks, leading to a Pareto suboptimal solution.

2.2. Rule/Template-Based Methods

Rule/template-based methods extract entities and relationships through predefined pattern-matching rules, offering high precision and interpretability.

Inter-GPS uses regular expression-based rule matching to convert problem text into symbolic logic expressions [46]; GeoDRL follows a similar approach to build its semantic parser for formalizing problem texts [47]. GeometryNet identifies geometry entities using POS tagging and rule-based methods [48]. Liu et al. introduced a geometric knowledge ontology that combines pattern-matching algorithms to extract relationships from geometric propositions [49]. Wong et al. [50] and Guo et al. [51] established sentence-level matching templates for recognizing relationships in problem text. Despite the high accuracy of these methods, their design logic leads to three significant challenges. First, independent templates are needed for each linguistic variant of the same relationship (e.g., “AB‖CD”, “CD is parallel to AB”, and “AB and CD are parallel” require separate templates). Second, multi-relationship descriptions (e.g., “Point M lies on the perpendicular bisector CG of BD, and GE intersects AC and DM at points P and Q”) require handling not only combinations of atomic relationships but also entity sharing and nested structures, leading to exponential growth in template numbers. Third, relationships spanning multiple clauses (e.g., “In △ABC, …, altitude AD…”) often go undetected.

Gan et al. proposed the S2 model [52,53,54], simplifying matching objects into relationship keywords, entity types, and quantities to optimize traditional templates. Specifically, the method first uses NLP tools (ICTCLAS) for word segmentation, POS tagging, and sentence boundary detection. Then, it uses a classification-based entity recognition approach to extract geometry elements and relationship words. Third, the S2 model matches relationship words and element types to generate formalized atomic propositions. The S2 model has been applied in many studies for text relationship extraction. For example, Yu et al. used the S2 model to understand texts related to solid geometry calculation problems [55]. Jian et al. used the S2 model to extract explicit relationships from circuit problems in secondary physics and proposed a unit-theorem-based method to handle implicit relationships [56], while Yu et al. applied the S2 model to arithmetic word problems to handle explicit relationships [57]. Some studies have improved the S2 model. For example, He et al. expanded the S2 model pool to recognize more types of relationships [58]. Yu et al. proposed a relaxed syntax–semantics method to optimize the original study’s limitation of matching only one S2 model per paragraph [59]. Lyu et al. improved the original text-based matching mechanism by proposing a vectorized-S2 (VM-S2) model based on BERT to extract quantity relationships from arithmetic word problems [60], and Huang et al. used VM-S2 to extract basic geometry relationships from algebraic problems [61]. Although the S2 model effectively solves the first issue in traditional methods, the latter two still remain unresolved. While the relaxed-S2 method permits repeated calls to the S2 model, this does not fully resolve the relationship combination issue. Additionally, this method faces problems in capturing semantic roles, which leads to relationship extraction errors. 
For instance, failing to distinguish between “AB is the perpendicular bisector of CD” and “AB’s perpendicular bisector is CD” results in logical direction errors that affect the accuracy of relationship extraction.

To address these issues, this paper proposes a new method that starts from limited atomic relationship units and decouples semantic roles and records their relative positional information to form a freely combinable Relationship Semantic Role Framework (RSF). By anchoring keywords to relationship types and combining knowledge base queries, relevant knowledge is dynamically injected into the RSF to generate matching templates. A set of reusable template matching strategies is designed to enable dynamic combination of templates for complex text structure entity and relationship extraction algorithms. This method achieves a transition from exhaustive sentence-level matching to the on-demand assembly of atomic components, significantly reducing the number of templates, improving the ability to handle complex relationship expressions, and enhancing generalization. Furthermore, all knowledge is categorized, hierarchical, and organized in a tree structure, unified for management and maintenance in the SGKO knowledge base. Entity and relationship types are modularly managed, so new entity types or relationships can be added to the corresponding knowledge hierarchy without restructuring existing templates or computational frameworks, enabling plug-and-play expansion to accommodate future knowledge evolution and model expansion.
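The idea of assembling matching templates from atomic components can be illustrated with a small Python sketch. It is not the paper's actual RSF or template-generation procedure: the knowledge-base excerpt `KB`, the slot names `ARG`, `PRED`, and `CONJ`, the two role frames, and the English keywords are all assumptions chosen to mirror the "parallel" variants quoted above.

```python
import re

# Hypothetical knowledge-base excerpt: keywords anchored to a relationship type,
# plus the textual pattern of its argument entities (two uppercase letters).
KB = {
    "Parallel": {
        "keywords": ["‖", "is parallel to", "are parallel"],
        "arg_pattern": r"[A-Z]{2}",
    },
}

# Role frames record only the relative order of argument and predicate slots.
FRAMES = [
    ("ARG", "PRED", "ARG"),          # infix: "AB‖CD", "CD is parallel to AB"
    ("ARG", "CONJ", "ARG", "PRED"),  # postfix: "AB and CD are parallel"
]


def instantiate(frame, rel_type):
    """Inject knowledge-base content into a role frame to get a regex template."""
    entry = KB[rel_type]
    pred = "(?:" + "|".join(re.escape(k) for k in entry["keywords"]) + ")"
    slot = {"ARG": "(" + entry["arg_pattern"] + ")", "PRED": pred, "CONJ": "and"}
    return re.compile(r"\s*".join(slot[s] for s in frame))


def extract(text, rel_type):
    """Try each frame in turn; return (first argument, type, second argument)."""
    for frame in FRAMES:
        match = instantiate(frame, rel_type).search(text)
        if match:
            return (match.group(1), rel_type, match.group(2))
    return None
```

Under these assumptions, two frames and one keyword list cover all three variants, and adding a new keyword to `KB` requires no new frame. The sketch returns arguments in text order and does not attempt the semantic-role disambiguation discussed above.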

3. Overview of the Research Framework

The research approach is outlined in Figure 2, presenting the entire model framework on the left, which includes two main components: a Semantic-Enhanced Geometry Knowledge Ontology (SGKO) model and an Information Detection and Interpretation Module (IDIM).

SGKO acts as the knowledge hub, serving both as an ontology model and a knowledge base. It decouples the ontology layer from the data layer, serving as the common ontology for problem semantic graphs. The upper layer of SGKO aligns with a knowledge model-type knowledge graph, tailored for the geometry domain and text processing, while the lower layer incorporates principles from semantic-network-type knowledge graphs to enrich the upper layer’s concepts with semantic knowledge.

IDIM comprises IDIM-Text (IDIM-T) for text-level and IDIM-Knowledge (IDIM-K) for knowledge-level entity relationship extraction, operating in two distinct stages. IDIM-T uses template matching to extract entity relationships from unstructured text, dynamically interacting with SGKO to adapt its processing based on the current text. IDIM-K builds on IDIM-T’s output to perform structured data-oriented ERE at the knowledge level, constructing knowledge-level relationships according to the structured knowledge system in SGKO.

The GPU process is structured into three modules: data preprocessing for initial cleaning, information extraction where IDIM-T and IDIM-K sequentially process text to extract entities and relationships, and knowledge graph construction, which synthesizes the outputs into a comprehensive problem-semantic KG.

4. SGKO: Semantic-Enhanced Geometry Knowledge Ontology

4.1. The Upper Knowledge System Layer

According to the dataset resources, “EntityType” and “RelationshipType” are designated as top-level classes in the knowledge system, categorizing various entity and relationship types. Entity types are grouped into Geometry Shape, Expression, and Value. Geometry Shape includes subclasses like Point, Line, Angle, Circle, and Polygon, with Polygon further subdivided into Triangle and Quadrilateral. Triangle is classified into Isosceles, Equilateral, and Right Triangle, while Quadrilateral includes Square, Parallelogram, Rhombus, and Trapezoid. Trapezoid is further divided into Isosceles, Right, and Isosceles Right Trapezoid. Value distinguishes between Numerical Value (e.g., 4, 3.2, 1/2) and Degree Value (e.g., 18°), while Expression is split into Operation Expression (e.g., 1/2(AB + CD), ${2AB}^{2} + {CD}^{2}$) and Non-operation Expression (e.g., 2AB, $1/4$ CF).
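The hierarchy described in this paragraph can be pictured as a parent map. The following is a minimal Python sketch rather than the SGKO implementation: the names `PARENT`, `ancestors`, and `is_a` are hypothetical, the class identifiers are written without spaces for convenience, and only a subset of the classes named in the text is included, in the form they take before any later adjustment of the hierarchy.

```python
# Hypothetical parent map covering a subset of the entity types named in the text.
PARENT = {
    "GeometryShape": "EntityType",
    "Expression": "EntityType",
    "Value": "EntityType",
    "Point": "GeometryShape",
    "Line": "GeometryShape",
    "Angle": "GeometryShape",
    "Circle": "GeometryShape",
    "Polygon": "GeometryShape",
    "Triangle": "Polygon",
    "Quadrilateral": "Polygon",
    "Trapezoid": "Quadrilateral",
    "NumericalValue": "Value",
    "DegreeValue": "Value",
    "OperationExpression": "Expression",
    "NonOperationExpression": "Expression",
}


def ancestors(entity_type):
    """Return the superclasses from the direct parent up to the root."""
    chain = []
    while entity_type in PARENT:
        entity_type = PARENT[entity_type]
        chain.append(entity_type)
    return chain


def is_a(entity_type, candidate):
    """True if entity_type equals candidate or is one of its subclasses."""
    return entity_type == candidate or candidate in ancestors(entity_type)
```

A tree of this kind lets a new entity type be added by inserting one entry under the appropriate parent, which is the kind of extension the knowledge system is meant to support.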

To enhance text-level ERE, a new entity type “ShapeLimit” and a corresponding relationship type “shapeLimits” are introduced. Specific polygon prefixes such as “isosceles” and “equilateral” are treated as independent ShapeLimit entities, and the relationship type “shapeLimits” defines the constraints these prefixes impose on polygons. This allows ShapeLimit, PolygonType, and Polygon entities to be combined in the logical sequence “special shape description prefix → polygon type → polygon name”, as illustrated in Table 1.

This approach reduces entity types and may enhance GPU accuracy by converting unary semantic relationships into binary relationships for more streamlined processing. Descriptions such as “isosceles triangle” and “right trapezoid” can be treated either as independent entities or combined with the Geometry Shape entity, impacting GPU accuracy when increasing the Bayesian denominator. The introduction of the ShapeLimit and shapeLimits relationship types aligns with the Subject–Predicate–Object (SPO) structure, facilitating uniform processing. This allows subclasses under ShapeLimit and Polygon categories to be processed separately, simplifying entity management while distinguishing between conceptual- and object-level information. Conceptual-level entities like ShapeLimit and Polygon categories are retained within the SGKO, whereas specific polygon names are processed and included in the problem semantic KGs.
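As an illustration of the “prefix → polygon type → polygon name” sequence, the sketch below splits a phrase into a ShapeLimit entity and a polygon entity joined by a shapeLimits relationship. It is only a sketch under stated assumptions: the vocabularies `SHAPE_LIMITS` and `POLYGON_TYPES`, the English phrasing, the function name `parse_limited_polygon`, and the exact triple layout are hypothetical and are not taken from Table 1.

```python
import re

# Assumed vocabularies; in the described method these would be held in SGKO.
# Longer prefixes come first so "isosceles right" wins over "isosceles".
SHAPE_LIMITS = ("isosceles right", "isosceles", "equilateral", "right")
POLYGON_TYPES = ("triangle", "trapezoid")

PATTERN = re.compile(
    r"(?P<limit>" + "|".join(SHAPE_LIMITS) + r")\s+"
    r"(?P<ptype>" + "|".join(POLYGON_TYPES) + r")\s+"
    r"(?P<name>[A-Z]{3,})"
)


def parse_limited_polygon(text):
    """Return (ShapeLimit entity, 'shapeLimits', polygon entity) or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    limit = (match.group("limit"), "ShapeLimit")
    polygon = (match.group("name"), match.group("ptype").capitalize())
    return (limit, "shapeLimits", polygon)
```

The point of the design is that the prefix and the polygon remain separate entities, so the unary description “isosceles” becomes an ordinary subject–predicate–object triple.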

Relationship types are classified under “Text-level RelationshipType” and “Knowledge-level RelationshipType” in our knowledge system, differentiating relationships relevant to the text-level ERE and those pertinent to the knowledge-level ERE without overlap in their processing. Text-level relationships focus on geometry predicates, while knowledge-level relationships pertain to arithmetic operations.

For text-level relationship types, two categories are defined:

Definition 1 (Autonomous Relationship Type and Dependent Relationship Type).

Let $U_{RT}$ represent the set of all text-level relationship types. $RT_{Autonomous}$ and $RT_{Dependent}$ denote the sets of Autonomous Relationship Type and Dependent Relationship Type, respectively. For any $rt \in U_{RT}$, if its relationship has a complete SPO structure, which independently conveys complete semantics, then $rt \in RT_{Autonomous}$; otherwise, $rt \in RT_{Dependent}$. For any $rt_{dependent} \in RT_{Dependent}$, it is associated with a specific $rt_{autonomous} \in RT_{Autonomous}$.

For example, Parallel, Perpendicular, and Midpoint are autonomous relationship types, while Foot and Intersection are dependent relationship types. Foot is specifically associated with Perpendicular, and Intersection is specifically associated with Intersects.

Based on this, “Text-level Relationship Type” is divided into “Binary Relationship type” and “Ternary Relationship type”.

Definition 2 (Binary Relationship type and Ternary Relationship type).

Let $RT_{Binary}$ and $RT_{Ternary}$ represent the sets of Binary Relationship Type and Ternary Relationship Type, respectively. For any $rt \in RT_{Autonomous}$, if $rt$ does not have any associated $rt_{dependent} \in RT_{Dependent}$, then $rt \in RT_{Binary}$ and is called a binary relationship type; otherwise, $rt \in RT_{Ternary}$ and is termed a ternary relationship type.

For example, Parallel, Midpoint, and Similar are binary relationship types, while Perpendicular and Intersects are classified as ternary relationship types.
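Definitions 1 and 2 together imply that the binary/ternary split can be derived from the association between dependent and autonomous types. The sketch below shows this derivation in Python; the variable and function names are hypothetical, and the type names are limited to the examples given in the text.

```python
# Association from each dependent relationship type to its autonomous type,
# using only the examples given in the text.
DEPENDENT_OF = {"Foot": "Perpendicular", "Intersection": "Intersects"}

# Autonomous relationship types mentioned as examples.
AUTONOMOUS = {"Parallel", "Perpendicular", "Midpoint", "Similar", "Intersects"}


def split_by_arity(autonomous, dependent_of):
    """Return (binary, ternary): ternary types own at least one dependent type."""
    owners = set(dependent_of.values())
    ternary = {rt for rt in autonomous if rt in owners}
    return autonomous - ternary, ternary


BINARY, TERNARY = split_by_arity(AUTONOMOUS, DEPENDENT_OF)
```

With this reading, registering a new dependent type for an existing autonomous type is enough to move that type from the binary set to the ternary set.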

Subsequently, further adjustments were made for the knowledge-level issue that lower-dimensional geometry entities are the “parts” and higher-dimensional geometry entities are the “whole”. Therefore, for entity types, the classification structure under Geometry Shape is adjusted to include distinctions between 0-dimensional, 1-dimensional, and 2-dimensional levels. For relationship types, a new relationship type “Constructs” is added to represent the construction relationships between lower-dimensional geometry elements and higher-dimensional geometry elements.
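One way to picture the part–whole role of “Constructs” is a rule that links each point named in a segment or triangle to that higher-dimensional entity. The Python sketch below is an assumption made for illustration only: the function `constructs_triples`, the restriction to Segment and Triangle, and the letter-by-letter decomposition are hypothetical and are not the knowledge-level extraction rules of the method.

```python
# Entity types whose names are assumed to be sequences of point letters.
DECOMPOSABLE = {"Segment", "Triangle"}


def constructs_triples(entity):
    """Link every point letter in the entity name to the entity via Constructs."""
    name, entity_type = entity
    if entity_type not in DECOMPOSABLE:
        return []
    return [((letter, "Point"), "Constructs", entity) for letter in name]


# The point D mentioned on its own and the D inside segment AD resolve to the
# same ("D", "Point") entity, which connects the two pieces of information.
triples = constructs_triples(("AD", "Segment"))
```

Under this assumption, a point mentioned alone in one clause and inside a segment name in another clause share a single node in the resulting graph.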

Thus, a complete classification system for the GPU knowledge system is formed, as shown in Figure 3.

4.2. The Lower Semantic Knowledge Layer

The lower layer’s work involves defining semantic knowledge. Semantic knowledge is customized for each entity type and relationship type, enabling machines to recognize these concepts and identify their mentions. Semantic knowledge is defined as attributes within each concept.

Semantic knowledge is closely related to the structured representation of an entity and a relationship, which are introduced below.

The knowledge of an entity includes all its expressions and variants, which are represented using regular expression rules, as well as its demanded structured representation.

Definition 3 (Structured Representation of Entity).

An entity is structured as an ordered pair $e = (ename, et)$, where $ename$ represents the entity’s name, and $et$ represents its type.

Relationships are divided into “Binary Relationships” and “Ternary Relationships” based on their relationship types.

Definition 4 (Binary Relationship).

A relationship is a binary relationship if its relationship type $rt \in RT_{Binary}$.

A binary relationship is formally represented as an ordered triple

$$r_{binary} = (e_{subject}, rt, e_{object}),$$

where $e_{subject}$ and $e_{object}$ represent the head and tail entities of $r_{binary}$, respectively, and an $rt \in RT_{Binary}$ relationship exists between them. This structure is consistent with the SPO triple structure, and the relationship is directed.

Definition 5 (Ternary Relationship).

A relationship is a ternary relationship if its relationship type $rt \in RT_{Ternary}$. A ternary relationship is regarded as consisting of two parts, a core part and an attached part, termed the Core Relationship and the Attached Relationship, respectively.

A ternary relationship is formally represented as a pair of triples:

$$r_{ternary} = \begin{cases} r_{core} = (e_{subject}, rt_{core}, e_{object}), \\ r_{attached} = (r_{core}, rt_{attached}, e_{additional}). \end{cases}$$

In this structure, $r_{core}$ represents the core part of $r_{ternary}$, including the head entity $e_{subject}$ and the tail entity $e_{object}$, between which there exists an $rt_{core} \in RT_{Ternary}$ relationship. $r_{attached}$ is the attached part of $r_{ternary}$, indicating that there is an $rt_{attached}$ relationship with $e_{additional}$, where $rt_{attached} \in RT_{Dependent}$. Both structures represent directed relationships. There is a mutually inferable association between $rt_{core}$ and $rt_{attached}$.

For example, “AB⊥CD” describes a binary relationship structured as $r_{binary}$ = ((“AB”, Segment), perpendicular, (“CD”, Segment)), while “AB⊥CD at point P” describes a ternary relationship, structured as two triples: $r_{core}$ = ((“AB”, Segment), perpendicular, (“CD”, Segment)) and $r_{attached}$ = ($r_{core}$, foot, (“P”, Point)). Here, perpendicular and foot are mutually inferable.
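The structures in Definitions 3 to 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Entity` and `Relation` are hypothetical and are not part of the method described here. The head of a triple is either an entity (binary or core relationship) or another relationship (attached relationship).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Entity:
    ename: str  # entity name, e.g. "AB"
    et: str     # entity type, e.g. "Segment"


@dataclass(frozen=True)
class Relation:
    # head is an Entity for binary/core triples and a Relation for attached triples
    head: Union[Entity, "Relation"]
    rt: str       # relationship type
    tail: Entity


# "AB⊥CD at point P" as a core triple plus an attached triple
r_core = Relation(Entity("AB", "Segment"), "perpendicular", Entity("CD", "Segment"))
r_attached = Relation(r_core, "foot", Entity("P", "Point"))
```

Nesting `r_core` inside `r_attached` keeps the link between the two parts explicit, so the foot can always be traced back to the pair of segments it belongs to.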

It is worth noting that the concepts of $r_{core}$ and $r_{ternary}$ overlap. $r_{ternary}$ is used in a relatively macro context, such as when discussing or emphasizing the distinction between $RT_{Binary}$ and $RT_{Ternary}$, whereas $r_{core}$ is used in a more micro context, specifically when discussing the core relationship type and attached relationship type of a ternary relationship. The same applies to $rt_{core}$ and $rt_{ternary}$.

Semantic knowledge is generally divided into “Textual Attribute” and “Domain Knowledge Attribute”.

For entity types, defining semantic knowledge involves specifying their “Textual Attribute”, which encompasses all possible descriptions for each entity type. For example, a Point is represented by a single uppercase letter, a Segment is typically represented by two consecutive uppercase letters, and a Degree Value is represented by a sequence of digits ending with “°”. Textual Attributes are set in the form of regular expressions, and the Textual Attributes of subordinate classes are aggregated upwards to their respective superordinate classes.
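As a rough illustration of Textual Attributes stored as regular expressions, the three examples above could be written as follows. The exact patterns, the dictionary layout, and the `classify` helper are assumptions made for this sketch. The patterns actually stored in SGKO are not listed in the text.

```python
import re

# Hypothetical Textual Attributes for three entity types, as regular expressions.
TEXTUAL_ATTRIBUTES = {
    "Point": r"[A-Z]",         # a single uppercase letter
    "Segment": r"[A-Z]{2}",    # two consecutive uppercase letters
    "DegreeValue": r"\d+°",    # digits ending with the degree sign
}


def classify(mention):
    """Return the entity types whose pattern matches the whole mention."""
    return [et for et, pattern in TEXTUAL_ATTRIBUTES.items()
            if re.fullmatch(pattern, mention)]
```

For instance, `classify("AB")` yields only the Segment type, because `re.fullmatch` requires the pattern to cover the entire mention.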

For each relationship type, Textual Attribute includes “Relationship Type Keyword” and “Relationship Semantic Role Framework”.

Definition 6 (Relationship Type Keyword, Keyword).

Relationship Type Keyword refers to those symbols, characters, or words that are used every time this type of relationship is mentioned.

Let $KW$ represent the set of keywords. Keywords are divided into autonomous keywords and dependent keywords, formally expressed as $KW = KW_{Autonomous} \cup KW_{Dependent}$. Autonomous keywords directly indicate the relationship type: each one independently identifies a type of relationship and has a one-to-one correspondence with it. Dependent keywords are primarily used to indicate the existence of attached relationships but do not necessarily directly indicate the relationship type and do not always have a one-to-one correspondence with the relationship type. In relationship descriptions, these keywords appear together with autonomous keywords of the same relationship type to fully describe a relationship.

For any $rt \in RT_{Binary}$, let $KW_{rt}$ be the set of all its keywords; then, for any $kw \in KW_{rt}$, $kw \in KW_{Autonomous}$.

For any $rt \in RT_{Ternary}$, let $KW_{rt}$ be the set of all its keywords, $KW_{rt} = KW_{rt}^{Core} \cup KW_{rt}^{Attached}$, where $KW_{rt}^{Core}$ refers to the core keywords of $rt$ and $KW_{rt}^{Attached}$ refers to the attached keywords of $rt$. For all $kw \in KW_{rt}^{Core}$, $kw \in KW_{Autonomous}$, and for all $kw \in KW_{rt}^{Attached}$, $kw \in KW_{Dependent}$.

For example, the keywords for the binary relationship type Parallel include “‖”, “parallel”, and “parallel line”, all of which are autonomous keywords. The keywords for Midpoint include “midpoint”, which is also an autonomous keyword. Perpendicular, a ternary relationship type, includes core keywords “⊥”, “perpendicular”, “perpendicular line”, which are all autonomous keywords, and attached keywords “foot”, “at”, which are dependent keywords.
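The keyword examples above can be arranged into a small lookup structure. The sketch below is a hypothetical in-memory layout, not the storage format of SGKO. It shows why an autonomous keyword resolves to exactly one relationship type, while a dependent keyword only points to a set of candidate types.

```python
# Keywords are taken from the examples above; the dictionary layout is an assumption.
KEYWORDS = {
    "Parallel": {"core": ["‖", "parallel", "parallel line"], "attached": []},
    "Midpoint": {"core": ["midpoint"], "attached": []},
    "Perpendicular": {
        "core": ["⊥", "perpendicular", "perpendicular line"],
        "attached": ["foot", "at"],
    },
}

# Autonomous keywords map one-to-one onto a relationship type.
KW_AUTONOMOUS = {kw: rt for rt, k in KEYWORDS.items() for kw in k["core"]}

# Dependent keywords may be shared, so each maps to a set of candidate types.
KW_DEPENDENT = {}
for rt, k in KEYWORDS.items():
    for kw in k["attached"]:
        KW_DEPENDENT.setdefault(kw, set()).add(rt)


def relation_type(kw):
    """Resolve a keyword to its relationship type; None for non-autonomous keywords."""
    return KW_AUTONOMOUS.get(kw)
```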

Definition 7 (Relationship Semantic Role Framework, RSF).

RSF is a sequence consisting of a series of placeholders used to record the sentence structures of all natural language valid descriptions of specific keywords. RSF refers to the relative positions of arguments and keywords in a relationship’s description by arranging these placeholders in a specific order, enabling the relationship description to be viewed as an instance of the RSF. The placeholders include S, O, A, and K, representing the subject, object, additional entity, and keyword, respectively.

For any $rt \in RT_{Binary}$, the RSF of any $kw \in KW_{rt}$ consists of only S, O, and K placeholders. For any $rt \in RT_{Ternary}$, the RSF for a $kw \in KW_{rt}^{Core}$ consists of S, O, and K, while the RSF for a $kw \in KW_{rt}^{Attached}$ consists of A and K.

Using Parallel as an example of a binary relationship type and Intersect as an example of a ternary relationship type, their Textual Attributes are displayed in Figure 4 and Figure 5. These RSFs (leaf nodes) will be used as templates for relationship matching in the following sections, and their applicable relationship descriptions are also shown.

Let all RSFs under the keyword $kw$ be represented as $RSF_{kw}$. $RSF_{kw}$ is ordered, with each $rsf \in RSF_{kw}$ sorted by commonality from highest to lowest. These RSFs serve as frames for relationship matching templates in the text-level ERE process, and the order of $RSF_{kw}$ is designed to quickly hit the target template.

“Domain Knowledge Attribute” for relationship types populates the potential entity types of the entities linked by a relationship, serving as primary geometry domain knowledge. These potential entity types are recorded at the level of the superordinate class. Table 2 illustrates the Domain Knowledge Attribute for Parallel, Height, and Perpendicular.

Using Perpendicular as an example, its complete knowledge storage in the lower layer is shown in Figure 6.

Such a dual-layer ontology model not only forms a knowledge base, providing modular storage and management of domain knowledge but also bridges the gap between text-level and knowledge-level information and data. It serves as a powerful tool for entity–relation extraction that supports dual-layer information needs and is oriented toward knowledge graph-based GPU.

The current implementation of SGKO incorporates 10 entity types and 13 relationship types for text-level ERE. As shown in Table 3, each relationship type is predefined with its associated entity types, keywords, and RSFs. For instance, Parallel corresponds to 1 entity type, 2 keywords, and a total of 2 RSFs to model its syntactic variations. Through modular and component-based design, SGKO achieves the efficient reuse of knowledge elements: while only 10 entity types and 8 RSFs are predefined (see footnote 4), they are dynamically invoked across multiple relationship types, resulting in 31 entity-type calls and 28 RSF calls in total. This design minimizes redundancy while maintaining flexibility to handle diverse geometry expressions. For example, the Point entity type is reused in relationships such as Intersects, Foot, and PointOnLine, whereas PerpendicularBisector is handled by calling the existing Perpendicular and Bisector.

4.3. Model Scalability and Adaptability

The SGKO model demonstrates good scalability, mainly due to two aspects: tree-structured modular management and atomic, modular, and componentized design.

On the one hand, all entity and relationship types are categorized into corresponding branches through a tree structure, so when new types are added, they only need to be incorporated by adding nodes to the corresponding branches, without the need to restructure existing templates or algorithm frameworks. This design supports “plug-and-play” expansion, easily accommodating future knowledge evolution and model extension needs.

For example, when extending the processing capability to the Circumcenter relationship, traditional rule/template-based methods and machine learning approaches incur heavy updating costs. Rule/template methods require designing and constructing a rapidly growing number of rules or templates. Matching rules/templates must be designed for the atomic relationship itself (e.g., “[Point] is the circumcenter of [Triangle]” and “[Triangle]’s circumcenter (is) [Point]”), and rules must also be designed for the conjunction of multiple relationships. This leads to vertical growth in the number of rules, such as the various conjunctive expressions involving the Circumcenter and PointOnLine relationships (e.g., “[Triangle]’s circumcenter [Point] on [Line]”, “passing through [Triangle]’s circumcenter [Point] to draw [Line]”, “[Line] passing through [Triangle]’s circumcenter [Point]”), as well as horizontal expansion across possible combinations with other relationship types and their different mention orders. For instance, beyond “circumcenter” and “point on line”, additional relationships such as Parallel (e.g., “through [Triangle]’s circumcenter [Point] draw [Line]‖[Line]”) and Intersects (e.g., “through [Triangle]’s circumcenter [Point] draw [Line] intersects [Line] at [Point]”) must also be incorporated, requiring rules/templates for the various expressions and orders (e.g., parallel–intersection or intersection–parallel).

Machine-learning-based methods exhibit limitations in corpus collection and model adjustments. Specifically, relevant samples must be actively searched for (e.g., collecting and selecting problems containing circumcenter relationships to construct a dataset), and retraining or fine-tuning an already trained model may run into imbalanced sample distributions or degrade the recognition of types that are already handled, leading to significant expansion costs.

In comparison, the proposed method only requires introducing a new subtree for Circumcenter in SGKO and filling in predefined knowledge. This includes defining the possible entity types (Triangle and Point), setting the relationship keyword (“circumcenter”), and selecting appropriate RSFs from the RSF library (e.g., “$e_{subject}$-$e_{object}$-kw” and “$e_{object}$-kw-$e_{subject}$”). The expansion process requires neither extensive nor cumbersome work, nor does it depend on data, achieving an almost zero-cost and zero-impact expansion. A detailed comparison of expansion costs is provided in Table 4.
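The Circumcenter extension can be pictured as a single registration step. The sketch below uses a plain dictionary as a hypothetical stand-in for SGKO. The function name `add_relationship_type`, the RSF labels, and the assignment of Point to the subject role and Triangle to the object role are all assumptions made for illustration.

```python
# Hypothetical in-memory stand-in for SGKO; the real model is an ontology, not a dict.
RSF_LIBRARY = {"S-K-O", "S-O-K", "O-K-S"}

sgko = {
    "Parallel": {
        "entity_types": {"S": "Line", "O": "Line"},
        "keywords": {"‖": ["S-K-O"]},
    },
}


def add_relationship_type(ontology, rt, entity_types, keywords):
    """Add a subtree for rt; existing subtrees are left untouched."""
    if rt in ontology:
        raise ValueError(f"{rt} is already defined")
    for rsfs in keywords.values():
        for rsf in rsfs:
            if rsf not in RSF_LIBRARY:
                raise KeyError(f"unknown RSF: {rsf}")
    ontology[rt] = {"entity_types": entity_types, "keywords": keywords}


add_relationship_type(
    sgko,
    "Circumcenter",
    entity_types={"S": "Point", "O": "Triangle"},  # role assignment is assumed
    keywords={"circumcenter": ["S-O-K", "O-K-S"]},
)
```

Because the new type only selects RSFs that already exist in the library, nothing about Parallel or any other registered type has to change.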

On the other hand, the design philosophy of atomizing, modularizing, and componentizing the management, maintenance, and application of knowledge also facilitates adaptation to other tasks within this domain, such as adding “properties” to polygons (e.g., “opposite sides are parallel” for Parallelogram) in automatic reasoning tasks. Furthermore, this design can be extended beyond geometry to other domains, ensuring broad applicability and scalability.

5. Information Detect and Interpret Model

5.1. Text-Level ERE and IDIM-T

This section outlines the text-level ERE methodology based on the IDIM-T model, which splits cleaned problem sentences into clauses for extraction. Using autonomous keywords as processing anchors, IDIM-T functions as a state machine, categorizing relationship mentions, extracting entities, and processing relationships.

5.1.1. Relationship Mention Types

This paper classifies the types of relationship mentions in geometry problem descriptions to facilitate targeted relationship extraction.

Definition 8 (Relationship Description).

A relationship description refers to any textual representation of a relationship, i.e., a textual instantiation of a relationship. Denote a relationship description as $rd$.

The relationship description of a binary relationship is called a binary relationship description. Let $RD_{Binary}$ represent the set of relationship descriptions for a binary relationship $r_{binary}$. Every $rd_{binary} \in RD_{Binary}$ contains a $kw \in KW_{Autonomous}$, as well as mentions of $e_{subject}$ and $e_{object}$. An $rd_{binary}$ constitutes a complete relationship mention.

A relationship description of a ternary relationship is called a ternary relationship description. The set of relationship descriptions for a ternary relationship $r_{ternary}$ can be represented as $RD_{Ternary} = (RD_{Core} \times RD_{Attached}) \cup RD_{Core}$. Every $rd_{core} \in RD_{Core}$ is a relationship description of $r_{core}$, containing a $kw_{core}$ and mentions of $e_{subject}$ and $e_{object}$; every $rd_{attached} \in RD_{Attached}$ is a relationship description of $r_{attached}$, containing a $kw_{attached}$ and a mention of $e_{additional}$. A ternary relationship description $rd_{ternary} \in RD_{Ternary}$ can be expressed in two ways: with and without mentioning $rd_{attached}$.

For example, for a binary relationship where AB is parallel to CD, both “AB‖CD” and “CD is parallel to AB” are its relationship descriptions, and they are both binary relationship descriptions. Similarly, for a ternary relationship where AB is perpendicular to CD with the foot at P, both “AB⊥CD at P” and “AB is perpendicular to CD, P is the foot of the perpendicular” are complete relationship descriptions that include $rd_{attached}$, while “AB⊥CD” is a relationship description that does not mention $rd_{attached}$.

Definition 9 (Relationship Mention).

A relationship mention refers to a text segment that mentions a relationship. A relationship can be mentioned in the text in any way (individually, concurrently, or otherwise), whether the mentioned relationship is complete or incomplete. All relationship descriptions fall under a type of relationship mention.

The difference between a relationship description and a relationship mention is that the former starts with a specific relationship, while the latter is based on the perspective of the text. For example, “the foot is E” would not be considered a complete relationship description because it is not a complete relationship, but it is a way of mentioning a relationship. Similarly, “point M, N are the midpoints of AC and EF, respectively” is a way of mentioning a relationship and consists of two relationship descriptions.

Geometry problem texts are segmented into clauses $C$ using commas or periods. Each clause $c \in C$ is treated as a potential relationship mention. The clauses are categorized into a four-quadrant framework based on the count and completeness of relationships within single or multiple clauses. These categories, known as Relationship Mention Types (RMTs), are illustrated in Figure 7.

(1): Single Relationship, Single Clause (Complete) (SC)

This category includes clauses that clearly articulate a single relationship with all required arguments, categorized into binary (SC-B) and ternary (SC-T) relationships. Examples are shown as e.g.1 and e.g.2 in Figure 7.

(2): Single Relationship, Incomplete Description in a Single Clause (SI)

This type involves clauses mentioning a single, incomplete relationship. For binary relationships (SI-B), an argument may appear in a different, non-adjacent clause, illustrated in e.g.3 of Figure 7. Ternary relationships (SI-T) can exhibit one of two forms:

Attached relationship Isolation. Core and attached relationships span two contiguous clauses (e.g.4, Figure 7).

Subject entity Isolation. The subject entity of the core relationship appears in a previous clause, shared with other relationships (e.g., “BF intersects AD at point E, intersects AC at point F”).

(3): Multiple Relationships, Complete Descriptions in a Single Clause (MC)

This type refers to a clause that mentions multiple relationships with complete descriptions. It is divided into

MC-S. Same relationship type described in parallel argument slots (e.g.5, Figure 7).

MC-D. Different relationship types sharing common entities (e.g.6, Figure 7).

(4): Mixed Type of Multiple Types (MT)

This refers to clauses with multiple relationships, including at least one incomplete description. Relationships with complete descriptions are treated as MC with SC, while those incomplete are considered MC with SI, as shown in e.g.7 and e.g.8 of Figure 7.

The characteristics of each type are shown in the following Table 5 and serve as the basis for IDIM-T to identify RMT.

5.1.2. Entity Extraction

IDIM-T employs both type-specific and type-agnostic methods for entity extraction. Type-specific extraction utilizes SGKO to identify and retrieve entities of a known type, $et$, based on Textual Attributes. Type-agnostic extraction, meanwhile, leverages SGKO as a comprehensive dictionary to detect and classify any encountered entities.

Entity extraction is performed at the superordinate class level to optimize efficiency, with finer categorization deferred to subsequent knowledge-level processing.

5.1.3. Relationship Extraction

This subsection outlines the modules involved in relationship extraction and describes the basic and specific processes within them. The relationship extraction process begins by taking a clause $c$ and processing it keyword by keyword, outputting intermediate results including the set of relationship triplets $R_{c}$ and the entity set $E_{c}$.

Processing Modules

The relationship extraction process involves the following five modules: Knowledge Linking, Relationship Detection, Relationship Matching Template Generation, Relationship Extraction, and Relationship Structured Representation Generation.

(1): Knowledge Linking Module

IDIM-T establishes a connection with SGKO. IDIM-T submits the keyword $kw$ to SGKO, which retrieves the relationship type $rt$ based on the $kw$ and establishes a link between IDIM-T and the $rt$ node, allowing IDIM-T to access all knowledge about $rt$. Denote all knowledge of $rt$ as $knowledgeTree_{rt}$. Let $rt \leftarrow kw$ denote obtaining $rt$ based on $kw$.

(2): Relationship Detection Module

Detection of Relationship Existence in a Clause. Let $KW = KW_{Autonomous} \cup KW_{Dependent}$ be the set of all relationship keywords. If the clause $c$ contains any $kw \in KW$, then there is a relationship mention in $c$, and the ordered list of all keywords in $c$ is obtained as $KW_{c}$.

Detection of Attached Relationship in Ternary Relationships. For $c$, take the text fragment $c'$ from the current $kw$ up to the next $kw_{autonomous}$ or to the end of the sentence. Detect if there is any $kw' \in KW_{rt_{ternary}}^{Attached}$ within $c'$. If $kw'$ exists, the fragment $c'$ contains a mention of an attached relationship; otherwise, $c$ only mentions the core relationship.
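The attached-relationship check can be sketched as a bounded substring search. The function name `find_attached` and the two keyword lists are assumptions for illustration. Word-boundary matching is used so that a short keyword such as “at” is not found inside a longer word.

```python
import re


def find_attached(clause, kw, autonomous, attached):
    """Return the first attached keyword found in the fragment c', else None."""
    start = clause.index(kw) + len(kw)
    # c' runs from the current keyword to the next autonomous keyword or the clause end.
    cuts = [i for i in (clause.find(a, start) for a in autonomous) if i != -1]
    fragment = clause[start:min(cuts)] if cuts else clause[start:]
    for a in attached:
        if re.search(r"\b" + re.escape(a) + r"\b", fragment):
            return a
    return None


AUTONOMOUS = ["⊥", "‖", "intersects"]  # hypothetical subset of autonomous keywords
ATTACHED = ["foot", "at"]               # attached keywords of the current type
```

In “DE‖AC intersects BH at point F”, the fragment after “‖” stops at “intersects”, so the trailing “at” is attributed to the Intersect relationship rather than to Parallel.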

(3): Template Matching Module

IDIM-T generates a relationship matching template $t$ based on $kw$, and searches for and extracts the arguments of the relationship according to the template. This process is called template matching.

Relationship Matching Template Generation. IDIM-T indexes the “Core Keyword” in $knowledgeTree_{rt}$ to find the $kw$, and then indexes to $RSF_{kw}$. The first unused $rsf$ in the ordered $RSF_{kw}$ is taken as the framework of the template. Then, IDIM-T fills the template framework with knowledge attributes from $knowledgeTree_{rt}$ to form a usable template $t$. An illustration is shown in Figure 8.

Definition 10 (Relationship Matching Template).

A relationship matching template $t$ uses an RSF as a frame and is essentially a list of records documenting the relative positions of a series of relationship argument roles with respect to a specific keyword $kw^{*}$. Each record is formally represented as $record = (ar, et, pos) \in t$. Here, $ar \in \{subject, object, additional\ entity\}$ represents the argument role of entity $e$, $et$ represents the entity type of $e$, which comes from preset knowledge attributes, and $pos$ indicates the relative position between $e$ and $kw^{*}$, with the keyword $kw^{*}$ as the reference point.

Table 6 uses RSFs of Perpendicular as an example (refer to Figure 7), illustrating the template and its formal representation.

Argument Searching. To search for an argument entity, IDIM-T takes a $record = (ar, et, pos) \in t$, starts from the position of $kw$ in $c$, and searches in the direction specified by $pos$ for an entity mention of type $et$. This entity is the one that plays the argument role $ar$ in the relationship.
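Argument searching driven by template records can be sketched as below. This is a simplified illustration: $pos$ is reduced to a sign (negative for left of the keyword, positive for right), and the regular expressions in `PATTERNS` are hypothetical Textual Attributes, not the ones stored in SGKO.

```python
import re

# Hypothetical Textual Attributes, written as regular expressions.
PATTERNS = {
    "Line": r"[A-Z]{2}",
    "Point": r"(?<![A-Z])[A-Z](?![A-Z])",
}


def search_argument(clause, kw, record):
    """Find the entity of type et nearest to kw on the side given by pos."""
    ar, et, pos = record
    k = clause.index(kw)
    if pos < 0:  # search to the left of the keyword
        hits = list(re.finditer(PATTERNS[et], clause[:k]))
        hit = hits[-1] if hits else None
    else:        # search to the right of the keyword
        hit = re.search(PATTERNS[et], clause[k + len(kw):])
    return (ar, hit.group(), et) if hit else None


clause = "AB⊥CD at point P"
core_template = [("subject", "Line", -1), ("object", "Line", +1)]
core_args = [search_argument(clause, "⊥", rec) for rec in core_template]
extra = search_argument(clause, "at", ("additional", "Point", +1))
```

Each record is resolved independently against the keyword position, which is what allows the same template to be reused for any clause that follows the same RSF.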

Argument Slot Entity Quantity Detection. When an entity is found that matches the $record$, IDIM-T first performs an argument slot entity quantity detection before extracting the entity. IDIM-T holds a preset conjunction table that contains possible conjunctions. After a matching argument entity has been found according to the $record$, an entity boundary check looks for conjunctions at the boundaries of the current entity to determine whether multiple entities are mentioned in a single slot.

Argument Entity Extraction. Argument entity extraction operates based on the relationship matching template, distinguishing between single and multiple argument entities.

Single Argument Entity Extraction extracts a single entity mentioned in an argument slot, represented formally by entity mention (EM), $ename \leftarrow EM(ar, et)$, where $ar$ is the argument role, $et$ is the entity type, and $ename$ is the entity name. Multi-argument Entity Extraction extracts multiple entities connected by conjunctions within an argument slot, creating an ordered list of entities for each role.

Extraction follows one principle: when searching to the right, it never crosses an autonomous keyword. This keeps the search focused and avoids capturing irrelevant data, especially under MC-D conditions.

Special Processing for RMT MC-D: For MC-D scenarios, argument extraction includes verifying the subject entity against previous relationships. If the entity types match but the names differ, the subject entity is adjusted to align with the previous relationship’s subject entity. This ensures continuity and accuracy in entity relationships across the process.

Using the example “DE‖AC intersects BH at point F”, IDIM-T accurately extracts $r_{Parallel}$ with the entities (“DE”, Line) and (“AC”, Line) as subject and object, respectively. For $r_{Intersect}$, IDIM-T initially misidentifies the subject entity as (“AC”, Line) but extracts the correct object (“BH”, Line) and additional entity (“F”, Point). A verification step adjusts the subject entity from (“AC”, Line) to (“DE”, Line) by identifying the inconsistency and replacing it with the correct entity from a previous relationship. Similarly, in the scenario “through point B draw AC⊥DC”, no corrections are necessary as the entity types (“B”, Point) and (“AC”, Line) from previous and current relationships differ, affirming the accuracy of the extraction process.

Special Processing for RMT SI series: If no matching template is found for $RSF_{kw}$ due to a missing subject entity, two scenarios are considered. For specific relationship types with special entity types, such as Triangle, a reverse search is performed in the text-level ERE final entity set $E_{Text}$ to locate the first matching entity. Alternatively, the default method involves a reverse search in the relationship set $R_{Text}$ to find the first entity with a subject role that matches the entity type. This approach ensures that missing subject entities are correctly identified and processed.

Template Generation Iteration. IDIM-T iteratively generates templates. If a template successfully matches, the process advances to the next module. Otherwise, IDIM-T sequentially selects an unused $rsf$ from $RSF_{kw}$, creates a template $t$, and attempts to match all $record$s within $t$. A successful match moves the process to the next module. If any record fails to match, IDIM-T attempts the next $rsf$. Instantiated templates are appended to their $rsf$ in linked lists so that they can be reused, saving time and space during further template generation.
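The iteration over an ordered $RSF_{kw}$ can be illustrated with a deliberately simplified matcher. Note the substitution: instead of the record-based positional search described above, this sketch compiles each RSF into one regular expression with a named group per argument slot. The RSF notation (`"S K O"`), the function names, and the entity patterns are assumptions.

```python
import re


def instantiate(rsf, kw, slot_patterns):
    """Build a regex from an RSF such as 'S K O': one named group per argument slot."""
    parts = []
    for slot in rsf.split():
        if slot == "K":
            parts.append(re.escape(kw))
        else:
            parts.append(f"(?P<{slot}>{slot_patterns[slot]})")
    return re.compile(".*?".join(parts))


def match_first(clause, kw, rsfs, slot_patterns):
    """Try the RSFs in their stored order and return the first one that matches."""
    for rsf in rsfs:
        m = instantiate(rsf, kw, slot_patterns).search(clause)
        if m:
            return rsf, m.groupdict()
    return None


LINE = r"[A-Z]{2}"      # hypothetical pattern for a Line mention
POINT = r"\b[A-Z]\b"    # hypothetical pattern for a Point mention
```

With the attached keyword “foot” and the ordered list `["K A", "A K"]`, the clause “P is the foot” fails on the first RSF and succeeds on the second, which mirrors the fall-through behaviour of the iteration.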

(4): Structured Representation Generation Module

Once IDIM-T identifies all entity mentions, it constructs relationship triplets in a structured format. The format includes the entity name $ename$, entity type $et$, and relationship type $rt$. With $rt$ derived from the keyword and $et$ from domain knowledge associated with $rt$, IDIM-T populates the structured representation using predefined templates. This process involves assigning the correct $ename$s to their respective entity mentions to form relationship triplets. The output of this module is a relationship triplet set $R_{c}$ and an entity set $E_{c}$.

Special processing for RMT SI-T: For Attached Relationship Isolation, after processing the current $c$, only $rt_{attached}$ and $e_{additional}$ are obtained. Further processing involves reverse searching in the result set $R_{Text}$ to find the first triplet $r$ whose $rt = rt_{ternary}$, which is the corresponding $rt_{core}$ for $rt_{attached}$. This $r$ serves as the $r_{core}$ to construct a complete $r_{attached}$. If multiple attached relationships are processed for the current $c$, the corresponding number of relationships whose $rt$ is $rt_{core}$ are retrieved from the set of relationship triplets and processed in the same 1-to-many manner. This can handle cases such as “AD⊥BC, BE⊥AC, feet respectively are D, E”.

For Subject Entity Isolation, the incompleteness of relationship information affects only $r_{core}$ and not $r_{attached}$. The missing subject entity is often the subject entity of a relationship in a previous clause.

To fill in the missing subject entity, the set of relationship triplets $R_{Text}$ is traversed in reverse. The first subject entity that matches the required type is used to complete the relationship triplet.

For example, in “FA perpendicular bisectors ∠CAE and intersects BE at M, and intersects CE at N”, the latter clause matches this scenario. In the previous iteration, three relationships were extracted, $r_{1}$ = ((“FA”, Line), Bisector, (“∠CAE”, Angle)), $r_{2}$ = ((“FA”, Line), intersects, (“BE”, Line)), and $r_{3}$ = ($r_{2}$, intersection, (“M”, Point)), which are the most recent in the relationship result set $R_{Text}$. The missing subject entity in the current clause, (“FA”, Line), is found by searching backward in this set, and then the relationship triplet $r_{core}$ = ((“FA”, Line), intersects, (“CE”, Line)) is generated. The $r_{attached}$ triplet remains unaffected and is generated as usual.
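The backward search for a missing subject can be sketched with plain tuples. The representation is an assumption: entities are `(name, type)` pairs and relationships are `(head, rt, tail)` triples, so an attached triple can be recognized because its head is itself a triple.

```python
def fill_subject(r_text, required_type):
    """Scan earlier triples from newest to oldest for a subject of the required type."""
    for head, rt, tail in reversed(r_text):
        # Attached triples have a nested triple as head; only entity heads are subjects.
        if len(head) == 2 and head[1] == required_type:
            return head
    return None


r1 = (("FA", "Line"), "Bisector", ("∠CAE", "Angle"))
r2 = (("FA", "Line"), "intersects", ("BE", "Line"))
r3 = (r2, "intersection", ("M", "Point"))
R_text = [r1, r2, r3]

subject = fill_subject(R_text, "Line")
r_core = (subject, "intersects", ("CE", "Line"))
```

The search skips $r_{3}$ because its head is a relationship, and stops at $r_{2}$, whose subject (“FA”, Line) has the required type.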

Special Processing for RMT MC-S: If the entity lists of the argument slots differ in length, with one slot holding a single element and another holding $m$ elements, duplicate the single element $m$ times so that each list $R_{ar}$ has the same length. Then, traverse these lists simultaneously to construct relationship triplets, combining the entities at the same index into one relationship. For ShapeLimits relationships, retain only the information with a higher degree of strictness as recorded in SGKO. For example, in “The quadrilateral ABCD is a rhombus”, $r_{1}$ = ((“Quadrilateral”, PolygonType), shapeLimits, (“ABCD”, Polygon)) and $r_{2}$ = ((“Rhombus”, PolygonType), shapeLimits, (“ABCD”, Polygon)). Since Rhombus is stricter than Quadrilateral, $r_{2}$ is retained, and $r_{1}$ is discarded.
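The slot alignment for MC-S can be sketched as a broadcast followed by a parallel traversal. The function name `expand_slots` and the restriction to subject and object slots are simplifications assumed for this example.

```python
def expand_slots(subjects, objects, rt):
    """Align the slot lists to a common length m and pair them index by index."""
    m = max(len(subjects), len(objects))

    def widen(slot):
        if len(slot) == m:
            return slot
        if len(slot) == 1:
            return slot * m  # duplicate the single entity m times
        raise ValueError("slot lengths cannot be aligned")

    return [(s, rt, o) for s, o in zip(widen(subjects), widen(objects))]


# "point M, N are the midpoints of AC and EF, respectively"
midpoints = expand_slots([("M", "Point"), ("N", "Point")],
                         [("AC", "Segment"), ("EF", "Segment")], "midpoint")

# One shared object: both lines are parallel to EF (hypothetical clause)
parallels = expand_slots([("AB", "Line"), ("CD", "Line")], [("EF", "Line")], "parallel")
```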

State Determination and Overview of Relationship Extraction Process

IDIM-T determines the structure and type of relationship, whether binary or ternary, involved with the current keyword by assessing the RMT and relationship description type. This evaluation occurs simultaneously in the relationship detection and template matching modules, facilitating a cross-module cascading decision process. IDIM-T sequentially refines the relationship’s structure and type, optimizing the handling of the detected relationships. The pseudocode for the RMT decision process is presented in Algorithm 1.

Algorithm 1 Relationship Mention Type Determination

Input: Clause $c$ with keywords $KW_{c}$
Output: Relationship Mention Type ($RMT$)
Dependencies: Relationship types $RT$ in SGKO, $KW_{Dependent}$ and $KW_{Autonomous}$
RelationshipDetection($c$)
  if $|KW_{c}| = 1$ and $kw \in KW_{Dependent}$ then
    Determine $rt \leftarrow kw$
    if $rt = rt_{ternary}$ then
      $RMT \leftarrow$ SI-T or MC-S
    end if
  else if $|KW_{c}| = 1$ and $kw \in KW_{Autonomous}$ then
    Determine $rt \leftarrow kw$
    if $rt = rt_{binary}$ then
      $RMT \leftarrow$ SI-B or MC-S
    else if $rt = rt_{ternary}$ then
      Check for attached $kw' \in KW_{rt_{ternary}}^{Attached}$
      $RMT \leftarrow$ SC-T or MC-S
    end if
  else if $|KW_{c}| > 1$ and all $kw \in KW_{Autonomous}$ then
    $RMT \leftarrow$ MC-D or MT
  end if
TemplateMatching($c, KW_{c}, RT$)
  for each $t \in T$ do
    if $t$ matches and arguments in all slots $= 1$ then
      Invoke determination procedures for all non-MC-S relationship types
    else if arguments in any slot $> 1$ then
      $RMT \leftarrow$ MC-S
    end if
  end for
  if none matched then
    $RMT \leftarrow$ MT
  end if

Determinations establish whether relationships are binary or ternary and clarify their descriptive forms. IDIM-T then processes these identifications. After determining all arguments, the system advances to the next module for generating structured representations, finalizing the relationship extraction results for the current keyword $kw$. Extracted entities and relationship triples are added to the respective result sets $E_{Text}$ and $R_{Text}$.

For cases of $RMT$ in MC-D or MT, the keyword $kw$ is removed from $KW_{c}$. The clause $c$ is segmented from $kw$ to the end, forming a new clause $c'$ for subsequent processing. This loops until all keywords are processed, completing the extraction for the clause $c$. Merged results are then prepared for the next iteration of clause processing until all are handled.

5.1.4. Overall Processing Flow of IDIM-T

Algorithm 2 outlines the IDIM-T framework for processing cleaned text. Each sentence is divided into clauses using commas or periods, and each clause is then processed iteratively to extract entities and relationships.

The process starts by identifying any relationships within a clause. If found, IDIM-T extracts these relationships; otherwise, it searches for standalone entities. If no entities are identified, it moves to the next clause.

When relationships are mentioned, IDIM-T examines each keyword within the clause, interacting with SGKO to dynamically generate matching templates and search for relevant relationship arguments. This step involves clarifying the Relationship Mention Type (RMT) and progressively identifying the relationship’s type and description. Once all arguments are determined, IDIM-T constructs relationship triplets according to a predefined structure and completes the extraction for that keyword. If additional keywords exist, the process repeats until all are addressed; otherwise, it progresses to the next clause.

Upon completion, IDIM-T produces two result sets: $R_{Text}$ for relationship triplets and $E_{Text}$ for entities, encapsulating the text-level ERE results.

Algorithm 2 IDIM-T Processing Framework

Input: Cleaned problem text sentences S
Output: Relationship triplet set $R_{Text}$ and entity set $E_{Text}$
Split S into an ordered collection of clauses C by commas or periods
Initialize $R_{Text} \leftarrow \emptyset$ and $E_{Text} \leftarrow \emptyset$
for each clause $c \in C$ do
    Detect the existence of relationships in c.
    if a relationship exists then
        Perform relationship extraction.
    else
        Check for the presence of entities in c.
        if entities exist then
            Extract the entities.
        end if
    end if
    for each keyword $kw$ in c do
        Interact with SGKO to dynamically generate relationship matching templates.
        Identify RMT and the type of relationship description.
        Find all arguments involved in the relationship.
        Generate relationship triplets in the predefined structured representation format, adding the triplets to $R_{c}$ and the entities to $E_{c}$.
    end for
    $R_{Text} \leftarrow R_{Text} \cup R_{c}$
    $E_{Text} \leftarrow E_{Text} \cup E_{c}$
end for
return $R_{Text}$, $E_{Text}$
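
The control flow of Algorithm 2 can be sketched in Python as follows. This is a toy illustration only, not the authors' implementation: the keyword map `KEYWORDS`, the entity pattern `ENTITY`, and the "nearest entity on each side of the keyword" matcher are hypothetical stand-ins for the SGKO queries and the dynamically generated templates, and English glosses are used in place of the original problem texts.

```python
import re
from typing import List, Set, Tuple

Triplet = Tuple[str, str, str]  # (head entity, relationship, tail entity)

# Hypothetical keyword -> relationship map; stands in for SGKO lookups.
KEYWORDS = {"perpendicular": "perpendicular", "parallel": "parallel"}
# Toy entity pattern: runs of two or more capital letters (AB, ABCD, ...).
ENTITY = re.compile(r"[A-Z]{2,}")


def split_clauses(sentence: str) -> List[str]:
    """Split a sentence into clauses at commas or periods."""
    return [c.strip() for c in re.split(r"[,.]", sentence) if c.strip()]


def idim_t(sentences: List[str]) -> Tuple[Set[Triplet], Set[str]]:
    r_text: Set[Triplet] = set()
    e_text: Set[str] = set()
    for sentence in sentences:
        for clause in split_clauses(sentence):
            found = [kw for kw in KEYWORDS if kw in clause]
            if not found:
                # No relationship mentioned: look for standalone entities.
                e_text.update(ENTITY.findall(clause))
                continue
            for kw in found:
                # Toy matcher: closest entity before and after the keyword.
                left, _, right = clause.partition(kw)
                heads, tails = ENTITY.findall(left), ENTITY.findall(right)
                if heads and tails:
                    r_text.add((heads[-1], KEYWORDS[kw], tails[0]))
                    e_text.update((heads[-1], tails[0]))
    return r_text, e_text


relations, entities = idim_t(
    ["In quadrilateral ABCD, EF is perpendicular to AC, AB is parallel to CD."]
)
```

The per-clause results are merged into the global sets with set union, which mirrors the accumulation of clause-level results into the two output sets.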

5.2. Knowledge-Level ERE and IDIM-K

IDIM-K focuses on extracting knowledge-level entities and relationships using the outputs $R_{Text}$ and $E_{Text}$ from IDIM-T. It operates through two main modules: the Entity Decomposition and Relationship Construction Module and the Entity Deduplication Module.

The Entity Decomposition and Relationship Construction Module processes Geometry Shape and Expression entities from $E_{Text}$. For Geometry Shapes, it breaks down entities dimensionally into Point entities, establishing construction-type relationships (edgeOf, vertex). Expression entities are decomposed using operators, forming corresponding relationships. Decomposed entities are marked to prevent confusion with the original ones mentioned in the problem text.

The Entity Deduplication Module resolves duplicate entities by retaining only unique representations, prioritizing entities mentioned first in the text or more explicitly in decompositions. It ensures all relationships involving removed duplicates are correctly updated.
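
A minimal sketch of the two modules, under stated assumptions, is given below. The type label `Segment`, the direction of the `vertex` and `edgeOf` triples, and the rule that segments `AC` and `CA` denote the same entity are assumptions made for illustration. The marking of decomposed entities and the handling of Expression entities are not modeled.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str]        # (name, type)
Triplet = Tuple[str, str, str]  # (head, relationship, tail)


def decompose_triangle(name: str) -> Tuple[List[Entity], List[Triplet]]:
    """Break a triangle such as 'ΔABC' into vertices and edges."""
    pts = name[-3:]  # the last three characters are the vertex labels
    entities: List[Entity] = []
    relations: List[Triplet] = []
    for p in pts:
        entities.append((p, "Point"))
        relations.append((p, "vertex", name))
    for i in range(3):
        edge = pts[i] + pts[(i + 1) % 3]
        entities.append((edge, "Segment"))
        relations.append((edge, "edgeOf", name))
    return entities, relations


def canonical(name: str, etype: str) -> str:
    # Assumption: a segment is identified by its unordered endpoints.
    return "".join(sorted(name)) if etype == "Segment" else name


def deduplicate(entities: List[Entity], relations: List[Triplet]):
    """Keep the first-mentioned entity and rewrite relations to point at it."""
    first_seen: Dict[Tuple[str, str], str] = {}
    rename: Dict[str, str] = {}
    unique: List[Entity] = []
    for name, etype in entities:
        key = (canonical(name, etype), etype)
        if key not in first_seen:
            first_seen[key] = name
            unique.append((name, etype))
        rename[name] = first_seen[key]
    fixed: List[Triplet] = []
    for head, rel, tail in relations:
        triple = (rename.get(head, head), rel, rename.get(tail, tail))
        if triple not in fixed:
            fixed.append(triple)
    return unique, fixed


text_entities = [("ΔABC", "Triangle"), ("AC", "Segment")]  # from the text
dec_entities, dec_relations = decompose_triangle("ΔABC")
unique, fixed = deduplicate(text_entities + dec_entities, dec_relations)
```

Because the text-level entities are placed first, the derived edge `CA` is dropped in favor of the explicitly mentioned `AC`, and the `edgeOf` triple is rewritten accordingly.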

Ultimately, IDIM-K generates $E_{Knowledge}$ and $R_{Knowledge}$ through these processes, merging and deduplicating them with $R_{Text}$ and $E_{Text}$ to produce the final sets E and R. At this point, the extraction of both GPU text-level and knowledge-level information is complete. With the two sets E and R, a problem semantic knowledge graph can be constructed.

5.3. Method Complexity Analysis

Template Matching Complexity. Traditional template-based methods require designing separate templates for each linguistic variant of a relationship, leading to exponential growth in template quantity as relational complexity increases. If each relationship has $V$ linguistic variants and a description involves $k$ concurrent relationships, the total number of templates $T$ grows as $O(R \cdot V^{k})$, where $R$ is the number of relationship types. This exponential scaling makes template matching prohibitively expensive: because matching requires traversing all templates, its time complexity is likewise $O(R \cdot V^{k})$.

In contrast, the proposed dynamic template approach addresses this issue through two key improvements. The first is tree-structured knowledge management: during matching, only the subtree of the current relational branch is traversed, avoiding full-template searches. The second is the reusable component-based template framework RSF: only eight RSFs need to be predefined, and templates are dynamically generated instead of being independently designed. With dynamic pruning (skipping failed templates) and priority sorting (high-frequency templates prioritized), template matching complexity is reduced from exponential to linear, $O(R)$.
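
To make the gap concrete, the following worked calculation plugs illustrative values into the static template bound. The values $R = 20$ and $V = 3$ are assumptions chosen only for the arithmetic and are not figures from this paper; only the count of eight predefined RSFs comes from the text.

```latex
\begin{align*}
T_{\text{static}}(k) &= R \cdot V^{k}, \qquad R = 20,\ V = 3 \\
T_{\text{static}}(1) &= 20 \cdot 3 = 60 \\
T_{\text{static}}(2) &= 20 \cdot 9 = 180 \\
T_{\text{static}}(3) &= 20 \cdot 27 = 540 \\
T_{\text{dynamic}} &= 8 \quad \text{(predefined RSFs, independent of } k\text{)}
\end{align*}
```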

Text-level ERE Complexity. The text-level ERE process (IDIM-T) involves sentence splitting, keyword detection, template matching, and structured generation. Splitting the input problem text into clauses has linear complexity $O(N)$, where $N$ is the number of clauses. For each clause, keywords are scanned using a hash-indexed dictionary (key: keyword, value: relationship type), which reduces the complexity from $O(N \cdot K \cdot R)$ to $O(N \cdot K)$, where $K$ is the average number of keywords per clause. For each detected keyword, the corresponding RSF is loaded from SGKO at a cost of $O(N \cdot K \cdot C)$, where $C$ is the fixed number of RSFs ($C \leq 8$); because $C$ is a constant, this remains $O(N \cdot K)$. Finally, generating relationship triples incurs a complexity of $O(E)$, where $E$ is the total number of entities. Thus, the total time complexity for this stage is linear, $O(N \cdot K)$.
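
The effect of the hash-indexed keyword dictionary can be illustrated with a short sketch. The dictionary entries below are hypothetical (the real keyword set lives in SGKO and is not listed in this section), and whitespace tokenization of English glosses is a simplifying assumption. Each token costs one average-case constant-time lookup, instead of being compared against every relationship type in turn.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword index: keyword -> relationship type.
KEYWORD_INDEX: Dict[str, str] = {
    "perpendicular": "Perpendicular",
    "parallel": "Parallel",
    "intersects": "Intersects",
}


def detect_keywords(clause: str) -> List[Tuple[str, str]]:
    """Return (keyword, relationship type) pairs in order of appearance."""
    hits: List[Tuple[str, str]] = []
    for token in clause.lower().replace(",", " ").split():
        rel_type = KEYWORD_INDEX.get(token)  # single hash lookup per token
        if rel_type is not None:
            hits.append((token, rel_type))
    return hits


hits = detect_keywords("CF were perpendicular to AC")
```

Because only the core keyword is indexed, surrounding words such as a misused "were" do not prevent detection, which is in line with the fault tolerance reported in the experiments.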

Knowledge-level ERE Complexity. The knowledge-level ERE process (IDIM-K) includes entity decomposition, relationship construction, and entity deduplication. Breaking down non-atomic entities into lower dimensions has a complexity of $O(M)$, where $M$ is the total number of entities, including both original and dimension-decomposed entities; the same applies to relationship construction. Then, hash-based deduplication introduces a complexity of $O(M \cdot \log M)$. Consequently, the total time complexity for this stage is linearithmic, $O(M \cdot \log M)$.

Overall Pipeline Complexity. Combining the text-level and knowledge-level processes, the total end-to-end pipeline complexity is $O(N \cdot K + M \cdot \log M)$. This is a significant improvement: the exponential or polynomial complexity of traditional approaches is reduced to a linearithmic level, even in the worst case.
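
The stage-wise bounds stated in this subsection can be combined as sketched below. The step $E = O(N \cdot K)$ is an assumption added here (each detected keyword contributes a bounded number of entities); it is not stated explicitly in the text.

```latex
\begin{align*}
T_{\text{text}} &= O(N) + O(N \cdot K) + O(N \cdot K \cdot C) + O(E)
  = O(N \cdot K) \quad (C \leq 8,\ E = O(N \cdot K)) \\
T_{\text{knowledge}} &= O(M) + O(M) + O(M \cdot \log M) = O(M \cdot \log M) \\
T_{\text{total}} &= T_{\text{text}} + T_{\text{knowledge}}
  = O(N \cdot K + M \cdot \log M)
\end{align*}
```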

The proposed method achieves a breakthrough in complexity by reducing the exponential complexity of traditional template matching to linear complexity through tree-structured knowledge management and dynamic, component-based template generation. This optimization not only improves the efficiency of the text-level ERE process but also ensures that the overall pipeline remains scalable and practical, even for complex geometry problem understanding tasks. More detailed comparisons are summarized in Table 7.

6. Experiment and Application

6.1. Experiment and Analysis

To validate the proposed method, this paper conducted two types of experiments: self-verification tests and application verification tests. The self-verification tests evaluated the effectiveness and performance of the method on the dataset used during its development. The application verification tests, on the other hand, assessed the method on an unseen dataset, compared it with other research methods, and included an error analysis.

Given the GPU task’s specific requirements, successful problem understanding is defined as the 100% accurate extraction of all entities and relationships from the problem, ensuring completeness and correctness without any omissions or redundancies.

6.1.1. Self-Verification Test and Validity Verification

Dataset (A): To target the semantic understanding of geometry proof problems, 130 plane geometry proof problem texts without circles were collected from recent years' mathematics entrance examination questions, mainstream textbooks (People’s Education Press), textbook companions, exercise books, and mock exams. These problems include complex relationship descriptions and form the foundation for method construction. The dataset was statistically analyzed from two perspectives: relationship distribution and RMTs. Detailed statistics are provided in Table 8 and Table 9.

Experimental Setup: A system implementing the proposed method was built using Python 3.8, and the final problem semantic KG was constructed using Neo4j 4.4.16. All problems in the dataset were input into the system for comprehension. Manual verification was performed on intermediate outputs, entity sets and relation triplets, and final semantic KGs to validate text-level ERE and knowledge-level ERE.

Performance: The system successfully parsed all 130 geometry problems in the dataset, with entities and relationships accurately identified and extracted at both the text and knowledge levels. Notably, the relationship extraction at the text level achieved satisfactory performance: First was the accurate recognition of similar texts. For instance, the system identified “AB’s extension” as an entity mention, “AB’s extension BE” as a relationship description, “AB’s extension intersects CD” as mentioning one relationship, and “AB’s extension BE intersects AC” as containing two relationship descriptions. Second was the robust handling of complex relationships that span across sentences or share semantic roles, such as “the bisector AG of ∠DAE intersects CD at point G, (and) intersects the extension line of BC at point F” and “through the midpoint O of AC, draw EF ⊥ AC (and) intersects BC at point E, intersecting AD at point F”. Third, the method exhibits fault tolerance when the key information (keywords and entity names) is correct. For example, it can correctly recognize the relationship mentioned in “CF ‘were’ perpendicular to AC” because only the core atomic unit “perpendicular” is set as the keyword.

6.1.2. Application Verification Test and Error Analysis

Dataset (B): This dataset includes problems with complex relationship descriptions and contains entity types, relationship types, and knowledge attributes not covered during method construction (e.g., unhandled entity/relationship types, unconfigured entity descriptions, or missing RSF configurations for certain relationship types). Statistical details on problem characteristics and RMT distribution are provided in Table 10 and Table 11. The higher proportion of SI and MC series RMTs indicates that Dataset B contains more problems with complex linguistic expressions.

Experimental Setup: The geometry problems from Dataset B were input into the system to test the performance of entity and relationship extraction. (1) Problem Understanding Evaluation: Assesses the method’s performance in the actual application of problem understanding. (2) Text-Level ERE Evaluation and Analysis: Focuses on entity and relationship extraction at the text level, with comparisons to baseline methods and failure case analysis.

(1): Problem Understanding Evaluation

After testing a total of 230 problems, 209 geometry problems were successfully understood, resulting in a success rate of 91.87%. From the perspective of relationships, all processed entity and relationship types were correctly identified, even in failed problems. The effectiveness of problem understanding met expectations, thereby validating the performance of the method.

(2): Text-Level ERE Evaluation and Analysis

Baseline Methods: Four methods were compared, all built/trained on Dataset A and tested on Dataset B:

STM: A pure template-based method using 177 predefined sentence-level templates for entity–relation extraction (representative method [50,51]).
TBC: A method that uses Bi-LSTM-CRF for entity extraction and sentence-template-based relationship extraction (representative method [36]). It builds a total of 153 templates for both atomic and composite relationship expressions across three levels (word, sentence, and clause) to normalize the problem text.
BBC: A machine-learning-based method that uses Bi-LSTM-CRF for entity extraction and BERT-CasRel for relationship extraction (representative method [62]).
S2-Like: A method that integrates POS tagging (using ICTCLAS) with a hybrid rule/template approach (S2 model and sentence templates) for relationship extraction (following method [52]). The sentence templates used for restoring complex relationships are identical to those used in STM for non-simple expressions, with a total of 116 templates.

Evaluation: Performance was assessed through manual statistics and by calculating precision, recall, and F1 scores. Suppose the dataset contains p geometry problems in which a total of m relationships are mentioned, and the problem understanding method extracts n relationships, of which k are correct. The metrics are then defined as follows: (1) Precision (P) = k/n, (2) Recall (R) = k/m, (3) F1 = 2k/(m + n).
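
As a small illustration, the three metrics can be computed as below. The sketch uses the standard convention that precision is normalized by the number of extracted relationships n and recall by the number of mentioned relationships m. The function name `prf1` and the example counts are hypothetical and are not results from the paper.

```python
from typing import Tuple


def prf1(k: int, m: int, n: int) -> Tuple[float, float, float]:
    """k: correct extractions, m: relationships mentioned, n: extracted."""
    precision = k / n if n else 0.0
    recall = k / m if m else 0.0
    f1 = 2 * k / (m + n) if (m + n) else 0.0
    return precision, recall, f1


# Hypothetical counts: 10 relationships mentioned, 9 extracted, 8 correct.
p, r, f1 = prf1(k=8, m=10, n=9)
```

The closed form 2k/(m + n) equals the harmonic mean 2PR/(P + R), so F1 does not depend on which of m and n is larger.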

Relationship extraction performance: In this dataset, there are a total of 1462 clauses and 1823 relationships, of which 1744 relationships were correctly extracted, resulting in a relationship extraction accuracy of 95.23%. All entities and relationships for the processed entity and relationship types were correctly extracted, successfully completing the text-level ERE task.

For text-oriented relationship extraction, the proposed method was compared with the baseline methods. The evaluation results of the four baseline methods alongside the proposed method are presented in Table 12.

Compared with the baseline methods, the proposed method achieved the highest precision (0.996) and F1 score (0.974) in relationship extraction, demonstrating excellent performance in accurately identifying relationships while balancing precision and recall. Notably, it showed advantages in handling problems with complex multi-relationship expressions, where baseline methods either encountered difficulties or required cumbersome procedures.

In the baseline methods, STM relies on predefined sentence-level templates, which, while precise for simple relationships, suffer from limited coverage of complex relationship combinations and unseen types. Moreover, redundant template designs (including unnecessary matching information) further reduce fault tolerance and generalization ability, thereby restricting recall. In multi-relationship scenarios, templates of varying complexity may produce redundant results, necessitating additional deduplication processes. Inadequate post-processing can severely impact accuracy, resulting in overall lower performance.

In the baseline methods, TBC mitigates template redundancy to some extent through expression normalization and the Bi-LSTM-CRF model, partially improving generalization. However, since its relationship extraction is still fundamentally based on sentence templates, it struggles with complex RMTs (especially MC-D and MT types). Additionally, the normalization process for sentences containing new types is prone to information loss, and the imbalance in training data makes the model sensitive to low-frequency relationships. Although TBC performs better than STM in terms of precision and recall, it still does not overcome the inherent limitations of template-based methods.

In the baseline methods, BBC’s end-to-end extraction avoids the limitations of static template design but is highly sensitive to data distribution. The model produces false negatives due to distribution shift (e.g., failing to detect perpendicular relationships), which severely affects recall. Furthermore, its ability to handle complex multi-step reasoning is limited, leading to possible errors in splitting relationship chains (such as relationships among expressions), and unhandled or insufficiently trained types result in recognition failures. Its performance is constrained by data quality and the coverage of annotations.

In the baseline methods, S2-Like enhances flexibility through POS tagging and fine-grained matching, but its reliance on lexical cues results in significant shortcomings in recognizing numerical and expression entities (e.g., in “AM = 1/2(AB + BC)”), adversely affecting the processing of related relationships. Its fundamental dependency on sentence templates for splitting complex expressions still limits recall. Moreover, it fails to effectively handle relationship directionality and nested expressions (e.g., misinterpreting “the perpendicular bisector of AE with respect to BC” as “AE perpendicular bisects BC”), which severely impacts precision. This hybrid method fails to balance flexibility and accuracy.

In summary, the proposed method—through dynamic template construction, knowledge-guided extraction, and direction-aware semantic parsing—significantly outperforms existing baseline methods, providing a more robust and efficient solution for the extraction of complex geometry relationships.

Error Analysis: A further detailed analysis was conducted on the problems with text-level ERE failures. The errors were attributed to the following four factors: (1) unprocessed entity types, (2) processed entity types without set knowledge attributes, (3) unprocessed relationship types, and (4) processed relationship types without set knowledge attributes. Table 13 and Table 14 present the extraction statistics of the proposed method from the perspectives of target relationships and RMTs.

Based on further statistical analysis of the results, the shortcomings in handling unary relationships stem from unprocessed entity types, primarily due to extended datasets containing more diverse geometry shape types, while numerical relationships involve more specialized mathematical symbols (e.g., radicals) or complex numerical values and expressions. However, these two relationship types predominantly appear in simpler RMTs (e.g., SC-B or SC-T), meaning their improvement can be achieved through broader data exposure and enhancements in SGKO. In contrast, binary and ternary relationships, as the most frequent types in geometry problems, are heavily associated with complex RMTs. In particular, ternary relationships—frequently appearing in intricate forms (e.g., SI-T, MC-series, and MT) and often mentioned in concurrent descriptions—pose the greatest challenges for GPU. Analysis reveals that errors in binary relationships mainly arise from unprocessed entity types and unprocessed relationship types, while errors in ternary relationships are primarily caused by shared unprocessed entities in concurrent RMTs. In other words, the obstacles for binary relationships largely lie in the completeness of knowledge, whereas the difficulties for ternary relationships are more tied to their expression patterns. Further, the disparities in error distribution across the tables and the above analysis demonstrate that failures in entity and relationship extraction are not random but systematically linked to specific relationship types and the structural complexity of the text.

For the four failure causes, a further examination of their detailed handling in failure cases is presented in Table 15.

Templates are constructed at the atomic relationship level, which is also the processing unit for entity relationship extraction, so error propagation is confined within a single atomic relationship. Even if a particular relationship type is undefined or incompletely configured, it affects only the directly related text fragment and does not interfere with the parsing of other atomic relationships. Consider, for example, this sentence excerpt from a problem text: “through ΔABH’s circumcenter E draw EF‖GC”.

Although the Circumcenter relationship is not processed, the handled PointOnLine relationship (E-pointOnLine-EF) and the Parallel relationship (EF-parallel-GC) can still be correctly identified and extracted. Furthermore, even in more complex cases such as “Through ΔABH’s circumcenter E draw AD’s perpendicular bisector line EF‖GC intersects PM at point Q”, the unprocessed circumcenter relationship (E-circumcenter-ΔABH) does not affect the extraction of the processed relationships: E-pointOnLine-AD, EF-perpendicularBisector-AD, EF-parallel-GC, and EF-intersect-PM.

Further, to verify the scope of error impact, the following experiment was conducted.

Experimental Setup: From Dataset B, the 209 problems that did not contain any extraction errors were selected. In the current SGKO, one relationship type was manually masked at a time, and the accuracy of relationship extraction was tested to assess the impact on the extraction of other relationships. Three relationship types were chosen for masking: PerpendicularBisector and Parallel, the least and most frequently mentioned binary relationship types in the dataset, respectively, and Perpendicular, a ternary relationship type whose mention may include a perpendicular foot. For Perpendicular, the core relationship type was masked while its attached relationship type, foot, was left unmasked.

As shown in Table 16, among a total of 1705 relationships, the groups with PerpendicularBisector, Parallel, and Perpendicular masked achieved extraction accuracies of 99.06%, 95.78%, and 92.79%, respectively, with all errors confined to the masked relationship type.

This supports two conclusions. First, error propagation was effectively limited to the atomic relationship. Second, the impact of failures caused by unprocessed relationship types was minimal, correlating only with the frequency of relationship mentions. Additionally, tests on 123 Perpendicular relationships showed that attached relationships were not extracted even though their own relationship type was not masked, because their correct processing depends on the extraction of the corresponding core relationships.
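
The link between mention frequency and accuracy loss can be checked with a short calculation. The sketch takes as a premise the finding reported above, namely that every mention of the masked type fails and no other relationship is affected. The totals 1705 and 123 are taken from the text; the function name and the placeholder label `Other` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def masked_accuracy(gold_types: Iterable[str], masked: str) -> float:
    """Accuracy when only mentions of the masked type are lost."""
    counts = Counter(gold_types)
    total = sum(counts.values())
    return (total - counts[masked]) / total


# 123 Perpendicular mentions among 1705 relationships in total.
gold = ["Perpendicular"] * 123 + ["Other"] * (1705 - 123)
accuracy_pct = round(100 * masked_accuracy(gold, "Perpendicular"), 2)  # 92.79
```

Under this premise the result agrees with the 92.79% reported for the Perpendicular group, which is consistent with errors being confined to the masked type.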

Thus, with an atomic-level design for relationship extraction, the error propagation scope shifts from the sentence level (where an error affects the entire clause) to the atomic unit level (where an error only affects a single relationship mention). As a result, errors change from being positively correlated with the complexity of the problem’s logic (e.g., the number of nested relationships) to being positively correlated with the complexity of the atomic relationship expressions (e.g., whether they span across clauses).

In summary, based on the statistics and analysis of the processing results, the following conclusions can be drawn. First, for the processed relationship types and entity types, the model consistently achieves high accuracy and efficiency in entity relationship extraction. This validates the effectiveness of the dynamic componentized template and the entity relationship extraction process. Second, the impact of errors caused by unprocessed entity/relationship types is limited to atomic relationship units rather than affecting entire clauses or sentences as in earlier approaches, demonstrating that the template design approach based on atomic relationships is both reasonable and effective. Third, the templates based on atomic relationship units are further modularized: the matching units are decoupled into the smallest granular arguments, and, guided by knowledge, more flexible templates are realized. This approach effectively addresses relationship mentions in sentences with complex structures, solving issues that traditional rules/templates have struggled with. Fourth, the model exhibits robustness, fault tolerance, and scalability. In the experiments, even when some descriptions contained spelling errors (unrelated to the extraction elements), the model still extracted the relationships successfully, demonstrating its robustness; errors in entity relationship extraction were confined to the atomic relationship unit and did not affect correctly processed relationships, ensuring fault tolerance; and the plug-and-play, zero-cost expansion of target elements based on tree-structured knowledge subtrees greatly facilitated model updates.

6.2. Knowledge Graph Representation of Geometry Problem Semantics

After completing the dual-level ERE for the geometry problem text, the corresponding knowledge graph is constructed. During this process, some relationships are converted into attributes of nodes or edges within the graph. For node attribute conversion, relationships of the Shapelimits type will have their ShapeLimit entities transformed into attributes of Polygon or Triangle entity nodes. For example, in the relationship ((“Isosceles”, ShapeLimit), shapelimits, (“ΔABC”, Triangle)), the (“Isosceles”, ShapeLimit) will be transformed into an attribute of the (“ΔABC”, Triangle) entity node. For edge attribute conversion, relationships of the Foot and Intersection types will be converted into attributes of edges of the Perpendicular and Intersects types, respectively.
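
The node attribute conversion can be sketched as follows, using the triple format of the example above. Plain dictionaries stand in for the graph store, the attribute key `shapeLimit` and the `Segment` label are hypothetical, and edge attribute conversion (Foot, Intersection) is omitted because the corresponding triple format is not given in this section.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str]               # (name, type)
Triplet = Tuple[Entity, str, Entity]   # (head, relationship, tail)


def build_graph(triplets: List[Triplet]):
    """Fold shapelimits triples into node attributes; keep the rest as edges."""
    nodes: Dict[str, dict] = {}
    edges: List[Tuple[str, str, str]] = []

    def ensure_node(entity: Entity) -> dict:
        name, etype = entity
        return nodes.setdefault(name, {"type": etype, "attrs": {}})

    for head, rel, tail in triplets:
        if rel == "shapelimits":
            # The ShapeLimit entity becomes an attribute of the shape node.
            attrs = ensure_node(tail)["attrs"]
            attrs.setdefault("shapeLimit", []).append(head[0])
        else:
            ensure_node(head)
            ensure_node(tail)
            edges.append((head[0], rel, tail[0]))
    return nodes, edges


nodes, edges = build_graph([
    (("Isosceles", "ShapeLimit"), "shapelimits", ("ΔABC", "Triangle")),
    (("AB", "Segment"), "edgeOf", ("ΔABC", "Triangle")),
])
```

In this sketch the ShapeLimit entity never becomes a node of its own; it survives only as an attribute value on the triangle node.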

Taking the geometry problem of Figure 9 as an example, its final knowledge graph representation of the problem semantics is shown in Figure 10.

7. Conclusions and Future Work

GPU is a fundamental task in the intelligent processing of geometry. To achieve high-quality problem understanding, this paper first identifies the current deficiencies in the GPU field, analyzes the technical approaches and their limitations, and points out the issues present in existing work. In response, a knowledge and semantic fusion GPU method is proposed to address these problems. In this method, a knowledge ontology model, SGKO, is first constructed to build a knowledge base for guiding the information extraction process. For text-level information, a state machine model based on template matching is employed to extract entities and relationships directly mentioned in the problem text, while knowledge-level information is extracted using rule-based methods. During information extraction, knowledge is dynamically queried from SGKO as needed.

This paper improves both the matching templates and the knowledge graph by modularizing the templates and treating knowledge as the matching object, as well as decoupling the ontology layer from the data layer in the knowledge graph. These optimizations render the methods and tools more efficient and flexible. The main innovation of the approach lies in the modular and component-based design of knowledge and templates, which, managed and maintained through a tree structure, realizes on-demand knowledge base queries and dynamic template generation and matching for the extraction of entities and relationships from problem texts. This not only guarantees high accuracy but also, on the one hand, significantly reduces the number of templates and improves method efficiency, reducing the time and space complexity of traditional rule/template-based methods from exponential to linear (text-level) or linearithmic (knowledge-level); on the other hand, it provides excellent scalability, supporting a plug-and-play expansion without the need to reconstruct existing templates or computational frameworks. Finally, the experimental results validate the effectiveness and excellent performance of the method, successfully extracting all two-level entities and relationships completely, correctly, and accurately from 91.87% of the 230 experimental problems containing complex relationship descriptions.

The failures in entity and relationship extraction are mainly due to unhandled entity/relationship types and missing knowledge attributes for processed types. In the process of entity and relationship extraction, correctly identifying the entities associated with relationships and parsing the sentence structures that express these relationships are two indispensable factors for successful extraction; missing either can result in cases where relationships are recognized from complex text structures but subsequent failures occur due to unprocessed shared undefined entities. In other words, although the proposed method overcomes challenges arising from diverse expressions and complex sentence structures, its performance largely depends on the completeness of the SGKO knowledge base. Even though its excellent scalability alleviates the development burden to a certain extent, developers still need to invest significant effort in corpus research.

Several aspects of the methods presented in this paper can be further researched and optimized: (1) further improving SGKO on the current approach by expanding support for more types of entities and relationships and enriching their corresponding knowledge, thus building a more robust knowledge base; (2) considering that the information provided by the text in geometry problems is limited and some relationships are difficult to fully capture solely through text, integrating image processing with this work is a promising direction for future research; (3) exploring the integration of this work with downstream tasks (such as automated reasoning, problem solving, and diagram drawing) to further enhance the intelligent processing level of geometry problems.

Author Contributions

Conceptualization, Y.R. and H.G.; methodology, Y.W. and W.Z.; software, W.Z.; validation, Y.W., Y.R. and H.G.; writing—original draft preparation, Y.W. and W.Z.; writing—review and editing, Y.R. and H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62407011) and the Guangzhou Academician and Expert Workstation (No. 2024-D003).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Huang, L.; Yu, X.; He, B. A Novel Geometry Problem Understanding Method Based on Uniform Vectorized Syntax-Semantics Model. In Proceedings of the 2022 International Conference on Intelligent Education and Intelligent Research (IEIR), Wuhan, China, 18–20 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 78–85.
Seo, M.; Hajishirzi, H.; Farhadi, A.; Etzioni, O.; Malcolm, C. Solving Geometry Problems: Combining Text and Diagram Interpretation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; Association for Computational Linguistics: Stroudsburg, PA, USA, 2015; pp. 1466–1476.
Mukherjee, A.; Garain, U. A Review of Methods for Automatic Understanding of Natural Language Mathematical Problems. Artif. Intell. Rev. 2008, 29, 93–122.
Wong, W.K.; Hsu, S.C.; Wu, S.H.; Lee, C.W.; Hsu, W.L. LIM-G: Learner-Initiating Instruction Model Based on Cognitive Knowledge for Geometry Word Problem Comprehension. Comput. Educ. 2007, 48, 582–601.
Zheng, X.; Wang, B.; Zhao, Y.; Mao, S.; Tang, Y. A Knowledge Graph Method for Hazardous Chemical Management: Ontology Design and Entity Identification. Neurocomputing 2021, 430, 104–111.
Qiu, Q.; Xie, Z.; Zhang, D.; Ma, K.; Tao, L.; Tan, Y.; Jiang, B. Knowledge Graph for Identifying Geological Disasters by Integrating Computer Vision with Ontology. J. Earth Sci. 2023, 34, 1418–1432.
He, Y.; Hao, C.; Wang, Y.; Li, Y.; Wang, Y.; Huang, L.; Tian, X. An Ontology-Based Method of Knowledge Modelling for Remanufacturing Process Planning. J. Clean. Prod. 2020, 258, 120952.
Chang, D.S.; Cho, G.H.; Choi, Y.S. Ontology-Based Knowledge Model for Human-Robot Interactive Services. In Proceedings of the 35th Annual ACM Symposium on Applied Computing, Brno, Czech Republic, 30 March–3 April 2020; ACM: New York, NY, USA, 2020; pp. 2029–2038.
Wilcke, X.; Bloem, P.; De Boer, V. The Knowledge Graph as the Default Data Model for Learning on Heterogeneous Knowledge. Data Sci. 2017, 1, 39–57.
Feng, Z.; Mayer, W.; He, K.; Kwashie, S.; Stumptner, M.; Grossmann, G.; Huang, W. A Schema-Driven Synthetic Knowledge Graph Generation Approach With Extended Graph Differential Dependencies (GDD^xs). IEEE Access 2020, 9, 5609–5639.
Huang, L.; Zhao, Y.; Wang, B.; Zhang, D.; Zhang, R.; Das, S.; Giunchiglia, F. Property-Based Semantic Similarity Criteria to Evaluate the Overlaps of Schemas. Algorithms 2021, 14, 241.
Sharma, C.; Sinha, R. A Schema-First Formalism for Labeled Property Graph Databases: Enabling Structured Data Loading and Analytics. In Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), Auckland, New Zealand, 2–5 December 2019; IEEE/ACM: New York, NY, USA, 2019; pp. 71–80.
Gene Ontology Consortium. The Gene Ontology Resource: 20 Years and Still GOing Strong. Nucleic Acids Res. 2019, 47, D330–D338.
Bodenreider, O. The Unified Medical Language System (UMLS): Integrating Biomedical Terminology. Nucleic Acids Res. 2004, 32, D267–D270.
The Ontology of GeoNames Data Set. Available online: https://download.geonames.org/export/dump/ (accessed on 1 January 2024).
Portele, C. OpenGIS® Geography Markup Language (GML) Encoding Standard; Version 3.2.1; Open Geospatial Consortium: Wayland, MA, USA, 2007; Available online: https://www.ogc.org/standard/gml/ (accessed on 1 January 2024).
CIDOC CRM. Conceptual Reference Model. Available online: https://cidoc-crm.org (accessed on 1 January 2024).
Ho, Q.T.; Le, N.Q.K.; Ou, Y.Y. FAD-BERT: Improved Prediction of FAD Binding Sites Using Pre-Training of Deep Bidirectional Transformers. Comput. Biol. Med. 2021, 131, 104258.
Vaswani, A. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 5998–6008.
Trinh, T.H.; Wu, Y.; Le, Q.V. Solving Olympiad Geometry Without Human Demonstrations. Nature 2024, 625, 476–482.
Chervonyi, Y.; Trinh, T.H.; Olšák, M. Gold-Medalist Performance in Solving Olympiad Geometry with AlphaGeometry2. arXiv 2025, arXiv:2502.03544.
Team Gemini. Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context. arXiv 2024, arXiv:2403.05530.
Zhu, N.; Zhang, X.; Huang, Q. FGeo-Parser: Autoformalization and Solution of Plane Geometric Problems. Symmetry 2024, 17, 8.
Raffel, C.; Shazeer, N.; Roberts, A. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 5485–5551.
Chen, J.; Li, T.; Qin, J. UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), Abu Dhabi, United Arab Emirates, 7–11 December 2022; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 3313–3323.
Cho, J.; Lei, J.; Tan, H. Unifying Vision-and-Language Tasks via Text Generation. In Proceedings of the 38th International Conference on Machine Learning (ICML 2021), Virtual Event, 18–24 July 2021; PMLR: Cambridge, MA, USA, 2021; pp. 1931–1942.
Zhang, M.L.; Yin, F.; Liu, C.L. A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), Macao, China, 19–25 August 2023; AAAI Press: Palo Alto, CA, USA, 2023; pp. 3374–3382. [Google Scholar]
Jian, P.; Guo, F.; Wang, Y. Solving Geometry Problems via Feature Learning and Contrastive Learning of Multimodal Data. CMES-Comput. Model. Eng. Sci. 2023, 136, 425–441. [Google Scholar]
He, Z.; Zhong, X. A Precise Text-to-Diagram Generation Method for Elementary Geometry. In Proceedings of the 2023 20th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 15–17 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–7. [Google Scholar]
Wang, J.; Zhang, T.; Yu, H. MagicGeo: Training-Free Text-Guided Geometric Diagram Generation. arXiv 2025, arXiv:2502.13855. [Google Scholar]
Zhang, C.; Song, J.; Li, S. Proposing and Solving Olympiad Geometry with Guided Tree Search. arXiv 2024, arXiv:2412.10673. [Google Scholar]
Chen, J.; Tang, J.; Qin, J. GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning. In Findings of the Association for Computational Linguistics, Proceedings of the ACL-IJCNLP 2021, Bangkok, Thailand, 1–6 August 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 513–523. [Google Scholar]
Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to Forget: Continual Prediction with LSTM. Neural Comput. 2000, 12, 2451–2471. [Google Scholar] [CrossRef]
Cao, J.; Xiao, J. An Augmented Benchmark Dataset for Geometric Question Answering Through Dual Parallel Text Encoding. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; International Committee on Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 1511–1520. [Google Scholar]
Liu, Y.; Ott, M.; Goyal, N. Roberta: A Robustly Optimized BERT Pretraining Approach. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), Online, 5–10 July 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 6457–6468. [Google Scholar]
Zhou, W.; Xu, R.; Guan, H. Research on Geometry Problem Text Understanding Based on Bidirectional LSTM-CRF. In Proceedings of the 2022 9th International Conference on Digital Home (ICDH), Guangzhou, China, 2–4 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 121–127. [Google Scholar]
Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF Models for Sequence Tagging. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal, 17–21 September 2015; Association for Computational Linguistics: Stroudsburg, PA, USA, 2015; pp. 1464–1474. [Google Scholar]
Tsai, S.; Liang, C.C.; Wang, H.M. Sequence to General Tree: Knowledge-Guided Geometry Word Problem Solving. arXiv 2021, arXiv:2106.00990. [Google Scholar]
Chung, J.; Gulcehre, C.; Cho, K. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In Proceedings of the NIPS 2014 Workshop on Deep Learning, Montreal, QC, Canada, 12 December 2014; Neural Information Processing Systems Foundation: San Diego, CA, USA, 2014. [Google Scholar]
Iordan, A.E. Usage of Stacked Long Short-Term Memory for Recognition of 3D Analytic Geometry Elements. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence (ICAART 2022), Online, 3–5 February 2022; SCITEPRESS: Setúbal, Portugal, 2022; Volume 3, pp. 745–752. [Google Scholar]
Sirisha, B.; Goud, K.K.C.; Rohit, B.T.V.S. A Deep Stacked Bidirectional LSTM (SBiLSTM) Model for Petroleum Production Forecasting. Procedia Comput. Sci. 2023, 218, 2767–2775. [Google Scholar] [CrossRef]
Xiao, T.; Liu, J.; Huang, Z. Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process. In Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024), Jeju Island, Republic of Korea, 3–9 August 2024; AAAI Press: Palo Alto, CA, USA, 2024; pp. 6559–6568. [Google Scholar]
Gan, W.; Yu, X.; Wang, M. Automatic Understanding and Formalization of Plane Geometry Proving Problems in Natural Language: A Supervised Approach. Int. J. Artif. Intell. Tools 2019, 28, 1940003. [Google Scholar] [CrossRef]
Yu, W.; Wang, M.; Wang, X. GEORE: A Relation Extraction Dataset for Chinese Geometry Problems. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Workshop on Math AI for Education (MATHAI4ED), Virtual Event, 6–14 December 2021; Curran Associates Inc.: Red Hook, NY, USA, 2021. [Google Scholar]
Xing, S.; Xiang, C.; Han, Y. GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models. arXiv 2024, arXiv:2412.21036. [Google Scholar]
Lu, P.; Gong, R.; Jiang, S. Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning. arXiv 2021, arXiv:2105.04165. [Google Scholar]
Peng, S.; Fu, D.; Liang, Y. Geodrl: A Self-Learning Framework for Geometry Problem Solving Using Reinforcement Learning in Deductive Reasoning. In Findings of the Association for Computational Linguistics, Proceedings of the ACL 2023, Toronto, Canada, 9–14 July 2023; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 13468–13480. [Google Scholar]
Mukherjee, A.; Garain, U.; Nasipuri, M. On Construction of a GeometryNet. In Proceedings of the 25th IASTED International Multi-Conference on Artificial Intelligence and Applications, Innsbruck, Austria, 13–15 February 2007; ACTA Press: Anaheim, CA, USA, 2007; pp. 530–536. [Google Scholar]
Liu, Q.T.; Huang, H.; Wu, L.J. Using Restricted Natural Language for Geometric Construction. Appl. Mech. Mater. 2012, 145, 465–469. [Google Scholar] [CrossRef]
Wong, W.K.; Yin, S.K.; Yang, C.Z. Drawing Dynamic Geometry Figures Online with Natural Language for Junior High School Geometry. Int. Rev. Res. Open Distrib. Learn. 2012, 13, 126–147. [Google Scholar] [CrossRef]
Guo, H.; Liu, Q.; Chen, M.; Huang, H.; Ge, Q. Research for Facing the Natural Language of the Geometry Drawing. Comput. Sci. 2012, 39, 503–506. [Google Scholar]
Gan, W.; Yu, X. Automatic Understanding and Formalization of Natural Language Geometry Problems Using Syntax-Semantics Models. Int. J. Innov. Comput. Inf. Control 2018, 14, 83–98. [Google Scholar]
Gan, W.; Yu, X.; Sun, C. Understanding Plane Geometry Problems by Integrating Relations Extracted from Text and Diagram. In Image and Video Technology, Proceedings of the 8th Pacific-Rim Symposium, PSIVT 2017, Wuhan, China, 20–24 November 2017; Springer International Publishing: Cham, Switzerland, 2018; pp. 366–381. [Google Scholar]
Gan, W.; Yu, X.; Zhang, T. Automatically Proving Plane Geometry Theorems Stated by Text and Diagram. Int. J. Pattern Recognit. Artif. Intell. 2019, 33, 1940003. [Google Scholar] [CrossRef]
Yu, X.; Geng, Y.; Feng, Z. Solving Solid Geometric Calculation Problems in Text. In Proceedings of the 2021 IEEE International Conference on Engineering, Technology & Education (TALE), Wuhan, China, 5–8 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 525–530. [Google Scholar]
Jian, P.; Sun, C.; Yu, X. An End-to-End Algorithm for Solving Circuit Problems. Int. J. Pattern Recognit. Artif. Intell. 2019, 33, 1940004. [Google Scholar] [CrossRef]
Yu, X.; Gan, W.; Wang, M. Understanding Explicit Arithmetic Word Problems and Explicit Plane Geometry Problems Using Syntax-Semantics Models. In Proceedings of the 2017 International Conference on Asian Language Processing (IALP), Singapore, 5–7 December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 247–251. [Google Scholar]
He, B.; Yu, X.; Jian, P. A Relation Based Algorithm for Solving Direct Current Circuit Problems. Appl. Intell. 2020, 50, 2293–2309. [Google Scholar] [CrossRef]
Yu, X.; Lyu, X.; Peng, R. Solving Arithmetic Word Problems by Synergizing Syntax-Semantics Extractor for Explicit Relations and Neural Network Miner for Implicit Relations. Complex Intell. Syst. 2023, 9, 697–717. [Google Scholar] [CrossRef]
Lyu, X.; Yu, X. Solving Explicit Arithmetic Word Problems via Using Vectorized Syntax-Semantics Model. In Proceedings of the 2021 IEEE International Conference on Engineering, Technology & Education (TALE), Wuhan, China, 5–8 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–7. [Google Scholar]
Huang, L.; Yu, X.; Niu, L. Solving Algebraic Problems with Geometry Diagrams Using Syntax-Semantics Diagram Understanding. Comput. Mater. Contin. 2023, 77, 517–539. [Google Scholar] [CrossRef]
Wei, Z.; Su, J.; Wang, Y. A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), Online, 5–10 July 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 1476–1488. [Google Scholar]

Figure 1. Three cases of GPU missing knowledge-level relationships (with red arrows indicating knowledge-level relationships).

Figure 2. Overview of the model framework.

Figure 3. The upper layer knowledge system (with the purple parts representing semantic-enhanced adjustments).

Figure 4. Example of an RSF of a binary relationship type (the English translation is literal to highlight the nuances present in the Chinese text).

Figure 5. Example of an RSF of a ternary relationship type (the English translation is literal to highlight the nuances present in the Chinese text).

Figure 6. Complete knowledge storage for Perpendicular.

Figure 7. Multidimensional classification of RMTs.

Figure 8. Example of Perpendicular relationship matching template construction.

Figure 9. Example geometry problem (cleaned).

Figure 10. Problem semantic KG (of the problem in Figure 9).

Table 1. Free combination of polygon descriptions.

No.	Combination	Example
1	ShapeLimit–PolygonType–Polygon	Isosceles Trapezoid $ABCD$, Right Trapezoid $ABCD$
2	ShapeLimit–Polygon	Equilateral Triangle $\triangle ABC$, Isosceles Triangle $\triangle ABC$
3	PolygonType–Polygon	Parallelogram $EFGH$, Trapezoid $ABCD$, Rectangle $ABCD$
4	ShapeLimit–ShapeLimit–Polygon	Right Isosceles Triangle $\triangle ABC$
5	Polygon	Triangle $\triangle ABC$
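
As a minimal sketch of how the free combinations in Table 1 could be recognized, the Python snippet below labels each word of a polygon description. The vocabularies `SHAPE_LIMITS` and `POLYGON_TYPES`, the function `label_combination`, and the convention that vertex labels are plain capital letters (with the triangle symbol omitted) are illustrative assumptions, not the SGKO implementation.

```python
import re

# Hypothetical vocabularies, taken only from the examples in Table 1.
SHAPE_LIMITS = {"Isosceles", "Right", "Equilateral"}
POLYGON_TYPES = {"Trapezoid", "Parallelogram", "Rectangle"}
# Assumed polygon mention: optional word "Triangle" followed by 3+ vertex letters.
POLYGON = re.compile(r"(?:Triangle )?[A-Z]{3,}")


def label_combination(description):
    """Return the sequence of labels (one combination of Table 1) for a description."""
    words = description.split()
    labels = []
    for i, word in enumerate(words):
        if word in SHAPE_LIMITS:
            labels.append("ShapeLimit")
        elif word in POLYGON_TYPES:
            labels.append("PolygonType")
        elif POLYGON.fullmatch(" ".join(words[i:])):
            # The remaining words form the polygon mention itself.
            labels.append("Polygon")
            return tuple(labels)
        else:
            break
    raise ValueError(f"not a recognized polygon description: {description!r}")


print(label_combination("Isosceles Trapezoid ABCD"))
print(label_combination("Right Isosceles Triangle ABC"))
```

Because the labels are assigned word by word, a new ShapeLimit or PolygonType only needs a vocabulary entry in this sketch; no combination has to be enumerated explicitly.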

Table 2. Examples of domain knowledge attributes.

Relationship Type	Role of Argument (Knowledge Attribute)	Attribute Value (Knowledge Attribute)
Parallel	Subject	Line
Parallel	Object	Line
Height	Subject	Line
Height	Object	Line, Polygon
Perpendicular	Subject	Line
Perpendicular	Object	Line
Perpendicular	Additional Entity	Point
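
The attributes in Table 2 can be read as a lookup from relationship type and argument role to the entity types that the role admits. The sketch below encodes exactly the rows of Table 2 in a nested dictionary; the names `KNOWLEDGE_ATTRIBUTES` and `accepts` are hypothetical and chosen only for illustration.

```python
# Rows of Table 2: relationship type -> argument role -> admissible entity types.
KNOWLEDGE_ATTRIBUTES = {
    "Parallel": {"Subject": {"Line"}, "Object": {"Line"}},
    "Height": {"Subject": {"Line"}, "Object": {"Line", "Polygon"}},
    "Perpendicular": {
        "Subject": {"Line"},
        "Object": {"Line"},
        "Additional Entity": {"Point"},
    },
}


def accepts(relationship, role, entity_type):
    """Check whether an entity type may fill the given role of a relationship."""
    roles = KNOWLEDGE_ATTRIBUTES.get(relationship, {})
    return entity_type in roles.get(role, set())


print(accepts("Height", "Object", "Polygon"))    # a height may refer to a polygon
print(accepts("Parallel", "Object", "Polygon"))  # Parallel only relates lines
```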

Table 3. Statistics on keywords and RSFs of relationship types in SGKO.

Relationship Type	# Entity Type	# Keywords	# RSF
Point on line	2	5	4
Middle point	2	1	2
Middle Line	2	1	2
Height	2	1	2
Bisector	2	1	2
Intersects	2	2	2
Intersection *	2	2	2
Perpendicular	1	2	2
Foot *	2	2	2
Perpendicular bisector	1	-	-
ShapeLimits	2	-	1
PolygonType	3	-	2
Parallel	1	2	2
Equals	4	1	1
XX-Line of	3	7	2
Total	10 (31)	27	8 (28)
Avg	2.07	1.80	1.86

“#” indicates quantity. “*” marks an independent attached-relationship type, which is not counted as a separate type. “-” indicates that (a) Perpendicular bisector combines existing relationship types (Perpendicular and Bisector), and (b) ShapeLimits and PolygonType are triggered by entity types. There are 10 kinds of entity types in total, with 31 total calls; the RSF library contains 8 RSFs in total, with 28 total calls.

Table 4. Scalability comparison of adding new entity/relationship types across methods.

Method	Extension Effort	Potential Impact
Rule/Template-Based Methods	Design rules/templates for atomic and composite relationships	May introduce template conflicts
Machine Learning Methods	New corpus collection, model retraining/fine-tuning (and annotation) *	May impact accuracy of other entity/relationship extraction
Proposed Method	Configure three kinds of knowledge attributes	Zero interference

“*” indicates supervised learning and semi-supervised learning may require annotation, while unsupervised learning does not.

Table 5. Identification criteria for RMTs.

RMT	Identification Basis
SC-B	(1) c contains one $kw_{autonomous}$ and no $kw_{attached}$. (2) For $kw_{autonomous} \in KW_{rt}$, there is $rt \in RT_{Binary}$. (3) All argument entities (subject and object) are in c, and each argument slot mentions only one entity.
SC-T	(1) c contains one $kw_{autonomous}$ and one $kw_{attached}$. (2) $kw_{autonomous}, kw_{attached} \in KW_{rt}$, and $rt \in RT_{Ternary}$. (3) All argument entities and the additional entity are in c, and each argument slot mentions only one entity.
SI-B	(1) c contains $kw_{autonomous} \in KW_{rt}$ with $rt \in RT_{Binary}$. (2) There is a missing argument in c, usually the subject entity. (3) The missing argument entity is usually in a clause before the current c, and its entity type usually belongs to the polygon class.
SI-T	(1) The complete relationship description involves two consecutive clauses $c_{i}$ and $c_{i+1}$, where i is the index of the clause in C. (2) In the case of attached relationship separation: $rd_{core}$ is in $c_{i}$ and $rd_{attached}$ is in $c_{i+1}$. $kw_{core}$ is in c. $c_{i+1}$ contains only one $kw_{attached}$, which is the relationship description of $r_{attached}$ in that clause. There are only $kw_{attached}$ and the additional entity in that clause. (3) In the case of subject entity separation, $kw_{core}$ is in $c_{i+1}$ and the subject entity is in $c_{i}$.
MC-S	(1) c contains more than one $kw_{autonomous}$. (2) Let $KW_{c}$ be the set of all autonomous keywords in c. For the relationships involving the current $kw \in KW_{c}$, all its arguments are in c. (3) There are one or more entities in the same argument slot. Multiple entities are connected by conjunctions (such as “,” (ideographic comma), “and”, etc.). (4) The possible correspondences in the number of entity mentions between argument slots are of three types: 1-to-1 (e.g., “AB intersects CE at point E”), 1-to-m (e.g., “AB respectively intersects CE, DF at points P, Q”, “AE, BF intersect CD at P, Q”), and m-to-m (e.g., “Points E, F respectively are the midpoints of AC, BC”). (5) For m-to-m, entities belonging to the same relationship have the same index in all argument slot mentions.
MC-D	(1) c contains more than one $kw_{autonomous}$. (2) For the relationships involving the current $kw \in KW_{c}$, all its arguments are in c. (3) c is what linguists call a multi-predicate sentence structure, in which all relationship descriptions share the same sentence subject.
MT	(1) c contains more than one $kw_{autonomous}$. (2) c contains complete relationship descriptions and at least one incomplete relationship description. (3) In the cases of attached relationship isolation and subject entity isolation, the whole relationship description belongs to the current type.
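
Criteria (4) and (5) of MC-S describe how conjoined entity mentions are paired across argument slots. A minimal sketch of that index alignment is given below, assuming the entity mentions of each slot have already been extracted into lists; the function name `expand_slots` and the list-based input format are assumptions made for illustration and do not reproduce IDIM-T itself.

```python
def expand_slots(*slots):
    """Pair entity mentions across argument slots by index.

    Each slot is a list of entity mentions. A slot with a single mention is
    shared by every relationship (the "1" side of 1-to-m); slots with m
    mentions are aligned by position (m-to-m).
    """
    m = max(len(slot) for slot in slots)
    if any(len(slot) not in (1, m) for slot in slots):
        raise ValueError("slots must hold either 1 or m entity mentions")
    return [
        tuple(slot[0] if len(slot) == 1 else slot[i] for slot in slots)
        for i in range(m)
    ]


# "AB respectively intersects CE, DF at points P, Q" (1-to-m)
print(expand_slots(["AB"], ["CE", "DF"], ["P", "Q"]))
# "Points E, F respectively are the midpoints of AC, BC" (m-to-m)
print(expand_slots(["E", "F"], ["AC", "BC"]))
```

Each returned tuple corresponds to one atomic relationship, so a single MC-S clause yields as many relationships as the longest argument slot has mentions.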

Table 6. Example of Perpendicular relationship template.

Template Keyword	RSF	Template Formed After Knowledge Filling	Formal Representation
⊥	S-K-O	Line-⊥-Line	(subject, Line, left)
⊥	S-K-O	Line-⊥-Line	(object, Line, right)
perpendicular	S-K-O	Line-perpendicular-Line	(subject, Line, left)
perpendicular	S-K-O	Line-perpendicular-Line	(object, Line, right)
perpendicular	S-O-K	Line-Line-perpendicular line	(subject, Line, left)
perpendicular	S-O-K	Line-Line-perpendicular line	(object, Line, left)
at	K-A	at-Point	(additional entity, Point, right)
foot	K-A	foot-Point	(additional entity, Point, right)
foot	A-K	Point-foot	(additional entity, Point, left)
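
The rows of Table 6 suggest that a template is obtained by filling the slots of an RSF with the keyword and with the entity types stored as knowledge attributes. The sketch below illustrates this assembly under that reading; the slot letters S, K, O, and A follow the RSF column, while `ROLE_OF_SLOT`, `PERPENDICULAR_TYPES`, and `fill_template` are hypothetical names, and the entity types are copied from the Perpendicular rows of Table 2.

```python
# Assumed mapping from RSF slot letters to argument roles.
ROLE_OF_SLOT = {"S": "subject", "O": "object", "A": "additional entity"}
# Entity types for Perpendicular, as listed in Table 2.
PERPENDICULAR_TYPES = {"subject": "Line", "object": "Line", "additional entity": "Point"}


def fill_template(keyword, rsf, arg_types):
    """Fill an RSF such as "S-K-O" and return (template, formal representation)."""
    slots = rsf.split("-")
    keyword_pos = slots.index("K")
    parts, formal = [], []
    for pos, slot in enumerate(slots):
        if slot == "K":
            parts.append(keyword)
            continue
        role = ROLE_OF_SLOT[slot]
        entity_type = arg_types[role]
        parts.append(entity_type)
        # Record on which side of the keyword the argument is expected.
        side = "left" if pos < keyword_pos else "right"
        formal.append((role, entity_type, side))
    return "-".join(parts), formal


print(fill_template("perpendicular", "S-K-O", PERPENDICULAR_TYPES))
print(fill_template("foot", "A-K", PERPENDICULAR_TYPES))
```

Only the keyword, the RSF, and the knowledge attributes are stored in this sketch; the concrete templates are derived on demand rather than enumerated by hand.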

Table 7. Summary of complexity.

Module	Traditional Method	Proposed Method	Optimization
Template Complexity	Time: (Exponential) $O(R \cdot V^{k})$	Time: (Linear) $O(R)$	Hash table indexing, Tree-structured SGKO, Limited RSFs.
Template Complexity	Space: (Exponential) $O(R \cdot V^{k})$	Space: (Linear) $O(R \cdot C + K)$	Hash table indexing, Tree-structured SGKO, Limited RSFs.
Text-Level ERE	Time: (Polynomial) $O(N \cdot K \cdot T)$	Time: (Linear) $O(N \cdot K + E)$	Hash table lookup ($O(1)$ per keyword), Dynamic Template Matching.
Text-Level ERE	Space: (Exponential) $O(N \cdot L + R \cdot V^{k})$	Space: (Linear) $O(N \cdot L + E + R)$	Hash table lookup ($O(1)$ per keyword), Dynamic Template Matching.
Knowledge-Level ERE	Time: (Quadratic) $O(M^{2})$	Time: (Linearithmic) $O(M \cdot \log M)$	Hash-based deduplication.
Knowledge-Level ERE	Space: (Quadratic) $O(M^{2})$	Space: (Linear) $O(M)$	Hash-based deduplication.
Full Pipeline	Time: (Exponential) $O(R \cdot V^{k} + M^{2})$	Time: (Linearithmic) $O(N \cdot K + M \cdot \log M)$	-
Full Pipeline	Space: (Exponential) $O(R \cdot V^{k} + M^{2})$	Space: (Linear) $O(N \cdot L + R + M)$	-

Key Symbols: R: Relationship types, V: Variants per relationship, k: Concurrency level, N: Clauses, K: Keywords per clause, E: Entities, M: Entities including both original and decomposed entities, L: Clause length, C: Atomic components (constant).
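
To illustrate the hash table lookup named in the Optimization column of Table 7, the sketch below indexes keywords in a Python dictionary so that each token of a clause is checked in expected constant time. The keywords are those of Table 6; assigning “at” and “foot” to an attached Foot relationship, the whitespace tokenization, and the names `KEYWORD_INDEX` and `detect_keywords` are assumptions made for illustration.

```python
# Hypothetical keyword index: keyword -> (relationship type, keyword kind).
KEYWORD_INDEX = {
    "⊥": ("Perpendicular", "autonomous"),
    "perpendicular": ("Perpendicular", "autonomous"),
    "at": ("Foot", "attached"),
    "foot": ("Foot", "attached"),
}


def detect_keywords(clauses):
    """Scan clauses once and report (clause index, keyword, type, kind) hits."""
    hits = []
    for clause_index, clause in enumerate(clauses):
        for token in clause.split():
            entry = KEYWORD_INDEX.get(token)  # average O(1) dictionary lookup
            if entry is not None:
                hits.append((clause_index, token) + entry)
    return hits


print(detect_keywords(["AB ⊥ CD at E", "F is the midpoint of AB"]))
```

The scan visits every token once, so its cost in this sketch grows linearly with the total clause length rather than with the number of stored templates.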

Table 8. Statistics on the problem information and relationship distribution of Dataset A.

Statistical Item	Total	Average	Proportion
Total Problems	130	-	-
Total Clauses	978	7.52	1
No Relationship Clauses	226	1.74	0.23
Relationship Clauses	752	5.78	0.76
Unary Geo.Relationship	125	0.96	0.12
Binary Geo.Relationship	604	4.65	0.58
Ternary Geo.Relationship	260	2.00	0.25
Numerical Relationship	52	0.40	0.05
Total Relationships	1041	8.01	1

Table 9. Statistics of the Relationship Mention Type distribution of Dataset A.

RMT	Total	Average	Proportion
SC-B	394	3.03	0.38
SC-T	268	2.06	0.26
SI-B	78	0.60	0.07
SI-T	102	0.78	0.10
MC-S	125	0.96	0.12
MC-D	52	0.40	0.05
MT	22	0.17	0.02
Total Relationships	1041	8.01	1

Table 10. Statistics on the problem information and relationship distribution of Dataset B.

Statistical Item	Total	Average	Proportion
Total Problems	230	-	-
Total Clauses	1769	7.69	1
No Relationship Clauses	307	1.33	0.17
Relationship Clauses	1462	6.36	0.83
Unary Geo.Relationship	182	0.79	0.10
Binary Geo.Relationship	1027	4.47	0.56
Ternary Geo.Relationship	534	2.32	0.29
Numerical Relationship	89	0.39	0.05
Total Relationships	1832	7.97	1

Table 11. Statistics on the Relationship Mention Type distribution of Dataset B.

RMT	Total	Average	Proportion
SC-B	458	1.99	0.25
SC-T	367	1.60	0.20
SI-B	183	0.80	0.10
SI-T	277	1.20	0.15
MC-S	329	1.43	0.18
MC-D	146	0.63	0.08
MT	72	0.31	0.04
Total Relationships	1832	7.97	1

Table 12. Performance comparison of relationship extraction between the proposed method and the baseline methods.

Method	Precision (P)	Recall (R)	F1 Score (F1)
STM	0.924	0.782	0.848
TBC	0.942	0.863	0.900
BBC	0.894	0.804	0.847
S2-like	0.877	0.729	0.796
Proposed Method	0.996	0.952	0.974
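
The F1 column of Table 12 can be cross-checked against the standard definition $F1 = 2PR/(P + R)$. The short sketch below recomputes it from the rounded P and R values in the table; the recomputed values agree with the reported ones to within about 0.001, and the small residual presumably comes from rounding of P and R. The names `TABLE_12` and `f1_score` are illustrative.

```python
# (precision, recall, reported F1) copied from Table 12.
TABLE_12 = {
    "STM": (0.924, 0.782, 0.848),
    "TBC": (0.942, 0.863, 0.900),
    "BBC": (0.894, 0.804, 0.847),
    "S2-like": (0.877, 0.729, 0.796),
    "Proposed Method": (0.996, 0.952, 0.974),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for method, (p, r, reported) in TABLE_12.items():
    print(f"{method}: recomputed {f1_score(p, r):.3f}, reported {reported:.3f}")
```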

Table 13. Statistics on text-level ERE failures by target relationship type.

Relationship Type	Total	Accuracy	Failure Causes
Unary Geo.Relationship	182	97.80%	1, 3
Binary Geo.Relationship	1027	96.11%	1, 2, 3, 4
Ternary Geo.Relationship	534	93.26%	1
Numerical Relationship	89	91.01%	2
Total Relationships	1832	95.23%	-

Table 14. Statistics on text-level ERE failures by RMT.

RMT	Total	Accuracy	Failure Causes
SC-B	458	98.25%	1, 2, 3, 4
SC-T	367	96.46%	3
SI-B	183	94.54%	1, 2, 3, 4
SI-T	277	92.06%	1, 3
MC-S	329	90.58%	1, 3, 4
MC-D	146	87.67%	1, 3, 4
MT	72	83.33%	1, 2, 3, 4
Total Relationships	1832	95.23%	-

Table 15. Error causes and handling in entity and relationship extraction.

Failure Cause	Relationship Extraction Handling	Entity Extraction Handling
Unhandled relationship type	Unable to detect and extract relationships.	Can correctly extract all processed entities.
Unhandled entity type	Relationships can be detected, but the extraction fails during template matching due to the inability to find entities of the corresponding type.	Unable to detect and extract entities.
Knowledge attribute not set for relationship type	If the unset attribute is a keyword, relationships cannot be detected. If the unset attribute is an RSF, some extractions nevertheless succeed unexpectedly.	Unable to detect and extract entities.
Knowledge attribute not set for entity type	Can detect the existence of relationships, but fails to extract them because entities of the specified types cannot be found.	Unable to detect and extract entities.

Table 16. The impact of restricted error propagation at the atomic relationship level on accuracy.

Experimental Group	Impacted Relationships	Accuracy
Total	1705	-
Unmasked	0	100%
PerpendicularBisector-masked	16	99.06%
Parallel-masked	72	95.78%
Perpendicular-masked	123 (45)	92.79%

Among the 123 Perpendicular relationships, 45 mentioned a foot.
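
The accuracy column of Table 16 is consistent with counting every impacted relationship as a failure and all others as correct, that is, accuracy = (total − impacted) / total. The sketch below reproduces the reported percentages under that reading; the function name `masked_accuracy` is illustrative.

```python
TOTAL_RELATIONSHIPS = 1705  # "Total" row of Table 16

# Impacted relationship counts per experimental group, from Table 16.
IMPACTED = {
    "Unmasked": 0,
    "PerpendicularBisector-masked": 16,
    "Parallel-masked": 72,
    "Perpendicular-masked": 123,
}


def masked_accuracy(impacted, total=TOTAL_RELATIONSHIPS):
    """Percentage of relationships left unaffected by masking, rounded to 2 decimals."""
    return round(100 * (total - impacted) / total, 2)


for group, impacted in IMPACTED.items():
    print(f"{group}: {masked_accuracy(impacted):.2f}%")
```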

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
