3.3. RapidOCR Text Recognition and Cell Correspondence
After the improved LORE identifies the four corner coordinates of each cell, the cells in a table image must first be sorted by coordinate to determine the row and column structure. Let the top-left and bottom-right coordinates of each cell be $(x_{tl}, y_{tl})$ and $(x_{br}, y_{br})$, respectively. Over the coordinate set of all cells, the $x_{tl}$ values are sorted in ascending order (from left to right) to determine the column positions, as shown in Formula (1).
Similarly, the $y_{tl}$ of each cell determines the row positions: the $y_{tl}$ values are sorted in ascending order, as shown in Formula (2).
By sorting the $x$ and $y$ coordinates of all cells, the row and column position of each cell in the table can be determined.
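The sorting step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the pixel tolerance `tol`, which groups nearly equal coordinates into one row or column, are assumptions added here.

```python
from typing import List, Tuple

Cell = Tuple[float, float, float, float]  # (x_tl, y_tl, x_br, y_br)


def index_axis(values: List[float], tol: float = 5.0) -> List[int]:
    """Rank coordinates in ascending order; values within `tol` share a rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    rank = 0
    prev = None
    for i in order:
        # Start a new row/column when the gap to the previous value is large.
        if prev is not None and values[i] - prev > tol:
            rank += 1
        ranks[i] = rank
        prev = values[i]
    return ranks


def assign_rows_cols(cells: List[Cell], tol: float = 5.0) -> List[Tuple[int, int]]:
    """Return a (row, column) index for every cell, in input order."""
    cols = index_axis([c[0] for c in cells], tol)  # sort by top-left x
    rows = index_axis([c[1] for c in cells], tol)  # sort by top-left y
    return list(zip(rows, cols))


cells = [(102, 51, 200, 100), (0, 50, 100, 100), (0, 0, 100, 50), (101, 1, 200, 50)]
print(assign_rows_cols(cells))  # [(1, 1), (1, 0), (0, 0), (0, 1)]
```

The tolerance absorbs the small coordinate jitter that detected corners usually show; with `tol=0`, every distinct coordinate would open its own row or column.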
When RapidOCR processes an image, it outputs the four boundary points of each text area, $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, and $(x_4, y_4)$, which form the bounding box of the text area. To determine whether the text belongs to a specific cell, the center point $(x_c, y_c)$ of the text bounding box is first calculated, as shown in Formula (3).
$$x_c = \frac{x_1 + x_2 + x_3 + x_4}{4}, \qquad y_c = \frac{y_1 + y_2 + y_3 + y_4}{4} \tag{3}$$
By comparing the center point of the text bounding box with the boundaries of a table cell, it can be determined whether the text lies within that cell. Let the top-left and bottom-right corners of a table cell be $(x_{tl}, y_{tl})$ and $(x_{br}, y_{br})$, respectively. The text is considered to be within the cell if the center point $(x_c, y_c)$ satisfies Formula (4).
$$x_{tl} \le x_c \le x_{br}, \qquad y_{tl} \le y_c \le y_{br} \tag{4}$$
To handle text that slightly overflows the cell boundary, a tolerance $\varepsilon$ can be added to the condition, so that text whose boundary crosses the cell boundary by a small amount is still assigned to the cell. With the tolerance, the condition becomes Formula (5).
$$x_{tl} - \varepsilon \le x_c \le x_{br} + \varepsilon, \qquad y_{tl} - \varepsilon \le y_c \le y_{br} + \varepsilon \tag{5}$$
In this paper, the tolerance $\varepsilon$ is set to 2 pixels. Text that spans multiple cells must be handled separately. It is identified by checking whether the text bounding box extends beyond a single cell: if any corner point of the box has coordinates outside the cell range, the text is determined to span multiple cells, as shown in Formula (6).
If this condition is met, the areas of the cells involved are extracted and treated as a single region. OCR is then run again on this merged region, which ensures that text spanning multiple cells is recognized correctly.
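The correspondence checks described above can be sketched as follows. All function names are hypothetical, and two details are assumptions rather than statements from the paper: the center is taken as the mean of the four corner points, and the same tolerance is applied in the multi-cell test.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[float, float, float, float]  # (x_tl, y_tl, x_br, y_br)


def box_center(points: Sequence[Point]) -> Point:
    """Center of a text box, taken as the mean of its corner points."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def center_in_cell(points: Sequence[Point], cell: Cell, tol: float = 2.0) -> bool:
    """Tolerance-based containment test on the box center."""
    x_tl, y_tl, x_br, y_br = cell
    xc, yc = box_center(points)
    return x_tl - tol <= xc <= x_br + tol and y_tl - tol <= yc <= y_br + tol


def spans_multiple_cells(points: Sequence[Point], cell: Cell, tol: float = 2.0) -> bool:
    """True if any corner lies outside the cell by more than `tol` (assumed rule)."""
    x_tl, y_tl, x_br, y_br = cell
    return any(x < x_tl - tol or x > x_br + tol or y < y_tl - tol or y > y_br + tol
               for x, y in points)


def merged_region(cells: Sequence[Cell]) -> Cell:
    """Bounding rectangle of several cells, i.e., the area to pass to OCR again."""
    return (min(c[0] for c in cells), min(c[1] for c in cells),
            max(c[2] for c in cells), max(c[3] for c in cells))


cell_a, cell_b = (0, 0, 100, 50), (100, 0, 200, 50)
wide_text = [(60, 10), (140, 10), (140, 30), (60, 30)]
print(box_center(wide_text))                    # (100.0, 20.0)
print(spans_multiple_cells(wide_text, cell_a))  # True
print(merged_region([cell_a, cell_b]))          # (0, 0, 200, 50)
```

Text that overflows by only one pixel, such as a box whose right edge sits at x = 101 in `cell_a`, stays assigned to the single cell under the 2-pixel tolerance.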
3.4. Device Name Semantic Similarity Matching Based on Improved Cuckoo Search Algorithm
Using the table extraction, structure recognition, and text recognition methods from the previous sections, the unstructured table data in the original engineering drawings can be transformed into structured two-dimensional table data. Engineering drawings contain material summary tables and sub-tables. Taking a substation project as an example, the summary table includes the main transformer equipment, distribution equipment, and materials required for the project. Currently, the summary table and the sub-tables are matched manually. This paper proposes an intelligent verification method whose core is the semantic similarity matching of device names. Because the summary table and the sub-tables are recorded by different people with different descriptive habits, semantic similarity methods are needed to match device names and verify the consistency of quantities.
Cosine Similarity: Similarity measurement calculates the degree of similarity between individuals and is generally expressed through distance: the smaller the similarity value, the greater the distance, and the larger the similarity value, the smaller the distance. The most common measure of text similarity is cosine similarity, which uses the cosine of the angle between two vectors to measure the difference between two individuals. A cosine value close to 1 (an angle tending towards 0°) indicates that the two vectors are more similar; a cosine value close to 0 (an angle tending towards 90°) indicates that they are less similar. The calculation formula for cosine similarity is shown in Formula (7).
$$\cos(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \tag{7}$$
In the formula, $A$ and $B$ are two $n$-dimensional vectors with components $A_i$ and $B_i$, and $\cos(A, B)$ represents the cosine similarity between vector $A$ and vector $B$.
However, cosine similarity computed solely through word segmentation presents inherent limitations in handling abbreviations. For instance, in equipment nomenclature, the abbreviation “mushe” (short for “muxian shebei”, busbar equipment) yields a similarity score of 0 against its full form under word segmentation, whereas character-level segmentation achieves 0.7. This exposes critical gaps in text-processing pipelines for technical domains requiring granular semantic alignment.
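The abbreviation example can be reproduced with a small bag-of-tokens cosine similarity. This is a sketch under stated assumptions: the tokenizations are written by hand, and each romanized syllable stands for one character of the original name.

```python
import math
from collections import Counter
from typing import Sequence


def cosine_similarity(tokens_a: Sequence[str], tokens_b: Sequence[str]) -> float:
    """Cosine similarity between two token-count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Word-level segmentation: the abbreviation shares no token with the full name.
word_level = cosine_similarity(["mushe"], ["muxian", "shebei"])
# Character-level segmentation: two of the four characters are shared.
char_level = cosine_similarity(["mu", "she"], ["mu", "xian", "she", "bei"])

print(word_level)            # 0.0
print(round(char_level, 2))  # 0.71
```

At the character level, the score is $2 / (\sqrt{2} \cdot \sqrt{4}) \approx 0.71$, which agrees with the value of about 0.7 reported above.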
When only cosine similarity is used to match device names, the semantic features of the names are often ignored. To capture them, this paper draws on the capabilities of large pre-trained models and introduces a semantic similarity matching method based on BERT [34]. BERT is a pre-trained language model built on the Transformer, a deep learning architecture with a self-attention mechanism [35]. BERT can therefore effectively capture long-range dependencies in sequence data and, by learning the relationships between words in the text, obtain rich semantic information. Since language models process input text word by word, the output of the language model is a vector representation of each word in the text. To obtain the vector representation of a sentence, pooling is required. Common pooling operations include cls, pooler, last layer average, and first-last layer average [36]. This paper adopts first-last layer average pooling, as shown in Figure 4; it calculates the average of all word vector representations from the first and last layers. For the pooled equipment name vectors, the cosine similarity method can be used to calculate the semantic similarity between them. The similarity calculated in this way is denoted as $S_{\mathrm{sem}}(A, B)$, which represents the semantic similarity between two $n$-dimensional vectors $A$ and $B$.
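First-last layer average pooling can be illustrated on toy data without loading a model. The sketch assumes that `hidden_states` is a list of per-layer outputs, each holding one vector per token, and that it contains only the Transformer layers; implementations differ on whether the embedding output counts as the first layer. Padding masks are omitted for brevity.

```python
import math
from typing import List

Vector = List[float]
Layer = List[Vector]  # one vector per token


def mean_over_tokens(layer: Layer) -> Vector:
    """Average the token vectors of one layer, dimension by dimension."""
    n = len(layer)
    return [sum(tok[d] for tok in layer) / n for d in range(len(layer[0]))]


def first_last_avg(hidden_states: List[Layer]) -> Vector:
    """Average of the token means taken from the first and the last layer."""
    first = mean_over_tokens(hidden_states[0])
    last = mean_over_tokens(hidden_states[-1])
    return [(f + l) / 2 for f, l in zip(first, last)]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy hidden states: 3 layers, 2 tokens, 2 dimensions (values are made up).
name_a = [[[1.0, 0.0], [1.0, 2.0]], [[9.0, 9.0], [9.0, 9.0]], [[3.0, 2.0], [1.0, 0.0]]]
name_b = [[[0.0, 1.0], [2.0, 1.0]], [[5.0, 5.0], [5.0, 5.0]], [[2.0, 2.0], [2.0, 0.0]]]

vec_a = first_last_avg(name_a)
vec_b = first_last_avg(name_b)
print(vec_a)  # [1.5, 1.0]
print(vec_b)  # [1.5, 1.0]
semantic_similarity = cosine(vec_a, vec_b)
```

The middle layer is ignored by design, which is why the two toy inputs pool to the same sentence vector despite their different intermediate states.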
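Both cosine similarity and semantic similarity have limitations when used alone to calculate the degree of match between two equipment names. Therefore, this paper adopts a comprehensive scoring method to match phrases, as shown in Formula (8),
$$S = \alpha\, S_{\cos} + \beta\, S_{\mathrm{sem}} \tag{8}$$
where $S_{\cos}$ is the cosine similarity of the two names, $S_{\mathrm{sem}}$ is their BERT-based semantic similarity, and α and β are adjustment factors.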
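A minimal sketch of the comprehensive score, assuming Formula (8) is a weighted sum of the two similarities. The candidate names and similarity values below are invented for illustration only.

```python
from typing import Dict, Tuple


def combined_score(cos_sim: float, sem_sim: float, alpha: float, beta: float) -> float:
    """Weighted sum of cosine and semantic similarity (assumed form of Formula (8))."""
    return alpha * cos_sim + beta * sem_sim


def best_match(candidates: Dict[str, Tuple[float, float]],
               alpha: float, beta: float) -> str:
    """Return the candidate name with the highest combined score."""
    return max(candidates, key=lambda name: combined_score(*candidates[name], alpha, beta))


# Hypothetical (cosine, semantic) similarities of one sub-table entry
# against two summary-table entries.
candidates = {
    "busbar equipment": (0.71, 0.90),
    "main transformer": (0.00, 0.35),
}
print(best_match(candidates, alpha=0.4, beta=0.6))  # busbar equipment
```

Because the ranking depends on the ratio of α to β, the choice of these factors directly affects which names are matched, which motivates optimizing them.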
Adjustment factors are crucial for the accuracy of similarity calculations, so it is necessary to construct adjustment factors based on the characteristics of the text. Accordingly, this paper proposes an improved cuckoo search algorithm to optimize the adjustment factors in semantic similarity, making it more suitable for engineering drawings.
The cuckoo search algorithm is a population-based intelligent optimization algorithm [37, 38], originating from the observation of cuckoo populations. Cuckoos choose a nest during the breeding process and compete based on the quality of the nest. Higher-quality nests attract more cuckoos, thereby increasing the success rate of reproduction. The algorithm simulates this process by iteratively updating the positions of the nests to gradually improve the quality of the solutions. At the start of the algorithm, a set of initial solutions is randomly generated as the positions of the nests. Then, based on the quality of the solutions and the attractiveness of the nests, cuckoos choose new positions. Higher-quality solutions attract more cuckoos, while poorer solutions may be eliminated. By continuously iterating and updating the positions of the nests, the cuckoo search algorithm gradually converges towards the optimal solution.
The cuckoo search algorithm has been widely applied in various fields. For regional energy consumption quota allocation in China, it has been used to optimize the weight calculation, improving the rationality, effectiveness, and fairness of the allocation scheme [39]. In biomedical article retrieval for precision medicine clinical decision support, it has improved the accuracy and efficiency of retrieval [40]. For the traveling salesman problem (TSP) and the optimization of glass-cutting paths, improved variants of the cuckoo search algorithm have been proposed [41].
The traditional cuckoo search algorithm uses a fixed step size to update solutions, so it easily becomes trapped once it approaches a local optimum in later iterations. Sine optimization dynamically adjusts the search step size, allowing a broader global search in the initial stage and helping the algorithm avoid local optima [42]. The dynamic update strategy of the sine can be represented by Formula (9).
In the formula, $X^{t+1}$ is the position vector of the next generation of cuckoo individuals; $X^{t}$ is the position vector of the current cuckoo individual; $X_{best}^{t}$ is the position vector of the current best solution; $r$ is a random number controlling the step size; and $\theta$ is an angle factor, usually randomly generated within $[0, 2\pi]$.
This paper also adds a random strategy known as the catfish effect to strengthen the algorithm's ability to escape local optima. The catfish effect introduces new, randomly generated cuckoos (“catfish”) to stimulate the vitality of the population and prevent the algorithm from converging prematurely. This is achieved by periodically and randomly updating the positions of some individual cuckoos, as shown in Formula (10),
where $X_{new}$ is the position of a randomly updated cuckoo individual; $X_{rand}$ is a randomly generated new position; $r$ is a random number in $[0, 1]$; and $X_i$ is the position of the $i$-th cuckoo individual.
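The catfish effect can be sketched as follows. The update form used here, $X_{new} = X_i + r\,(X_{rand} - X_i)$, is an assumption chosen to be consistent with the symbols described above and may differ from Formula (10); the `fraction` parameter and the function name are also hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def catfish_update(population: Sequence[Sequence[float]],
                   bounds: Sequence[Tuple[float, float]],
                   fraction: float = 0.2,
                   rng: Optional[random.Random] = None) -> List[List[float]]:
    """Move a random subset of individuals towards fresh random positions."""
    rng = rng or random.Random()
    updated = [list(x) for x in population]
    if not updated:
        return updated
    k = max(1, int(fraction * len(updated)))  # number of "catfish" per call
    for i in rng.sample(range(len(updated)), k):
        x_rand = [rng.uniform(lo, hi) for lo, hi in bounds]
        r = rng.random()
        # Assumed update: a random point on the segment between X_i and X_rand.
        updated[i] = [xi + r * (xr - xi) for xi, xr in zip(updated[i], x_rand)]
    return updated


bounds = [(0.0, 1.0), (0.0, 1.0)]
population = [[0.5, 0.5] for _ in range(10)]  # a stagnated population
shaken = catfish_update(population, bounds, fraction=0.2, rng=random.Random(0))
changed = sum(1 for old, new in zip(population, shaken) if old != new)
```

Calling the function every few iterations gives the periodic update described in the text. Because each new position lies between two in-bound points, no clipping is needed.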
The specific steps of the improved cuckoo search algorithm are as follows:
- (1) Define the Problem Space
First, the search space of the problem is defined, that is, the upper and lower bounds of the decision variables. Assume the problem has $d$ decision variables, and each variable $x_j$ takes values within the interval $[L_j, U_j]$.
- (2) Randomly Initialize the Population
The population of the cuckoo search algorithm consists of multiple nests, each representing a potential solution (i.e., a position). The initial population is usually generated through the following steps:
Determine the population size N, which is the number of nests.
For each nest (solution), randomly generate a solution vector within the problem’s defined domain.
Assume the solution space is $d$-dimensional, so that each solution can be represented as a $d$-dimensional vector $x = (x_1, x_2, \dots, x_d)$, where each $x_j$ is randomly generated within its corresponding range $[L_j, U_j]$.
The initialization is shown in Formula (11),
$$x_j = L_j + r_j\,(U_j - L_j) \tag{11}$$
where $r_j$ is a uniformly distributed random number in the interval $[0, 1]$, ensuring that the initial population is uniformly distributed throughout the solution space.
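The random initialization can be sketched in a few lines. The function name is hypothetical, and the bounds in the usage lines assume a two-dimensional search over the adjustment factors α and β in $[0, 1]$, which is not stated in the text.

```python
import random
from typing import List, Optional, Sequence, Tuple


def init_population(n_nests: int,
                    bounds: Sequence[Tuple[float, float]],
                    rng: Optional[random.Random] = None) -> List[List[float]]:
    """Draw each coordinate as lower + r * (upper - lower) with r uniform in [0, 1)."""
    rng = rng or random.Random()
    return [[lo + rng.random() * (hi - lo) for lo, hi in bounds]
            for _ in range(n_nests)]


# Assumed search space: one dimension each for alpha and beta.
nests = init_population(5, [(0.0, 1.0), (0.0, 1.0)], rng=random.Random(42))
```

Passing a seeded `random.Random` keeps runs reproducible, which is convenient when comparing parameter settings.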
- (3) Set Algorithm Parameters
After generating the initial population, several key parameters of the cuckoo search algorithm need to be set:
Discovery probability $p_a$: This represents the probability that a host bird discovers a cuckoo egg, with a value in $[0, 1]$. This parameter determines the proportion of solutions that are replaced in the algorithm.
Maximum number of iterations: This sets the termination condition for the algorithm, which is usually either the maximum number of iterations or a convergence precision of the objective function value.
- (4) Evaluate the Initial Population
After generating the population, the solution corresponding to each nest is evaluated, using the objective function $f(x)$ to calculate the fitness value of each solution. The fitness value is used to select the best solution or replace the poorer solutions in subsequent steps. The objective function is the accuracy rate of similarity matching, as shown in Formula (12),
$$f(x) = \frac{\text{Number of Correct Matches}}{\text{Total Number of Matches}} \tag{12}$$
where “Number of Correct Matches” is the number of correct similarity matches calculated, and “Total Number of Matches” is the total number of similarity matches calculated.
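The accuracy objective can be sketched as follows. The text does not specify how a match is judged correct, so the sketch assumes a labeled set in which each query lists its candidates' (cosine, semantic) similarity pairs together with the index of the true match, and that the prediction is the candidate with the highest weighted-sum score. The sample values are invented.

```python
from typing import List, Sequence, Tuple

# (candidate (cosine, semantic) pairs, index of the true match)
Sample = Tuple[List[Tuple[float, float]], int]


def matching_accuracy(alpha: float, beta: float, samples: Sequence[Sample]) -> float:
    """Fraction of queries whose top-scoring candidate is the labeled match."""
    if not samples:
        return 0.0
    correct = 0
    for candidates, true_idx in samples:
        scores = [alpha * c + beta * s for c, s in candidates]
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == true_idx)
    return correct / len(samples)


samples = [
    ([(0.9, 0.2), (0.1, 0.8)], 1),  # token overlap is misleading here
    ([(0.7, 0.6), (0.2, 0.3)], 0),
]
print(matching_accuracy(1.0, 0.0, samples))  # 0.5
print(matching_accuracy(0.3, 0.7, samples))  # 1.0
```

The two calls show why the adjustment factors matter: the same similarity values yield different accuracies under different weights.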
- (5) Generation of New Solutions
In each iteration, the cuckoo search algorithm generates new candidate solutions through random walks that follow the Lévy flight strategy [43, 44]. A Lévy flight is a random process that combines many small steps with occasional long jumps, as shown in Formula (13),
where $x^{(t)}$ represents the solution at generation $t$; $\text{Lévy}(\lambda)$ is a random number generated from the Lévy distribution, which maintains global search capability; and the step size factor is replaced by sine optimization, thus updating Formula (13) to Formula (14).
Nests that are discovered, as determined by the discovery probability, are regenerated in one of two ways: replacement with a random new solution, or a local perturbation generated by the catfish effect.
After each round of iteration, the global best solution is updated based on the evaluated solutions. That is, among all nests, the solution with the best fitness value is selected as the current best solution.
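The Lévy-flight move with a sine-based step size can be sketched as follows. Several choices are assumptions rather than the paper's formulas: Lévy steps are sampled with Mantegna's algorithm, the exponent defaults to 1.5, the step size is taken as $|\sin\theta|$ with $\theta$ uniform in $[0, 2\pi]$, and the move is scaled by the distance to the current best solution.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def levy_step(rng: random.Random, exponent: float = 1.5) -> float:
    """One heavy-tailed step sampled with Mantegna's algorithm."""
    sigma = (math.gamma(1 + exponent) * math.sin(math.pi * exponent / 2)
             / (math.gamma((1 + exponent) / 2) * exponent
                * 2 ** ((exponent - 1) / 2))) ** (1 / exponent)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / exponent)


def sine_levy_move(x: Sequence[float], best: Sequence[float],
                   bounds: Sequence[Tuple[float, float]],
                   rng: Optional[random.Random] = None) -> List[float]:
    """Propose a new position: Lévy step scaled by an assumed sine step size."""
    rng = rng or random.Random()
    step = abs(math.sin(rng.uniform(0.0, 2.0 * math.pi)))  # varies in [0, 1]
    new_x = []
    for xi, bi, (lo, hi) in zip(x, best, bounds):
        candidate = xi + step * levy_step(rng) * (xi - bi)
        new_x.append(min(max(candidate, lo), hi))  # clip to the search space
    return new_x


bounds = [(0.0, 1.0), (0.0, 1.0)]
moved = sine_levy_move([0.2, 0.8], [0.5, 0.5], bounds, rng=random.Random(2))
```

Scaling by the distance to the best solution means that a nest already at the best position does not move in this step, so diversity must come from the replacement of discovered nests.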
- (6) Check Termination Conditions and Output the Optimal Solution
The iteration process continues until the termination conditions are met. Common termination conditions include reaching the maximum number of iterations, or the change in the global best solution falling below a preset threshold, which indicates that the algorithm has converged. Finally, the optimal solution is output. The overall process of the improved cuckoo search algorithm is shown in Figure 5.
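Steps (1) to (6) can be combined into a compact sketch. It is an illustration under assumptions, not the procedure of Figure 5: the sine-scaled Lévy move and the catfish perturbation use assumed forms, discovered nests pick one of the two regeneration methods with equal probability, only the iteration limit is used for termination, and a toy objective stands in for the matching accuracy.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def levy_step(rng: random.Random, exponent: float = 1.5) -> float:
    """Heavy-tailed step via Mantegna's algorithm (an assumed sampler)."""
    sigma = (math.gamma(1 + exponent) * math.sin(math.pi * exponent / 2)
             / (math.gamma((1 + exponent) / 2) * exponent
                * 2 ** ((exponent - 1) / 2))) ** (1 / exponent)
    u, v = rng.gauss(0.0, sigma), rng.gauss(0.0, 1.0)
    while v == 0.0:
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / exponent)


def random_nest(bounds: Bounds, rng: random.Random) -> List[float]:
    return [lo + rng.random() * (hi - lo) for lo, hi in bounds]


def improved_cuckoo_search(fitness: Callable[[List[float]], float], bounds: Bounds,
                           n_nests: int = 15, pa: float = 0.25,
                           max_iter: int = 100, seed: int = 0):
    """Maximize `fitness`; returns (best position, best fitness, best-so-far history)."""
    rng = random.Random(seed)
    nests = [random_nest(bounds, rng) for _ in range(n_nests)]      # step (2)
    scores = [fitness(x) for x in nests]                            # step (4)
    b = max(range(n_nests), key=scores.__getitem__)
    best_x, best_f = list(nests[b]), scores[b]
    history = [best_f]
    for _ in range(max_iter):                                       # step (6)
        for i in range(n_nests):                                    # step (5)
            step = abs(math.sin(rng.uniform(0.0, 2.0 * math.pi)))
            cand = [min(max(xi + step * levy_step(rng) * (xi - bi), lo), hi)
                    for xi, bi, (lo, hi) in zip(nests[i], best_x, bounds)]
            f = fitness(cand)
            if f > scores[i]:  # keep the move only if it improves the nest
                nests[i], scores[i] = cand, f
        for i in range(n_nests):
            if rng.random() < pa:  # nest discovered by the host
                fresh = random_nest(bounds, rng)
                if rng.random() < 0.5:
                    cand = fresh  # random new solution
                else:
                    r = rng.random()  # catfish-style perturbation (assumed form)
                    cand = [xi + r * (xr - xi) for xi, xr in zip(nests[i], fresh)]
                nests[i], scores[i] = cand, fitness(cand)
        b = max(range(n_nests), key=scores.__getitem__)
        if scores[b] > best_f:  # update the global best
            best_x, best_f = list(nests[b]), scores[b]
        history.append(best_f)
    return best_x, best_f, history


# Toy objective with its maximum (0.0) at alpha = 0.3, beta = 0.7.
toy = lambda x: -((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2)
best_x, best_f, history = improved_cuckoo_search(toy, [(0.0, 1.0), (0.0, 1.0)])
```

The global best is stored separately from the nests, so it is never lost when a discovered nest is regenerated, and `history` is non-decreasing by construction.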