1. Introduction
The increasing complexity of the operating environment poses significant challenges for object detection, demanding shorter recognition times and greater accuracy. On specific target recognition platforms, target identification is usually based on detected target signal characteristics, which are matched against an established feature database; the target type is then identified according to the differences among the target types of different platforms. For example, the classification and recognition of non-cooperative objects based on Deep Learning (DL) is a target recognition method that uses DL to classify and recognize non-cooperative targets based on the micro-Doppler effect and the principle of laser coherent detection. This approach has two drawbacks: (1) the feature database is generally extensive, leading to low matching and retrieval efficiency and affecting response time, and (2) the detection systems of different combat platforms partially overlap in type, so when multiple combat platforms are used for cooperative identification, contradictory results often occur when identifying target types. Therefore, to identify target types quickly and accurately, researching and exploring new theories and methods is of significant practical value.
In response to the problems concerning the target recognition platforms mentioned above, data mining was conducted using the radiation source database. This study introduces an association rule analysis method from data mining technology to improve the speed and accuracy of target recognition. The algorithm proposed in this article extracts frequent itemsets from the feature database of the detection system, identifies various target recognition rules, and forms a target recognition rule database to achieve a rapid search for target types. In doing so, the algorithm improves the speed and accuracy of target recognition.
Association rule mining [1], as an interdisciplinary field that integrates theories and techniques from databases, artificial intelligence, machine learning, statistics, and other domains, has become a research hotspot in database technology research and applications in recent years because of its ability to help decision-makers in various fields discover potential relationships among large database items. Among the many association rule mining algorithms proposed, the Apriori algorithm (AA) [2], introduced in 1993 by R. Agrawal, Imielinski, and Swami, is the most famous. In recent years, there have been many developments in AA research. In 2022, Meng Chen and Zhi Xiang Yin utilized the algorithm for cardiotocography classification by integrating it with a multi-model ensemble classifier, showcasing its effectiveness in medical data analysis [3]. Around the same time, Jian Zeng and Bao Jia applied the AA to real-time data mining and penalty decision-making in basketball games [4], highlighting its versatility in sports analytics. In 2023, Chen Rumeng and colleagues developed a hypergraph-clustering method based on an improved AA [5], demonstrating its ability to handle complex data structures. Additionally, Fulin Li and his team used an enhanced version of the algorithm to mine equipment quality information, expanding its application in industrial sensor data analysis [6]. That same year, Lin Wei Li and his team employed the optimized AA for deformation response analysis of landslide hazards [7], an innovative application in natural disaster research. As of 2024, the algorithm continues to evolve, with Dasgupta Sarbani and Saha Banan integrating it with DL for drug recommendations in big data environments [8,9,10,11,12,13], significantly contributing to its use in the healthcare sector. Furthermore, Xie R. and his team introduced a cognitively Confidence-Debiased Adversarial Fuzzy Apriori Method (CDAFAM) [14], incorporating fuzzy logic and adversarial learning and bringing new perspectives on the use of the algorithm. Meanwhile, Yan Yiping applied an improved version of the AA to develop psychological crisis behavior models, extending its application to psychology and human–computer interaction. These developments have collectively advanced the AA, demonstrating its adaptability and increasing its relevance across diverse fields by addressing complex data analysis challenges. These latest research achievements enrich the study of the AA in theory and demonstrate its enormous potential in practical applications.
With the continuing growth of informationization in many industries, massive amounts of data have accumulated. Because of the large scale, inconsistent structure, and diverse sources of these data, traditional data analysis methods struggle to extract valuable information efficiently. When the data volume is large, the complexity of the traditional AA grows exponentially and its running efficiency decreases significantly. The existing improved AAs all have defects of varying degrees, either generating too few rules or generating too many useless, redundant iterations. Because a Boolean vector (BV) contains only 0s and 1s, it can significantly improve computational speed, especially on large-scale datasets. The ant colony algorithm has the characteristics of parallel computing, positive feedback, and fast convergence, giving it strong advantages in solving combinatorial optimization problems. This paper proposes a new association rule mining algorithm that introduces the BV and the ant colony algorithm into the AA to effectively compensate for the AA's need for multiple database scans and to improve the quality and efficiency of association rule mining. To cover data from different sources, this article adopts NBA professional arena data, disease and symptom data, and radar data for data mining. The results show that the new algorithm can not only mine more valuable information, such as the best lineup of NBA teams, but can also identify data types faster and more accurately, such as judging diseases based on symptoms and identifying radiation source types based on radar features.
This study introduces the association rule analysis method for data mining to address the long recognition times and low accuracy of target recognition systems. By discretizing the target feature database to construct a BV and using an improved AA based on the BV and the ant colony algorithm [15,16,17,18,19,20], frequent itemsets in the target recognition feature database are extracted and used to form a new target recognition database. Constructing the database as a BV and utilizing the inner product of vectors makes it possible to quickly identify frequent one-itemsets satisfying the minimum support degree and to eliminate miscellaneous items that clearly cannot form frequent itemsets. However, to obtain the maximum frequent itemset, it is still necessary to calculate line by line and search continuously until the end. Therefore, the proposed algorithm is combined with the ant colony algorithm, utilizing the excellent global search ability of the ant colony algorithm to find the maximum frequent itemset. Compared with previous algorithms, the proposed algorithm first eliminates some miscellaneous items and optimizes the database. Second, it only needs to scan the database once, significantly improving the efficiency and accuracy of the algorithm.
Section 2 of this study presents the background knowledge needed for the improved algorithm, and Section 3 proposes the fast update algorithm based on the BV inner product and the ant colony algorithm. Section 4 presents the simulations and the analysis of the results, and this paper is concluded in Section 5.
2. Background Knowledge
In this section, we explore several key components of association rule mining, including the basic concepts of association rules, the implementation of the Apriori algorithm (AA), the construction of a Boolean matrix (BM), the application of vector inner product, and the ant colony algorithm. The division of these topics into different subsections is intended to clearly articulate the specific methods and functions of each step. First, the definition of association rules and their key metrics provide a theoretical foundation for understanding subsequent algorithms. Next, the AA details how to mine frequent itemsets efficiently, a process that relies on the construction of a BM to represent data relationships effectively. The vector inner product offers an efficient method for performing calculations on the BM, thereby accelerating the discovery of frequent itemsets. Finally, the ant colony algorithm introduces a metaheuristic approach to further enhance the search for optimal frequent itemsets.
2.1. Association Rule
An association rule [21,22,23,24,25] is a data mining method primarily used to discover interesting and frequent patterns, associations, or causal structures among different items in large datasets.
The main steps in association rule mining include the following:
(1) Generate frequent itemsets: Use algorithms to identify itemsets whose support exceeds the minimum support threshold.
(2) Generate association rules: Extract rules from the frequent itemsets that meet the minimum confidence threshold.
(3) Evaluate and filter rules: Assess the usefulness of the rules based on metrics such as confidence and lift and retain the most valuable rules.
Association rule analysis focuses on the following key metrics:
Support: The support of a rule is defined as the proportion of transactions in the dataset that contain all the items in the rule.
Confidence: Confidence refers to the probability that B occurs given that A has occurred; it is used to measure the reliability of the rule.
Using these metrics, the strength and reliability of association rules can be evaluated. An item is a frequent item if its support is greater than or equal to a specified minimum support threshold. This indicates that the itemset appears frequently enough in the dataset.
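As a concrete illustration, both metrics can be computed directly from a transaction list. The sketch below uses the patient transactions introduced later in Section 2.3; the function names are our own, not the paper's.

```python
# Hypothetical patient transactions (mirroring the example in Section 2.3).
transactions = [
    {"fever", "fatigue", "headache"},   # Patient 1
    {"fever", "fatigue", "headache"},   # Patient 2
    {"fever", "cough"},                 # Patient 3
    set(),                              # Patient 4 (asymptomatic)
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

print(support({"fever", "fatigue"}))       # 2 of 4 transactions -> 0.5
print(confidence({"fatigue"}, {"fever"}))  # fever follows fatigue -> 1.0
```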
Association rule mining has a wide range of practical applications, including market basket analysis, recommendation systems, medical data mining, and telecommunications. In the latest research, data mining has also been applied to control and disaster prevention, such as an intelligent decision support system for groundwater supply management and electromechanical infrastructure controls [26] and enhancing flood risk mitigation with an advanced data-driven approach [27].
2.2. Apriori Algorithm
Association rules can help uncover hidden relationships within data, but we must rely on algorithms to mine these rules from large datasets effectively. In this regard, the AA plays a crucial role [28,29,30,31,32]. This classic association rule learning algorithm uses an iterative approach to generate frequent itemsets and then extracts applicable association rules from them. Its fundamental idea is to reduce the search space for computing frequent itemsets through joining and pruning.
The entire execution process of the AA mainly includes the following steps:
(1) The algorithm calculates the occurrence frequency of all items and determines the frequent one-itemsets.
(2) The algorithm iteratively generates longer frequent itemsets using the joining and pruning steps based on the currently found frequent itemsets. This process continues until no further frequent itemsets can be found.
The advantage of the AA lies in its simplicity and ease of implementation. However, it also has some drawbacks, such as the need to scan the entire database during each iteration and the potentially time-consuming generation of many candidate itemsets when the dataset is large.
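The join-and-prune iteration described above can be sketched as follows. This is a minimal, unoptimized Python illustration, not the paper's implementation:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori sketch: level-wise generation of frequent itemsets.

    `transactions` is a list of sets; `min_support` is a fraction in (0, 1].
    """
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    # Frequent 1-itemsets: single items meeting the support threshold.
    current = [frozenset([i]) for i in items
               if sum(i in t for t in transactions) / n >= min_support]
    frequent = list(current)
    k = 2
    while current:
        # Join step: size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must already be frequent;
        # then verify the candidate's support against the database.
        current = [c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                   and sum(c <= t for t in transactions) / n >= min_support]
        frequent.extend(current)
        k += 1
    return frequent
```

Note how the prune step discards a candidate without a database scan whenever one of its subsets is infrequent; the support check over the database is performed only for the surviving candidates.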
2.3. Boolean Matrix Construction
In data mining, a BM is an important data structure used to represent the presence or absence of relationships between items. Constructing a BM is foundational for association rule mining and frequent itemset mining tasks. The specific steps for constructing a BM are as follows:
Determine the matrix dimensions: First, determine the rows and columns of the BM. Rows typically represent the transactions in the dataset, while columns represent all possible items.
Fill the matrix: Next, iterate through each transaction in the dataset. For each item in the transaction, if the transaction contains the item, enter 1 in the corresponding matrix cell to indicate presence; otherwise, enter 0 to indicate absence.
This article uses disease symptoms and patient data as an example, where the process of constructing a BM is as follows:
Suppose there are four patient transactions, with symptoms including fever, fatigue, headache, and cough.
Transaction 1 (Patient 1): fever, fatigue, headache.
Transaction 2 (Patient 2): fever, fatigue, headache.
Transaction 3 (Patient 3): fever, cough.
Transaction 4 (Patient 4): asymptomatic.
The obtained database is shown in Table 1.
Construct the corresponding BM R using the transactional database data, as shown in Figure 1.
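The construction of R can be sketched as follows, assuming the item (column) order fever, fatigue, headache, cough; this is an illustrative reconstruction, not the paper's code:

```python
# Hypothetical item order for the columns of the Boolean matrix.
items = ["fever", "fatigue", "headache", "cough"]
transactions = [
    {"fever", "fatigue", "headache"},   # Patient 1
    {"fever", "fatigue", "headache"},   # Patient 2
    {"fever", "cough"},                 # Patient 3
    set(),                              # Patient 4 (asymptomatic)
]

# Rows = transactions, columns = items; 1 marks presence, 0 absence.
R = [[1 if item in t else 0 for item in items] for t in transactions]
for row in R:
    print(row)
```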
2.4. Vector Inner Product
The Boolean vector (BV) inner product [33,34,35,36,37] is calculated by performing a logical AND operation on the corresponding elements of two BVs and then summing the results. It is used to find frequent itemsets mainly because of its high computational efficiency, which accelerates data processing. The vector inner product operation quickly calculates the degree of association between itemsets; in particular, in a BM, the vector inner product can quickly determine the co-occurrence count of two itemsets. The basic steps for using the vector inner product to find frequent itemsets are as follows:
Represent transactions: First, convert each transaction into an n-dimensional BV, where n is the number of all distinct items. If the transaction contains a specific item, the corresponding vector element is 1; otherwise, it is 0.
Compute the inner product: To compare two itemsets, perform an inner product calculation on their corresponding BVs. The result of the vector inner product is the number of times the two itemsets appear together in all transactions. For two BVs A = (a1, a2, …, an) and B = (b1, b2, …, bn), the specific calculation formula is
A · B = Σᵢ (aᵢ ∧ bᵢ), i = 1, …, n,
where ∧ denotes the logical AND operation.
Identify frequent itemsets: Compare the co-occurrence count of each itemset with a user-defined minimum support threshold, which can determine the frequent itemsets. If the co-occurrence count of an itemset is greater than or equal to the minimum support threshold, then the itemset is considered frequent.
We use the BM constructed from the patient data above in the following example. Reading the item (column) vectors across the four transactions, the vector representation for fever is M = [1,1,1,0], and the vector representation for fatigue is B = [1,1,0,0]. The inner product of these two vectors is M · B = 1·1 + 1·1 + 1·0 + 0·0 = 2.
If this value is greater than or equal to the minimum support requirement, then the itemset {fever, fatigue} can be considered frequent.
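A minimal sketch of this co-occurrence count; the item vectors are taken from the patient Boolean matrix of Section 2.3, and the threshold value is illustrative:

```python
# Item (column) vectors over the four patient transactions.
fever   = [1, 1, 1, 0]
fatigue = [1, 1, 0, 0]

def inner_product(a, b):
    """Sum of the element-wise logical AND of two Boolean vectors."""
    return sum(x & y for x, y in zip(a, b))

count = inner_product(fever, fatigue)   # co-occurrence count
min_support_count = 2                   # illustrative threshold
print("co-occurrences:", count)
print("frequent:", count >= min_support_count)
```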
The advantages of the inner product operation of the BV are mainly reflected in the following aspects:
Simplify complex problems: Through logical operations, complex problems can be decomposed into simple logical judgments, making the problem-solving process more intuitive and easier to understand.
Efficient processing: Because of the binary nature of Boolean operations, they exhibit extremely high efficiency in handling discrete quantities and logical problems, making them suitable for large-scale data processing and computation.
Widely used: Boolean algebra not only has important applications in computer science and electronic technology, such as defining Boolean values with integers in C language, where 0 represents false and non-zero represents true, but it also has extensive applications in fields such as graphics and cryptography, as seen with the use of Boolean operations in graphics processing software.
Logical clarity: The use of Boolean algebra makes logical relationships clearer and more concise, which helps improve problem-solving efficiency and accuracy.
2.5. Ant Colony Algorithm
The ant colony algorithm [
38] is a metaheuristic search algorithm inspired by the behavior of ants seeking food. It simulates the process of ants releasing pheromones and choosing paths while searching for food, using cooperation and information sharing to solve combinatorial optimization problems.
The basic idea of the algorithm is that multiple virtual ants randomly search the solution space and release pheromones based on the quality of the paths they find. Other ants prefer to choose paths with higher pheromone concentrations, reinforcing the advantageous paths. Over time, paths with high pheromone concentrations become more attractive to the ants, leading more ants to choose those paths and eventually causing the algorithm to converge to an optimal solution.
The basic steps of the ant colony algorithm are as follows:
(1) Initialization:
Initialization of pheromones: Initialize pheromone values on each path in the search space. In general, the pheromone values on all paths are the same. Pheromones represent the “goodness or badness” of a path, and the concentration of pheromones affects the probability of ants choosing a path.
Heuristic information: For certain problems (such as TSP), heuristic information (such as distance, cost, etc.) can be introduced to guide ants in choosing paths.
(2) Ant construction solution:
Each ant constructs a solution in the search space. Ants will determine their path based on the current concentration of pheromones and heuristic information such as path length. Ants tend to choose paths with higher concentrations of pheromones and better heuristic information.
(3) Calculate path fitness:
After the ant constructs the solution, calculate the fitness of each solution. The higher the fitness, the better the solution. Evaluate the solution (path) constructed by ants based on the objective function (such as the total path length).
(4) Update pheromone:
Local pheromone update: While ants construct solutions, the amount of pheromone along a path increases as ants pass over it. The usual update rule is that the better the chosen path, the more pheromone is deposited on it.
Global pheromone update: After all ants complete their searches, the pheromones on the paths are readjusted according to fitness: a good path receives more pheromone, while a poor path loses more pheromone through evaporation.
(5) Pheromone volatilization:
Pheromones will gradually evaporate over time. This volatilization process can prevent the search from falling into local optima too early, thereby increasing the global search capability.
(6) Iteration:
Through repeated iterations, ants constantly explore paths, gradually update pheromones, and converge to the optimal solution. After each iteration, the best solution is recorded and used to guide the next round of the search.
(7) Termination conditions:
When the stopping conditions are met (such as reaching the maximum number of iterations, finding a solution that is good enough, or the pheromone changes tend to stabilize), the algorithm terminates.
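Steps (1)–(7) can be condensed into a small sketch on a toy traveling salesman instance. The distance matrix, parameter values (α, β, ρ, Q), ant count, and iteration count below are illustrative assumptions, not the paper's settings:

```python
import math
import random

# Illustrative 4-city TSP instance (symmetric distances).
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
n = len(dist)
tau = [[1.0] * n for _ in range(n)]          # (1) initialize pheromones
alpha, beta, rho, Q = 1.0, 2.0, 0.5, 100.0   # assumed parameters

def tour_length(tour):
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

random.seed(0)
best, best_len = None, math.inf
for iteration in range(50):                  # (6) iterate
    tours = []
    for ant in range(8):                     # (2) each ant builds a tour
        tour = [random.randrange(n)]
        while len(tour) < n:
            i = tour[-1]
            choices = [j for j in range(n) if j not in tour]
            # Transition weights: pheromone^alpha * heuristic^beta.
            weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                       for j in choices]
            tour.append(random.choices(choices, weights=weights)[0])
        tours.append(tour)
    for i in range(n):                       # (5) evaporation
        for j in range(n):
            tau[i][j] *= (1 - rho)
    for tour in tours:                       # (3)-(4) evaluate and deposit
        L = tour_length(tour)
        if L < best_len:
            best, best_len = tour, L
        for i in range(n):
            a, b = tour[i], tour[(i + 1) % n]
            tau[a][b] += Q / L
            tau[b][a] += Q / L
print(best_len)  # typically converges to 18, the optimum for this instance
```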
The AA searches for frequent itemsets through a global search process, while the ant colony algorithm can search for the globally optimal path. Therefore, the improved algorithm transforms the association rule mining problem into a form solvable by the ant colony algorithm, analogous to the traveling salesman problem. Given that the fundamental element of the traveling salesman problem is the target city, all frequent items in the database are mapped to target cities, forming the problem space of the ant colony algorithm. In the traveling salesman problem, the optimal solution criterion for an ant traversing the graph once is the "shortest path"; in the improved algorithm, the optimal solution criterion is the frequent itemset with the maximum support.
3. Proposed Algorithm
The Apriori algorithm (AA) adopts a layer-by-layer, iterative search method, and its complexity is concentrated in the first step of accessing the transaction itemsets; its running efficiency drops significantly when the number of items is large. The existing improved AAs all have defects of varying degrees, either generating too few rules or producing too many useless, redundant iterations. Therefore, the proposed algorithm utilizes the construction of Boolean matrices (BMs) and the results of Boolean inner product operations to continuously search for the maximum frequent itemset of each row vector, reducing the number of scans. The ant colony algorithm is also introduced into the proposed algorithm; it has substantial advantages in solving combinatorial optimization problems because of its parallel computing, positive feedback characteristics, and fast convergence speed. The proposed algorithm can effectively compensate for the shortcomings of the AA and improve the quality and efficiency of association rule mining.
The AA searches for frequent itemsets through a global search process, while the ant colony algorithm can search for the globally optimal path. Therefore, the proposed algorithm transforms the association rule mining problem into a problem solvable by the ant colony algorithm, in the manner of the traveling salesman problem (TSP). Given that the fundamental element of the TSP is the target city, all frequent items in the database are used as target cities to establish a complete graph, which forms the problem space of the ant colony algorithm. In the TSP, the optimal solution criterion for an ant traversing the complete graph once is the "shortest path", while the optimal solution criterion in the proposed algorithm is the frequent itemset with "maximum support".
3.1. The Apriori Algorithm Improved with Vector Inner Product Generates Frequent 1-Itemsets
One-itemsets are a part of frequent itemsets [39,40,41] in data mining. In association rule mining, frequent one-itemsets are frequent itemsets that contain only a single item. The proposed algorithm reconstructs the BM and adds a row to store intermediate computation results. Applying each row vector's inner product with itself identifies the frequent one-itemsets that meet the minimum support threshold. The algorithm is described as follows:
Step 1: Scan the transaction database to construct the BM D and arrange its rows according to the number of 1s in each row vector. The last row of matrix D is an all-zero row used to store the results of the previous step's computations. Set the minimum support threshold to min_support.
Step 2: Calculate the inner product of each of the first n row vectors of the BM D with itself. Delete the rows of D whose inner product is less than min_support and identify the frequent one-itemsets.
Step 3: Output all frequent one-itemsets that meet the minimum support threshold.
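Steps 1–3 can be sketched as follows. Here the BM is stored with one Boolean row vector per item (the orientation the proposed algorithm operates on), so the self inner product of a row is exactly that item's support count; the data and min_support value are illustrative:

```python
# One Boolean row vector per item; columns are the four transactions.
items = ["fever", "fatigue", "headache", "cough"]
D = [[1, 1, 1, 0],   # fever
     [1, 1, 0, 0],   # fatigue
     [1, 1, 0, 0],   # headache
     [0, 0, 1, 0]]   # cough

min_support = 2      # illustrative minimum support count

frequent_1 = {}
for item, row in zip(items, D):
    # Inner product of the row vector with itself = support count.
    count = sum(x & x for x in row)
    if count >= min_support:
        frequent_1[item] = count

print(frequent_1)    # items below min_support (here: cough) are pruned
```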
3.2. Association Rule Mining Algorithm Based on the Ant Colony Algorithm
Next, an undirected graph with all frequent items in the database as locations is established, which represents the problem space for the ant colony algorithm. An ant traversing the graph once to find the optimal path is equivalent to finding the frequent itemset with the highest support. The steps include the following:
(1) Construct an undirected graph.
Step 1: Calculate the number n of all frequent items based on the proposed AA using Boolean vectors (BVs) and construct an undirected graph G with n vertices.
Step 2: Use all frequent one-itemsets as the vertices of this undirected graph G.
Step 3: Calculate the support of the itemsets formed by any two vertices in graph G and use the reciprocal of the support value as the distance between the two points.
Step 4: Use the constructed graph G as the problem space for the ant colony algorithm to mine association rules, as shown in Figure 2. A to F in the figure represent vertices.
(2) Generate frequent itemsets.
Randomly select m ants to start from different vertices and create the parameters Nc and s, where Nc represents the number of iterations and s indexes the ant's position along its path; the initial value of Nc is 0, and the initial value of s is 1. For each ant k, calculate the next item j to reach based on the following transition probability formula:

p_ij^k(t) = [τ_ij(t)]^α · [η_ij]^β / Σ_{u ∈ allowed_k} [τ_iu(t)]^α · [η_iu]^β, if j ∈ allowed_k, and 0 otherwise,

where τ_ij(t) represents the pheromone concentration on the path from point i to point j at time t; the value of τ_ij on each path lies within the range [τ_min, τ_max], and the initial value is set as τ_0. η_ij represents the heuristic value from point i to point j, which equals the reciprocal of the distance between the two vertices, that is, the support value of the corresponding two-itemset. α and β represent the influence of the pheromone and the heuristic information on the ant's decision, respectively, and allowed_k represents the set of cities that ant k can still choose.
List the selected city j in the ant's tabu list and determine whether the path, after adding j, contains frequent itemsets that include item j. If it does, extract the frequent itemset; otherwise, remove j from the path. While s < n, increment s by 1 and continue to select nodes so that the ant traverses the entire path. When s = n, the ant has completed traversing all paths and obtains the frequent itemset with the maximum support for this cycle. The pheromone update rule is as follows:

τ_ij(t+1) = ρ · τ_ij(t) + Δτ_ij,

where Δτ_ij represents the pheromone increment on the path from point i to point j; if the path from i to j does not belong to the optimal solution, its value is 0. ρ represents the pheromone residual coefficient.
If Nc < the maximum number of iterations, increment Nc by 1, and the ants continue to traverse. When Nc reaches the maximum number of iterations, output all frequent itemsets obtained. The flowchart of this algorithm is shown in Figure 3.
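For illustration, the overall search can be sketched on the patient data: the BV step prunes infrequent items, and ants then extend itemsets along edges weighted by pheromone and pairwise support, keeping only extensions that remain frequent. All parameter values and helper names here are our own assumptions, not the paper's implementation:

```python
import random

transactions = [
    {"fever", "fatigue", "headache"},   # Patient 1
    {"fever", "fatigue", "headache"},   # Patient 2
    {"fever", "cough"},                 # Patient 3
    set(),                              # Patient 4 (asymptomatic)
]
min_count = 2                           # illustrative support threshold

def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

items = sorted({i for t in transactions for i in t})
vertices = [i for i in items if count({i}) >= min_count]  # BV pruning step

tau = {(a, b): 1.0 for a in vertices for b in vertices if a != b}
alpha, beta, rho = 1.0, 2.0, 0.5       # assumed ACO parameters
random.seed(1)
best = set()

for iteration in range(20):
    for _ in range(len(vertices)):      # one ant per vertex
        path = [random.choice(vertices)]
        while True:
            # Only extensions that keep the itemset frequent are allowed.
            cand = [j for j in vertices
                    if j not in path and count(set(path) | {j}) >= min_count]
            if not cand:
                break
            weights = [tau[(path[-1], j)] ** alpha *
                       count(set(path) | {j}) ** beta for j in cand]
            path.append(random.choices(cand, weights=weights)[0])
        if len(path) > len(best):
            best = set(path)
        for a, b in zip(path, path[1:]):  # deposit pheromone along the path
            tau[(a, b)] += count(path)
    for edge in tau:                      # evaporation
        tau[edge] *= (1 - rho)

print(best)  # the largest itemset still meeting min_count
```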
The pseudocode shown in Figure 4 and Figure 5 combines the AA for frequent itemset mining with the ant colony optimization algorithm. It first uses the BV inner product to remove items that cannot form a frequent itemset, identifies frequent itemsets from the dataset, and then iteratively builds potential solutions (paths) probabilistically by selecting items and updating pheromones to guide future selections. The algorithm continues until a stopping condition (the maximum number of iterations) is met, at which point it evaluates the support and confidence of the discovered itemsets to generate association rules.
The proposed algorithm utilizes the BV to convert the data into 0s and 1s, reducing computational complexity. Through the inner product, the frequent items in the database are reduced, which effectively reduces the number of vertices in the problem space of the ant colony algorithm and improves its efficiency. Scanning the database only once, in combination with the ant colony search, significantly reduces the time required for data mining.