1. Introduction
Distributed ledger technology (DLT) has transformed transaction management in decentralized systems by ensuring immutability without the need for central authorities. A distributed ledger functions as a decentralized database, maintaining and synchronizing data across multiple nodes. Unlike traditional centralized ledgers, which depend on a single server or authority for data storage and management, distributed ledgers offer significant advantages, including enhanced security, greater transparency, and improved scalability [
1,
2,
3].
Distributed ledgers employ consensus techniques and cryptography to protect the confidentiality and integrity of the data kept on the ledger. By making it difficult for unauthorized parties to tamper with or change the data on the ledger, these cryptographic approaches ensure the data’s authenticity and dependability. Furthermore, distributed ledgers are frequently created to be transparent, allowing all nodes in the network to see the full history of transactions on the ledger and to access the same data. There are numerous potential uses for distributed ledgers, including managing supply chains, completing financial transactions, and verifying identities [
4,
5]. Recently, they have gained significant attention due to their ability to support secure and transparent digital transactions and enable new types of decentralized applications.
IOTA is a type of DLT that employs a directed acyclic graph (DAG) data structure known as the “Tangle” to store and handle transactions [
6], providing a lightweight and scalable alternative to traditional blockchains. IOTA was developed as a response to the scalability and energy consumption challenges faced by many blockchain-based systems, and it was designed to handle a large number of transactions per second without the need for complex and energy-intensive consensus mechanisms.
Figure 1 illustrates the three-layer architecture of the IOTA Tangle network. The network layer (Tangle) represents the actual directed acyclic graph structure where transactions are stored and linked. The node layer consists of full nodes that maintain a complete copy of the Tangle and have both read and write access. The IoT/sensor layer comprises lightweight nodes or clients that issue transactions to be processed by the network. This architecture enables IOTA to support offline transactions and operate in low power or resource-constrained environments, making it a potential solution for use cases such as IoT applications [
7].
The IOTA Tangle stands out by eliminating the need for miners or validators. Instead, each new transaction validates two previous ones, enabling parallel processing, enhanced scalability, and reduced transaction costs. This unique design is particularly advantageous for Internet of Things (IoT) applications, which require efficient handling of high volumes of microtransactions [
1,
2]. However, the Tangle’s performance heavily depends on the efficiency of its tip selection algorithm (TSA), which determines which unapproved transactions (tips) are validated [
3,
4].
The Tangle enables transactions to gain acceptance with increasing confidence as they receive more approvals. The system does not impose any rules for transaction approval, but it is argued that a large number of nodes following a “reference” rule would be beneficial for network security [
8]. In order to initiate a transaction, a node must carry out the following steps: (1) utilize an algorithm to select two previous transactions to approve, and (2) verify that the two transactions are not conflicting before solving a cryptographic puzzle. The Tangle network is asynchronous, and nodes may have different sets of transactions, including conflicting ones. While consensus is not necessary for valid transactions, nodes must decide which transactions to orphan in case of conflicts.
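The two-step issuance procedure above can be sketched in Python. The dictionary-based data layout, the toy proof-of-work, and the helper names (`select_tips`, `conflicts`) are illustrative assumptions for this sketch, not the IOTA protocol's actual implementation.

```python
import hashlib

def solve_pow(payload: bytes, difficulty: int = 2) -> int:
    """Toy proof-of-work: find a nonce whose SHA-256 hex digest starts
    with `difficulty` zero characters (real IOTA PoW differs)."""
    nonce = 0
    target = "0" * difficulty
    while not hashlib.sha256(payload + str(nonce).encode()).hexdigest().startswith(target):
        nonce += 1
    return nonce

def issue_transaction(tangle: dict, select_tips, conflicts) -> dict:
    """Steps from the text: (1) use a TSA to pick two previous transactions
    to approve, (2) verify they are not conflicting, then solve the
    cryptographic puzzle before attaching."""
    tip_a, tip_b = select_tips(tangle)        # step 1: TSA chooses two transactions
    if conflicts(tip_a, tip_b):               # step 2: reject conflicting parents
        raise ValueError("selected tips conflict; rerun tip selection")
    payload = f"{tip_a}|{tip_b}".encode()
    nonce = solve_pow(payload)                # lightweight anti-spam PoW
    return {"parents": (tip_a, tip_b), "nonce": nonce}
```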
Nodes use the TSA multiple times to determine which of the two conflicting transactions is more probable to be indirectly approved by the chosen tip. Based on this decision, the node can then proceed to approve the transaction. A transaction’s confirmation confidence is determined by the number of times it was selected during a set of TSA runs. It is noteworthy that the IOTA protocol operates differently from traditional blockchain systems, where a global consensus is necessary to determine the validity of transactions. Instead, the Tangle allows for a more flexible approach where nodes can independently verify the transaction history [
9,
10].
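The confirmation-confidence estimate described above (the fraction of TSA runs whose selected tip directly or indirectly approves a transaction) can be sketched as follows; the dictionary-based Tangle representation and function names are illustrative assumptions.

```python
def confirmation_confidence(tx_id, tangle, select_tip, runs=100):
    """Estimate confirmation confidence as the fraction of TSA runs whose
    chosen tip (indirectly) approves `tx_id`.

    `tangle` maps a transaction id to the tuple of ids it approves;
    `select_tip` is any tip selection algorithm returning one tip id."""
    def approves(tip, target):
        # Walk the approval DAG from `tip` toward the genesis.
        stack, seen = [tip], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(tangle.get(node, ()))
        return False

    hits = sum(approves(select_tip(tangle), tx_id) for _ in range(runs))
    return hits / runs
```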
Two key challenges persist in the IOTA Tangle: orphan tips and lazy nodes. Orphan tips are unapproved transactions that degrade network throughput and delay confirmations [
5]. Lazy nodes exacerbate this issue by strategically approving older transactions, avoiding computational costs associated with validating newer ones. This behavior leads to imbalanced network loads and reduced efficiency, particularly under high transaction rates and dynamic network conditions [
11,
12]. Addressing these challenges is critical for the efficiency and fairness of the IOTA Tangle. While existing TSAs such as Random Walk, Weighted Walk, and unweighted approaches have shown potential in simulations, their performance under real-world conditions often suffers due to efficiency, robustness, and decentralization limitations. These algorithms struggle to adapt to varying network conditions and fail to adequately mitigate fairness issues such as the persistence of orphan tips and lazy behavior [
13,
14].
To address these challenges, this work introduces a TSA based on a partially observable Markov decision process (POMDP). The proposed algorithm dynamically adjusts its tip selection strategies by observing transaction states such as age, connectivity, and approval status. By incorporating POMDP’s reward-based decision-making framework, the algorithm prioritizes transactions with lower confirmation likelihoods, ensuring a fairer distribution of approvals across the network. Furthermore, this study evaluates the algorithm’s performance under various network scenarios, including changes in transaction rates, node densities, and network latencies. Comprehensive simulations compare the POMDP-based TSA against existing algorithms, demonstrating its superiority in reducing orphan tips and preventing lazy behavior. The results provide insights into optimizing the TSA for real-world deployment while contributing to the broader field of distributed ledger technologies. This work marks a significant step forward in improving the efficiency and fairness of the IOTA Tangle, offering a robust solution for scalable and equitable transaction processing in IoT-driven decentralized networks.
Contributions and Innovativeness of the Research
To highlight the novelty and contributions of our research, the key points are summarized as follows:
Novel POMDP-Based Tip Selection Algorithm: We introduce a POMDP-based TSA for the IOTA Tangle network, addressing fairness and efficiency challenges in distributed ledger systems.
Mathematical Formulation and Theoretical Analysis: We rigorously define the problem as a POMDP, incorporating a reward structure that incentivizes fairer and more efficient tip selection.
Optimization of Computational Complexity: We assess the computational complexity of the proposed POMDP-based TSA through simulation time analysis, showing a moderate increase in computation time compared to lightweight algorithms. However, POMDP remains feasible for IoT applications, balancing efficiency, fairness, and security for decentralized networks.
Improved Transaction Confirmation Rate: Through comprehensive simulations, our TSA is shown to significantly reduce orphan transactions and minimize lazy tip selection, enhancing scalability.
Comparative Analysis with Existing Algorithms: Our work presents an in-depth experimental comparison against existing TSAs, showcasing superior performance under varying network conditions.
Practical Considerations and Real-World Applications: We discuss the applicability of our approach in real-world IOTA-based IoT networks and identify possible challenges for deployment.
Contribution to Reinforcement Learning in Blockchain Networks: By integrating reinforcement learning techniques into TSA selection, we demonstrate how adaptive algorithms can improve decentralized ledger efficiency.
These contributions establish our research as a significant advancement in optimizing the IOTA Tangle’s transaction selection process and reinforcing its potential for broader IoT applications.
The remainder of this paper is organized as follows.
Section 2 reviews the related work and provides a comprehensive background on the topic.
Section 3 outlines the methodology, including the formulation of the problem as a Markov decision process (MDP) and its representation in the IOTA environment as POMDPs.
Section 4 presents the simulation framework utilized in this study.
Section 5 discusses the evaluation and results, covering aspects such as fairness, orphan and lazy transactions, pending and confirmed transactions, as well as simulation time analysis. Finally,
Section 6 concludes the paper and suggests potential directions for future research.
2. Related Work
Different TSAs have been proposed in the literature for the IOTA Tangle network to improve transaction validation and network efficiency. These algorithms address critical challenges such as orphaned transactions and lazy tips, which undermine the system’s performance and fairness. For instance, the Random Walk algorithm (RWA) traverses the DAG structure to select tips for transaction validation. It employs random walkers starting from a known point, such as a genesis transaction, to discover unapproved transactions (tips) for validation. While simple, this approach may leave many transactions orphaned, as it lacks mechanisms to prioritize transactions [
15]. Unweighted Random Walk (URW) combines random walks with equal probabilities for transaction selection. Unlike RWA, it relies on walkers for discovery, which improves tip selection randomness. However, like its predecessor, it suffers from approving lazy or irrelevant transactions [
16,
Weighted Random Walk (WRW) improves upon URW by incorporating cumulative weights of transactions into the selection process. This ensures that transactions with higher network relevance are prioritized. By leveraging Markov Chain Monte Carlo (MCMC) probabilities, as shown in Equation (1), WRW provides a structured method for tip selection, reducing lazy and orphaned transactions:

P_{xy} = \frac{H_{y}}{\sum_{z : z \rightsquigarrow x} H_{z}} \quad (1)

where P_{xy} is the probability of walking from transaction x to transaction y, H_{y} is the cumulative weight of transaction y, z ⇝ x indicates all transactions z that directly approve transaction x, and H_{z} is the cumulative weight of each such transaction z, summed over all direct approvers of x. Thus, the higher the cumulative weight, the higher the probability of selection and approval [18,19].
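Assuming the simple cumulative-weight ratio of Equation (1), one WRW step can be sketched as below; the dictionary-based Tangle layout is an illustrative assumption, and practical variants further parameterize the weighting with a randomness parameter α, as discussed later in this section.

```python
import random

def wrw_step(x, tangle, cumulative_weight):
    """One weighted-random-walk step per Equation (1): from transaction x,
    move to a direct approver y with probability H_y / sum_z H_z.

    `tangle` maps a transaction id to the tuple of ids it approves;
    `cumulative_weight` maps a transaction id to its cumulative weight H."""
    # z directly approves x if x appears among z's parents.
    approvers = [z for z, parents in tangle.items() if x in parents]
    if not approvers:
        return None  # x is a tip; the walk terminates here
    weights = [cumulative_weight[z] for z in approvers]
    total = sum(weights)
    probs = [h / total for h in weights]  # the P_xy of Equation (1)
    return random.choices(approvers, weights=probs, k=1)[0]
```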
The development of an efficient TSA remains a prominent area of research in the IOTA Tangle ecosystem. In recent years, a few proposals have been put forth, each aimed at addressing the fairness problem of orphan tips, mitigating the lazy tip selection present in existing TSAs, and enhancing the overall performance of the Tangle network. Wang et al. [
20] proposed a dynamic switching selection strategy for tip selection in IOTA, called D-IOTA, which addressed both the anti-splitting attack and fairness issues (orphan transactions) in the Tangle. The method considers sharpness and angle-fragment metrics and selects the appropriate algorithm based on the current sharpness, maintaining anti-splitting ability while improving the fairness and unpredictability of the Tangle. Through simulation experiments, the authors demonstrated that D-IOTA controls the growth of tips, ensuring network throughput. One limitation of this algorithm is that it may not handle rapid changes in network load. Additionally, it may be less effective in high-congestion scenarios, as the balance between confirmation rate and network load may be difficult to maintain. Furthermore, the proposed TSA in [
21] addressed the fairness problem by relying on two algorithms—Best Approver Selection Method (BASM) and Best Tip Selection Method (BTSM). BASM selects the most promising tips based on their past confirmation history, by modeling the probability of confirmation for each tip using Bayesian statistical methods. On the other hand, BTSM uses a machine learning approach, taking into account past confirmation history, tip age, the number of transactions, and distance between tips to select the most promising tips. However, both algorithms rely on a centralized control strategy with the coordinator responsible for monitoring the network and selecting the best tips, which limits the scalability and makes the network vulnerable to attacks. Additionally, malicious actors can manipulate the cumulative weight of tips by attaching multiple transactions, leading to the selection of harmful tips.
Moreover, the hybrid selection algorithm proposed by M. Aghania [
22] attempted to tackle the issue of orphan and lazy transactions by adjusting the random walk parameter (α) in the algorithm. The algorithm includes two variations—Hybrid TSA-1 and Hybrid TSA-2. Hybrid TSA-1 changes the value of α for each new tip selection and every parent transaction during traversal, while Hybrid TSA-2 assigns the same α value to all tip transactions but changes the value for parent transactions recursively during traversal. The aim is to reduce orphan and lazy transactions by increasing the likelihood of confirming tips with different α values. However, it is important to note that while the algorithm shows promise in theory, its real-world performance may differ due to dynamic changes in the environment. Therefore, further research and experimentation are necessary to determine the feasibility and practicality of implementing the hybrid selection algorithm in real-world scenarios.
In contrast to previous research, G. Bu et al. [
23] proposed a new TSA that incorporated a confidence-based algorithm and mutual supervision mechanism. In contrast to the traditional IOTA, G-IOTA selects an additional left-behind tip to improve fairness and provide initial approval for honest transactions. This is accomplished through integrated incentives and mutual supervision mechanisms, and the algorithm is resilient to all known attacks, including the splitting attack. However, the additional tip selection mechanism leads to inefficient use of resources, as nodes need to allocate more processing power and memory to handle these additional tips. Critics of this approach proposed E-IOTA as a solution to address the potential defects in IOTA and G-IOTA [
24]. Instead of relying on the left-behind tip mechanism for transaction fairness, E-IOTA uses randomized combinations of different TSAs. This approach solves the fairness problem and ensures the unpredictability of the Tangle in the event of a splitting attack with a high α value. Despite this, combining multiple TSAs with different α values and probabilities adds complexity to the tip selection process. This could lead to increased computational overhead and slower tip selection, particularly as the Tangle grows larger.
Alternatively, another study [
25] proposed DA-IOTA, an optimized TSA that replaces E-IOTA’s random selection among different TSAs with a dynamic determination of the optimal α at each WRW step, based on the standard deviation of the approvals. However, this approach relies on a constant hyperparameter value of 5 in the α calculation, which needs to be tuned for different scenarios or network conditions. The Resource Allocation Weighted Random Walk (RA-WRW) algorithm, developed by Vyas and Raiyani, enhances IOTA’s transaction processing by integrating resource allocation with weighted random walk strategies. This approach optimizes tip selection based on node resources and transaction weights, leading to improved execution time, CPU usage, network efficiency, and scalability. Additionally, RA-WRW incorporates sender authentication and transaction verification to ensure data integrity and security within the IOTA network [
26]. The study by Ferenczi and Bădică focused on optimizing the cumulative weight calculation (CWC) in the IOTA Tangle, a DAG-based distributed ledger. The authors proposed replacing the original Breadth-First Search (BFS) approach with the Depth-First Search (DFS) and Iterative Deepening Search (IDS) algorithms. Their comparative analysis demonstrated that DFS and IDS significantly enhance computational efficiency, which is particularly advantageous for IoT devices with limited processing capabilities. Their experimental results on a Tangle snapshot confirmed improved performance and reduced resource utilization with the proposed method [
19].
In [
27], Xun Xiao addressed the computational inefficiencies of the traditional random walk approach used for tip selection in DAG-based blockchains like IOTA. Under the condition of burst message arrivals, the random walk method, even when parallelized, can lead to significant delays. To mitigate this, Xiao introduced a novel approach inspired by absorbing Markov chain (AMC) theory. This method involves periodically calculating a tip selection probability distribution (TSPD) of the DAG ledger, allowing processing nodes to sample from the TSPD directly, thereby expediting the tip selection process. Theoretical complexity analyses and comparative evaluations with single- and multi-processing random walk schemes demonstrate the effectiveness of this solution in enhancing computational efficiency during high-volume message scenarios.
While the TSA approaches described above provide theoretical foundations for improving the IOTA Tangle, it is essential to examine their practical limitations, as revealed through recent experimental studies. Understanding these empirical shortcomings will highlight the need for our proposed POMDP-based approach.
Theoretical studies have extensively discussed the limitations of existing TSAs in the IOTA Tangle, particularly regarding computational inefficiency, fairness issues, and security vulnerabilities. Traditional TSAs, such as Uniform Random Tip Selection, Biased Random Walk, and Markov Chain Monte Carlo, have been analyzed through mathematical modeling and theoretical proofs. Prior studies have shown that these TSAs struggle with maintaining transaction fairness, preventing malicious attacks, and ensuring scalability in high-traffic networks. Theoretically, MCMC-based TSAs mitigate parasite chain attacks by leveraging weight-based selection probabilities, yet they remain susceptible to large-weight manipulations, which can distort the validation process. Similarly, Random TSA has been demonstrated to be insecure due to its susceptibility to denial-of-service (DoS) attacks, where attackers can flood the network with spam transactions to delay legitimate approvals. Additionally, fairness remains a challenge, as transactions selected for approval are often disproportionately influenced by network topology and arrival patterns rather than an equitable distribution across all pending transactions.
While these limitations have been rigorously explored in theoretical frameworks, recent experimental studies have provided empirical validation of their real-world impact. The authors in [
28] conducted an extensive simulation of IOTA’s transaction throughput and found that confirmation times increased by 50% under high network congestion, demonstrating that traditional TSAs fail to scale efficiently in IoT applications. Furthermore, the authors in [
29] studied TSA behavior under burst message arrivals and concluded that random walk-based selection introduces significant processing delays, particularly in high-traffic conditions. Their findings suggested that alternative methods, such as AMC-based TSA models, could enhance processing time efficiency by up to 40% compared to the standard MCMC.
Security concerns have also been validated through real-world simulations. Brady et al. [
30] demonstrated that Random TSA is highly vulnerable to DoS attacks, as it does not incorporate risk assessment in tip selection. Their empirical analysis showed that re-attachment rates from DoS attacks significantly degrade network performance, causing prolonged transaction confirmation delays. Similarly, Li et al. [
31] assessed security vulnerabilities in IOTA and found that while MCMC-based TSAs can mitigate parasite chain attacks, they fail to prevent large-weight attacks, which can be exploited by adversaries to manipulate transaction validation.
Fairness remains a major limitation in TSA design, as highlighted by Chen et al., who introduced a time division-based TSA (TDTS) that prioritized tips based on temporal factors. Their experimental results demonstrated a 35% reduction in lazy transactions compared to MCMC-based TSAs. However, despite this improvement, TDTS lacks an adaptive fairness mechanism, which can lead to persistent tip selection biases. This underscores the need for probabilistic decision-making TSA models that dynamically adjust tip selection probabilities based on network conditions [
32].
Recent experimental studies have also explored novel TSA mechanisms to enhance transaction validation efficiency. Guo et al. [
33] proposed a graph-based IOTA Tangle generation algorithm (GraGR) that reduced memory consumption by 50%, demonstrating that traditional TSA models are computationally inefficient for large-scale transaction loads. Similarly, Khan et al. [
34] utilized a discrete-event simulator to evaluate TSA performance and found that random walk-based TSAs fail to maintain efficiency when transaction volumes increase, validating the need for probabilistic selection methods such as our POMDP-based TSA.
As IOTA transitions toward its next-generation architecture (IOTA 2.0), Sealey et al. [
35] analyzed the implications of removing the coordinator node and found that new TSA models will be required to maintain security and fairness in a fully decentralized environment. This aligns with our research, as our proposed POMDP-based TSA optimizes transaction selection by integrating probabilistic decision making, ensuring fairness, computational efficiency, and security enhancements.
In summary, while the limitations of traditional TSAs have been well documented in theoretical studies, recent experimental findings further reinforce their inefficiencies in real-world applications. The empirical evidence from these studies demonstrates the urgent need for adaptive TSA mechanisms capable of balancing fairness, security, and computational feasibility—an objective that our proposed POMDP-based TSA directly addresses.
These experimental findings underscore the critical gap in current TSA approaches: the inability to simultaneously optimize for fairness, security, and computational efficiency across diverse network conditions. To address these limitations, we propose a POMDP-based TSA that adaptively balances these competing objectives through probabilistic decision making, as detailed in the following section.
3. Methodology
This section outlines the details of the proposed POMDP-based TSA to optimize the transaction confirmation process in the IOTA Tangle.
3.1. Markov Decision Process (MDP)
MDPs serve as a fundamental framework within the realm of reinforcement learning, employed to model decision-making scenarios in which the consequences of actions are influenced by both randomness and the agent’s choices. As a cornerstone of model-based reinforcement learning, MDPs rely on an environmental model with which the agent interacts [
36].
Within the context of MDPs, mathematical decision challenges are aimed at determining the most favorable sequence of actions within an environment characterized by unpredictability. This model [37,38] comprises four distinct groups of components: states (S), actions (A), transition probabilities (P(s′ | s, a)), and rewards (R(s, a)). As shown in
Figure 2, the agent chooses an action and moves to the next state based on the current state at each sequence step. The agent can access rewards during this process, resulting in positive or negative gains.
In MDPs, solutions are determined by policies π(s), which define which action to take in each possible state. A policy creates a trajectory through the state space, with uncertainty in the resulting states due to probabilistic transitions. The primary objective in solving MDPs is to find the policy that maximizes the agent’s cumulative rewards.
To handle the time value of rewards, MDPs employ a discount factor γ (0 < γ < 1), prioritizing immediate rewards over future ones. This discount factor reflects the uncertainty associated with future rewards: a discount factor of 1 weights all rewards equally, while a factor of 0 considers only immediate rewards.
The policy’s utility U is the sum of rewards obtained along a random path from state s. This is calculated as the sum of the reward from the current state (r_0) and all future rewards (r_1, r_2, etc.), discounted by a factor γ. This is represented in Equation (2):

U = \sum_{t=0}^{\infty} \gamma^{t} r_{t} \quad (2)

where U represents the total expected utility, γ is the discount factor (0 < γ < 1), and r_t is the reward received at time step t.
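As a minimal numeric sketch of the discounted sum in Equation (2):

```python
def discounted_utility(rewards, gamma):
    """Utility of one sampled trajectory, per Equation (2):
    U = sum_t gamma**t * r_t, with 0 < gamma < 1."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

For example, rewards [1, 1, 1] with γ = 0.5 yield 1 + 0.5 + 0.25 = 1.75, illustrating how later rewards contribute less.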
Since the exact trajectory is uncertain due to stochastic transitions, the value function V^π(s) represents the expected utility of following policy π from state s, as defined in Equation (3):

V^{\pi}(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_0 = s, \pi\right] \quad (3)

where V^π(s) is the expected utility (value) of state s under policy π.
The state-action value function, or Q-value, Q^π(s, a), represents the expected utility of taking action a in state s and then following policy π thereafter, as defined in Equation (4):

Q^{\pi}(s, a) = \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma V^{\pi}(s')\right] \quad (4)

where Q^π(s, a) represents the expected value of taking action a from state s, P(s′ | s, a) is the transition probability to state s′ from state s when taking action a, and R(s, a, s′) represents the reward obtained from this transition.
The Bellman Equation (5) establishes a recursive relationship for the value function:

V^{\pi}(s) = \sum_{s'} P(s' \mid s, \pi(s))\left[R(s, \pi(s), s') + \gamma V^{\pi}(s')\right] \quad (5)
This equation expresses the value of state s under policy π in terms of the immediate reward and the discounted value of successor states. Note that π(s) is explicitly used to indicate the action selected by policy π in state s, clarifying how the value function relates to the policy being evaluated.
For optimal policies, the Bellman optimality equation states that the optimal value function V*(s) equals the maximum Q-value achievable from state s, as in Equation (6):

V^{*}(s) = \max_{a} Q^{*}(s, a) \quad (6)

where V*(s) is the optimal value function for state s, Q*(s, a) is the optimal Q-value for taking action a in state s, and max_a indicates taking the maximum value over all possible actions.
This leads to the recursive formula for computing optimal values, as in Equation (7):

V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma V^{*}(s')\right] \quad (7)

where the terms are as defined above.
To find the optimal policies, value iteration can be used by applying Equation (8) iteratively, or policy iteration can be used, which alternates between policy evaluation and policy improvement steps, where policy improvement selects the actions that maximize the Q-values, π′(s) = arg max_a Q^π(s, a):

V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma V_{k}(s')\right] \quad (8)
This process continues until convergence to an optimal policy that maximizes the expected utility across all states.
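The value-iteration procedure described above can be sketched as follows; the tabular encoding of P and R is an illustrative assumption.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Iterate V(s) <- max_a sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * V(s'))
    until convergence, then extract the greedy policy from the Q-values.

    P[s][a] is a list of (prob, next_state) pairs; R(s, a, s2) is the reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            q = [sum(p * (R(s, a, s2) + gamma * V[s2]) for p, s2 in P[s][a])
                 for a in actions]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Policy improvement: pick the action maximizing the Q-value in each state.
    policy = {s: max(actions,
                     key=lambda a: sum(p * (R(s, a, s2) + gamma * V[s2])
                                       for p, s2 in P[s][a]))
              for s in states}
    return V, policy
```

On a two-state chain where only reaching s1 pays a reward of 1, the converged value is 1/(1 − γ) and the greedy policy moves toward s1.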
3.2. Partially Observable Markov Decision Process (POMDP)
In a completely observable Markov decision process (COMDP), the agent can observe the environment state directly and make decisions accordingly [
27]. Conversely, in a POMDP, the agent cannot directly observe the environment state but must maintain a belief or probability distribution over possible states based on observations [
27]. POMDPs are more complex than MDPs since the agent does not have access to the underlying states but instead maintains a probability distribution over a set of states based on observations and observation probabilities. POMDPs [39] consist of six essential components: states (S), actions (A), state transition probabilities (T), rewards (R), observations (O), and observation probabilities (Z). During each step, the agent is in state s and takes action a, which leads the environment to transition to state s′ with probability T(s′ | s, a). The agent then receives observation o with probability Z(o | s′, a), based on the new state s′ and action a, and obtains reward R(s, a) [40,41].
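The belief maintenance implied by these components follows the standard Bayesian update b′(s′) ∝ Z(o | s′, a) Σ_s T(s′ | s, a) b(s). A minimal sketch, with T and Z supplied as callables (the argument order is an assumption of this sketch):

```python
def update_belief(belief, action, observation, states, T, Z):
    """Bayesian POMDP belief update: b'(s') ∝ Z(o|s',a) * sum_s T(s'|s,a) * b(s).

    `belief` maps each state to its probability; T(s2, s, a) and Z(o, s2, a)
    return the transition and observation probabilities described in the text."""
    new_belief = {}
    for s2 in states:
        pred = sum(T(s2, s, action) * belief[s] for s in states)  # prediction step
        new_belief[s2] = Z(observation, s2, action) * pred        # correction step
    norm = sum(new_belief.values())
    if norm == 0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: p / norm for s2, p in new_belief.items()}
```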
The Partially Observable Monte Carlo Planning (POMCP) algorithm is an anytime planning algorithm for POMDPs that utilizes simulation techniques to enhance efficiency and avoid worst-case exhaustive search. It navigates a belief tree where each node represents a belief state and each edge signifies an action. By expanding the tree, the algorithm determines the optimal action at each time step. POMCP adopts a sparse tree structure to focus solely on promising branches of the belief tree, employing a heuristic to estimate the value of unexplored nodes and thereby swiftly pruning unpromising branches. This approach enhances efficiency and scalability, rendering it suitable for handling large-scale problems [
39].
POMCP is particularly beneficial for decentralized POMDP scenarios, where multiple agents possess individual observations and actions but share a common objective. It can be parallelized to search for the best joint action among all agents, accommodating communication constraints and limited coordination between agents. Successfully applied in resolving complex POMDP planning dilemmas for real-time autonomous vehicle operations, POMCP serves as a promising foundation for constructing a POMDP planner for challenges in IOTA environments [
20,
41].
In summary, while POMDPs serve as a mathematical model, POMCP represents a specific algorithm devised for solving POMDPs, tailored to address real-time decision-making challenges within partially observable environments through a tree-based approach.
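A simplified flavor of this idea, Monte Carlo Q-value estimation over a particle belief, can be sketched as below; full POMCP additionally grows a sparse belief tree guided by UCB1, which this sketch omits.

```python
import random

def mc_plan(particles, actions, step, rollout_depth=10, sims=200, gamma=0.95):
    """Pick the action with the highest simulation-estimated Q-value.

    `particles` is a list of sampled states representing the belief;
    `step(s, a)` is a generative model returning (next_state, reward)."""
    def rollout(s, depth):
        # Estimate the value of s by simulating random actions to `depth`.
        ret, discount = 0.0, 1.0
        for _ in range(depth):
            s, r = step(s, random.choice(actions))
            ret += discount * r
            discount *= gamma
        return ret

    q = {a: 0.0 for a in actions}
    for a in actions:
        for _ in range(sims):
            s = random.choice(particles)   # sample a state from the belief
            s2, r = step(s, a)             # take the candidate action once
            q[a] += (r + gamma * rollout(s2, rollout_depth - 1)) / sims
    return max(q, key=q.get)               # greedy over estimated Q-values
```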
3.3. System Model Representation
The proposed POMDP-based TSA optimizes the transaction confirmation process in the IOTA Tangle by intelligently selecting the most suitable tips to approve. The system is designed to operate in a decentralized environment where the full state of the Tangle is not always observable by the agent (node), as shown in
Figure 3. The system works as follows:
Transaction Initiation: A new transaction is introduced into the IOTA Tangle. This transaction aims to become part of the network by selecting two tips (unconfirmed transactions) to approve.
Proof of Work (PoW): The transaction performs a proof-of-work (PoW) to prove its validity and prevent spam in the network. Once the PoW is complete, the transaction is eligible to request tip candidates for approval.
Request for Tip Selection: The transaction sends a request to the POMDP-based TSA to identify the most suitable tips for approval. Since the full state of the Tangle is not fully observable, the system must rely on partial information (observations) to make a decision.
Belief State Update and Tip Selection: The POMDP-based TSA processes the request by updating its belief state—a probabilistic representation of the Tangle’s current state, including transaction age, confirmation status, the number of approvers, and the overall health of the network.
Based on these observations, the TSA evaluates the available actions, which include selecting newer tips, older pending tips, or orphaned transactions.
The reward function guides the tip selection process by rewarding actions that maximize confirmation rates and penalizing actions that select lazy tips or increase the number of orphaned tips. The system seeks to optimize rewards by balancing fairness, reducing confirmation delays, and maintaining network integrity.
Returning Optimal Tips: The POMDP-based TSA provides the transaction with the most appropriate candidate tips for approval. Along with the selected tips, it sends updated observations, the current reward, and the revised belief state to the transaction.
Tip Approval and Network Broadcast: The transaction approves the selected tips and is then broadcast to the network. Other nodes in the IOTA Tangle will receive and validate this transaction based on its approved tips.
Network Validation and Belief State Adjustment: As other nodes validate the new transaction, the POMDP-based TSA updates its belief state based on the network’s feedback. The system continuously learns from each transaction, refining its understanding of the Tangle’s dynamics and improving future tip selections.
Iterative Process: This process is repeated for each incoming transaction, ensuring that the POMDP-based TSA consistently adapts and optimizes its tip selection strategy over time.
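Steps 3 to 6 of the workflow above (belief update, action evaluation, and tip selection) can be condensed into a short sketch. Everything here is a hypothetical illustration under the assumption that tips are grouped into three categories; the names (ACTIONS, update_belief, select_tips) are not part of any IOTA node software.

```python
# Minimal sketch of one belief-update-and-select iteration.
ACTIONS = ["new_tip", "orphan_tip", "lazy_tip"]

def update_belief(belief, observed_counts):
    """Bayes-style re-weighting: scale each category's prior by how
    often that category appears in the node's partial observation."""
    total_seen = sum(observed_counts.values()) or 1
    posterior = {a: belief[a] * (observed_counts[a] / total_seen + 1e-6)
                 for a in belief}
    norm = sum(posterior.values())
    return {a: p / norm for a, p in posterior.items()}

def select_tips(belief):
    """Return the tip category the agent currently believes is best."""
    return max(belief, key=belief.get)

belief = {a: 1 / 3 for a in ACTIONS}                        # uniform prior
observation = {"new_tip": 8, "orphan_tip": 2, "lazy_tip": 1}  # partial view
belief = update_belief(belief, observation)                 # belief update
print(select_tips(belief))  # new_tip
```

Because the node never sees the full Tangle, the belief stays a normalized distribution over categories rather than a single known state, which is exactly what the partial-observability assumption requires.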
It is important to clarify that while our system operates in a multi-agent environment with numerous nodes participating in the IOTA Tangle network, the POMDP framework is applied individually by each agent rather than as a collective decision-making process. Each node independently employs its own POMDP model to make tip selection decisions based on its local observations of the Tangle. This approach differs from a decentralized POMDP (Dec-POMDP) model, which would involve coordinated decision making among multiple agents.
In our model, each agent maintains its own belief state about the network, makes observations based on its local view of the Tangle, and selects tips according to its individual POMDP policy. While agents operate independently, their collective actions contribute to the overall consensus and health of the network. This decentralized yet independent approach aligns with IOTA’s philosophy of distributed consensus without requiring explicit coordination mechanisms between nodes. The effectiveness of our approach stems from how these independent POMDP-based decisions, when made by multiple agents across the network, collectively result in improved transaction confirmation patterns and reduced orphan transactions.
3.4. Representing the IOTA Environment as POMDPs
The methodology for formulating the IOTA Tangle as a POMDP defines the essential components of a POMDP for the IOTA environment. We also describe the POMDP algorithm used to find an optimal policy that maps belief states to actions. By using a POMDP framework, we can model the IOTA environment in a way that allows decisions to be made over a probability distribution across all states. Our proposed method offers a promising solution for improving the security and efficiency of the IOTA Tangle.
To represent the IOTA environment as a POMDP from the perspective of an individual agent (node), several components are defined as follows and as shown in
Figure 4:
States (S): The state represents the current status of the IOTA Tangle and includes information on pending, orphaned, lazy, and confirmed transactions. In our implementation, we represented the state as a vector s = [s1, s2, …, sn], capturing transaction age and approval status.
Actions (A): Actions represent the decisions made by the agent (node) about which tips to approve in the Tangle.
Observations (O): Since the agent does not have full access to the entire state, it receives partial information through observations.
Transition Function (T(s′|s,a)): The transition function defines the probability of moving from one transaction state (status) s to another transaction state (status) s′ after taking action a.
Reward Function (R): The reward function gives feedback on the quality of the agent’s action. The agent aims to maximize cumulative rewards.
Approving New Tips: +10 reward for increasing network activity and validation speed.
Addressing Orphan Tips: +5 reward for handling unconfirmed orphan tips.
Avoiding Lazy Tips: −3 penalty to discourage the selection of tips that lead to network inefficiencies.
This reward structure is implemented as R(s,a) = 10 ∗ Inew(s,a) + 5 ∗ Iorphan(s,a) − 3 ∗ Ilazy(s,a), where Inew, Iorphan, and Ilazy are indicator functions that identify the type of tip being selected.
Belief State (B): The belief state is a probability distribution over possible states. Since the agent does not have full observability, it maintains a belief about the current state based on previous observations and actions.
Policy (π): The policy π(a|b) defines the strategy for selecting actions based on the current belief state. The goal is to choose actions that maximize long-term rewards. The policy is derived from the value function defined in Equation (3) and is approximated using the POMCP algorithm (Algorithm 1).
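The reward structure R(s,a) = 10 ∗ Inew(s,a) + 5 ∗ Iorphan(s,a) − 3 ∗ Ilazy(s,a) defined above translates directly into code. The Action record and its tip_type field are illustrative assumptions, not the paper's implementation:

```python
from collections import namedtuple

# Hypothetical action record; tip_type classifies the selected tip.
Action = namedtuple("Action", ["tip_type"])

def reward(state, action):
    """R(s,a) = 10*I_new(s,a) + 5*I_orphan(s,a) - 3*I_lazy(s,a),
    where each indicator fires for the matching tip category."""
    indicators = {"new": 10, "orphan": 5, "lazy": -3}
    return indicators.get(action.tip_type, 0)

print(reward(None, Action("new")),     # 10
      reward(None, Action("orphan")),  # 5
      reward(None, Action("lazy")))    # -3
```

Since the indicators are mutually exclusive by construction (a tip is classified as exactly one of new, orphan, or lazy), a single dictionary lookup suffices.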
POMCP is an efficient approximation method designed to solve POMDPs [
16,
17]. Algorithm 1 leverages a generative model-based sampling technique to estimate history-action values while maintaining computational feasibility. Instead of constructing the entire belief tree, POMCP constructs a limited portion of the tree of future histories starting from the current history
ht, applying an upper confidence bounds for trees (UCT) method to guide exploration and exploitation efficiently. The algorithm implements the theoretical foundation of MDPs and POMDPs presented in Equations (2)–(8), applying these concepts to the practical problem of tip selection through the following steps:
Initialization: Begin with the current belief state B(s) representing a probability distribution over possible Tangle states.
Simulation Phase: Perform M simulations where each simulation samples an initial state s ~ B(s) from the belief distribution and calls the SIMULATE function to estimate action values.
Action Selection: During simulation, actions are selected using SOFTUCT, a variation of UCT that balances exploration and exploitation. This implements the exploration–exploitation balance needed to approximate the optimal policy πopt(s) described in Equation (8).
State Transition: After selecting action a, sample the next state s′ and observation o: s′ ~ T(s′|s,a) o ~ Z(o|s′,a) This step models the transition probability P(s′|s,a) from Equation (4).
Belief Update: Update the belief state using Bayes’ rule. This implements the partially observable aspect of the MDP, extending the value function concept from Equation (3).
Value Estimation: For new histories not in the tree, use a rollout policy to estimate values. For existing histories, recursively simulate future actions. This process approximates the value function Vπ(s) from Equation (3) by estimating the expected sum of discounted future rewards.
Q-value Update: After each simulation, update the Q-value. This incrementally improves the estimate of Qπ(s,a) from Equation (4).
Final Selection: After all simulations, return the action with the highest Q-value. This implements the optimal policy selection principle from Equation (8), maximizing expected utility.
This algorithm efficiently approximates optimal tip selection strategies by focusing computational resources on promising actions while maintaining a balance between exploring new possibilities and exploiting known good strategies, directly applying the mathematical principles established in our POMDP formulation.
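The exploration–exploitation balance used in the action-selection step can be illustrated with plain UCB1 scoring; the paper's SOFTUCT is a softened variant whose exact form is not reproduced here, and the Q-values and visit counts below are made-up numbers:

```python
import math

def uct_score(q, n_parent, n_action, c=1.4):
    """UCB1-style score: exploit the current Q estimate, plus an
    exploration bonus that shrinks as an action is tried more often.
    Untried actions score infinity so they are always tried first."""
    if n_action == 0:
        return float("inf")
    return q + c * math.sqrt(math.log(n_parent) / n_action)

# Illustrative Q-values and visit counts for the three tip categories.
q_values = {"new_tip": 7.0, "orphan_tip": 3.0, "lazy_tip": -1.0}
counts = {"new_tip": 10, "orphan_tip": 4, "lazy_tip": 1}
n_parent = sum(counts.values())
scores = {a: uct_score(q_values[a], n_parent, counts[a]) for a in q_values}
print(max(scores, key=scores.get))  # new_tip
```

Note how the rarely visited lazy_tip receives the largest exploration bonus, which is what prevents the tree search from prematurely locking onto one tip category.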
Algorithm 1. Partially Observable Monte Carlo Planning
# POMCP consists of three main procedures: SEARCH, SIMULATE, and ROLLOUT
procedure SEARCH(l, p, n)
Initialize T with empty dictionary # T stores state-action visit counts and Q-values
for SIMULATIONS = 1, …, n do
# Sample an initial state from the belief distribution
if lp = o0 then
s ~ B0 # Sample from the initial belief B0 if it is the first step
else
s ~ B(lp) # Otherwise, sample from the belief given the history lp
end if
SIMULATE(s, l, p) # Run a simulation from the sampled state
end for
return a ← SOFTUCT # Return the best action selected by SOFTUCT
end procedure
procedure ROLLOUT(s, l, p)
if γ(lp, l) ≤ 0 then
return 0 # If the discount factor is too small, return zero reward
end if
a ~ πrollout(l) # Select an action using the rollout policy
(s′, o, r) ~ G(s, a) # Sample next state, observation, and reward from generative model G
l ← {l, a, o} # Update history with the new action and observation
return γ(lp, l)·r + ROLLOUT(s′, l, p) # Compute discounted future rewards recursively
end procedure
procedure SIMULATE(s, l, p)
if γ(lp, l) ≤ 0 then
return 0 # If the discount factor makes future rewards negligible, stop the recursion
end if
if l ∉ T then # If this history has not been visited before, initialize it in the tree
for all a ∈ A do
T(la) ← (Ninit(l, a), Q′init(a, l), ∅) # Initialize visit counts and Q-values
end for
return ROLLOUT(s, l, p) # Perform a rollout to estimate the value
end if
a ~ SOFTUCT # Select an action using the Softmax-UCB tree policy
(s′, o, r) ~ G(s, a) # Sample next state, observation, and reward from generative model G
l ← {l, a, o} # Update history with the new action and observation
R ← γ(lp, l)·r + SIMULATE(s′, l, p) # Recursively compute the discounted reward
N(l) ← N(l) + 1 # Update visit count for the current history
N(l, a) ← N(l, a) + 1 # Update visit count for action a at history l
Q′(a, l) ← Q′(a, l) + (R − Q′(a, l))/N(l, a) # Update Q-value estimate using an incremental mean
return R # Return the computed reward
end procedure
4. Simulation Framework
Building upon Ferenczi's framework for simulating the IOTA Tangle [
19], this work enhances the multi-agent simulation capabilities to provide a more detailed examination of the Tangle's dynamics under various network conditions and introduces new TSAs. The simulation is implemented in Python (3.13.1) due to its robust ecosystem and suitability for scientific computing. Python is an open-source, high-level programming language, making it widely accessible and cost-effective for research applications. Its extensive libraries, such as NumPy, SciPy, and NetworkX, provide powerful tools for numerical computing, graph processing, and reinforcement learning, which are essential for modeling and simulating the IOTA Tangle.
Furthermore, the Python ecosystem offers ready-made support for POMCP solvers, which are essential for efficiently solving large-scale POMDP problems. Libraries such as POMDPy and AI planning toolkits facilitate Monte Carlo tree search (MCTS)-based implementations, supporting sound decision-making in uncertain environments. This library support significantly simplifies the development of the POMDP-based TSA while maintaining computational efficiency. Python's integration with machine learning and optimization frameworks, such as TensorFlow and PyTorch, allows for future extensions of the algorithm. Its readability and ease of use facilitate rapid prototyping and iterative development, ensuring efficient implementation and testing. The use of Python also enhances reproducibility, as its cross-platform compatibility and open-source nature enable researchers and developers to modify and extend the implementation without proprietary restrictions. These advantages make Python an ideal choice for implementing and evaluating the POMDP-based TSA in this study. The core components of the simulation include:
Environment Configuration: The simulation environment is configured to replicate the IOTA Tangle’s structure, allowing for the modeling of transactions and their interconnections.
Agent Modeling: Multiple agents are simulated to represent nodes within the network, each capable of generating transactions, selecting tips, and validating other transactions.
Network Conditions: Various network scenarios are simulated, including different transaction arrival rates, network delays, and node behaviors, to assess the performance and robustness of the Tangle under diverse conditions.
To effectively conduct simulations and analyze the behavior of the Tangle under various scenarios, several key parameters are defined as per
Table 1.
Table 1 summarizes the key parameters governing our simulation experiments. These parameters play pivotal roles in shaping the behavior and performance of TSAs within the Tangle-based network. Notably, the number of transactions, the transaction arrival rate, the number of participating agents, and the latency directly influence the network's dynamics, throughput, and congestion levels. The α parameter, specific to certain algorithms, introduces a degree of variability. The choice of TSA is fundamental, as it dictates the underlying logic governing transaction confirmation and network consensus.
The simulation framework underwent significant modifications to accommodate several advancements that focused on multi-agent environments, transaction dynamics, and adaptive decision-making. The key enhancements include a dynamic transaction arrival system using exponential distribution, enabling realistic simulation of transaction flows, and agent-specific exit probability calculations, which account for unique network views of each agent. The transaction class was updated to include attributes like status, connectivity, and lazy indicators, allowing for comprehensive monitoring of transaction states. The agent class now tracks transaction histories and visibility with a directed graph, improving decision-making capabilities.
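The exponential inter-arrival model mentioned above can be reproduced in a few lines. The function name arrival_times and the seed are illustrative choices, with lam playing the role of the simulation's λ parameter (transactions per second):

```python
import random

def arrival_times(n, lam, seed=7):
    """Cumulative arrival times of n transactions with exponential
    inter-arrival gaps, i.e. a Poisson process at rate lam (tx/s)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(lam)  # gap ~ Exp(lam), mean 1/lam
        times.append(t)
    return times

times = arrival_times(200, lam=1.0)
mean_gap = times[-1] / len(times)
print(len(times))  # 200
```

Raising lam compresses the gaps and therefore simulates heavier network load, which is how the λ sweeps in the experiments below vary congestion.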
Among the TSAs implemented, the Random Selection and Unweighted MCMC algorithms were enhanced to address lazy and orphaned tips, while Weighted MCMC introduced dynamic thresholds to adapt to changing network conditions. Novel hybrid approaches, such as Hybrid TSA-1 and Hybrid TSA-2, optimized cumulative weight calculations to balance fairness and computational efficiency. The G-IOTA and E-IOTA algorithms further advanced tip selection by incorporating weighted random walks and confidence-based metrics. G-IOTA targeted fairness by including left-behind tips, whereas E-IOTA dynamically adjusted selection strategies based on real-time network conditions.
The core of this work is the development of POMDP_WALK, a modular POMDP-based TSA. Components like POMCPSolver, SimpleModel, and POMDP_components work together to optimize tip selection dynamically. The POMCPSolver uses tree-based search and UCT scoring to evaluate actions under uncertainty, while the SimpleModel bridges the algorithm and the simulation, facilitating adaptive and efficient tip selection.
The aggregation of these components ensures a robust framework for addressing fairness and efficiency challenges in distributed ledger systems, demonstrating scalability and adaptability in dynamic network conditions. This comprehensive suite of TSAs represents a significant step forward in achieving fairness-driven optimization in decentralized environments.
All simulations and experiments were conducted on a laptop (sourced from local stores in Qatar) equipped with an 11th Gen Intel® Core™ i7-1165G7 processor (2.80 GHz), 16 GB RAM, and a 64-bit Windows 11 Pro operating system. The computational framework was implemented in Python 3.9, utilizing scientific computing and machine learning libraries such as NumPy, SciPy, NetworkX, and POMDPy to support Monte Carlo tree search (MCTS) and reinforcement learning simulations. This hardware setup ensured efficient execution of large-scale graph-based computations and optimization processes in the IOTA Tangle while maintaining a balance between computational cost and scalability.
5. Evaluation and Results
The evaluation of the proposed POMDP-based TSA focused on two key metrics: orphan tip proportions and lazy tip proportions. These metrics are critical for assessing fairness and efficiency in the IOTA Tangle. The results demonstrate the algorithm’s ability to minimize unfairness by effectively addressing orphaned and lazy transactions across varying network conditions, ensuring equitable transaction approvals and robust network performance. The findings provide insights into the algorithm’s adaptability, scalability, and potential to enhance the overall efficiency of decentralized ledger systems.
5.1. Performance Analysis
This section evaluates the impact of key network parameters on the performance of the POMDP-based tip selection algorithm (TSA), focusing on confirmation rate, orphan tip reduction, and network efficiency. The analysis includes:
Number of Transactions: Assesses how increasing transaction volume affects confirmation time and orphan rate.
Number of Agents: Examines the influence of active agents on validation speed and tip selection consistency.
Arrival Rate: Evaluates the effect of transaction arrival rates on network congestion and throughput.
Bias in Tip Selection: Analyzes how selection preferences impact fairness, processing efficiency, and network balance.
Network Latency: Studies the impact of propagation delays on transaction confirmation and system stability.
These factors determine the scalability, efficiency, and fairness of the proposed TSA under varying network conditions.
5.1.1. Impact of Number of Transactions on Performance
The performance of various TSAs under increasing transaction loads provides valuable insights into their scalability and efficiency. As depicted in
Figure 5, the response of each algorithm to increasing transaction volume—from 100 to 1000 transactions—with a lambda value of 1 and 5 agents, demonstrates the strengths and weaknesses of each approach in handling the rising complexity of the network. By examining the number of orphan transactions, lazy selection tips, pending tips, confirmed transactions, and unchanged transactions, we can assess the overall efficiency of each algorithm under different conditions.
One of the most prominent trends in the analysis is the significant rise in the number of unchanged transactions across most of the algorithms as the number of transactions increases. Unchanged transactions (gray in the figure) represent tips that have not transitioned to a confirmed state. A high count of unchanged transactions is a critical weakness, as it indicates inefficiencies in the confirmation process. Unweighted, Weighted, Hybrid TSA 1, Hybrid TSA 2, and G-IOTA are particularly susceptible to this issue, with a sharp rise in unchanged transactions as the transaction load increases. These algorithms struggle to process and confirm transactions effectively, suggesting they may face serious performance bottlenecks under higher loads. This poses a scalability concern, as the accumulation of unconfirmed tips can lead to network congestion and decreased throughput.
POMDP, E-IOTA, and Random TSAs exhibit better performance in terms of confirmed transactions, indicating a more balanced throughput and confirmation rate. POMDP, in particular, maintains a relatively high number of confirmed transactions (green) as the number of transactions increases and a high number of pending transactions, which are semi-confirmed transactions (dark green). This suggests that POMDP is capable of processing and confirming transactions more effectively compared to other algorithms, particularly under medium and high transaction loads. The ability to minimize orphan and lazy tips while maintaining an increasing number of confirmed transactions is a key strength of POMDP, highlighting its potential for use in more demanding, high-load network conditions.
Remarkably, the number of lazy tips (blue) is relatively low for most algorithms, except for Unweighted and Hybrid TSA 1, where the number of lazy tips increases substantially with higher transaction counts. This implies that these algorithms are more prone to leaving tips idle, which can further compound issues related to confirmation delay. Unweighted TSA, in particular, sees a dramatic rise in the number of lazy tips at transaction loads of 1000, indicating poor transaction prioritization under stress.
In terms of pending tips (green), POMDP, E-IOTA, and Random TSAs display a positive trend, showing that a higher proportion of tips are in the process of confirmation. This demonstrates the effectiveness of these algorithms in actively processing tips rather than leaving them idle or orphaned. However, the E-IOTA and Random algorithms also show a significant number of pending tips, which indicates that while they can push tips toward confirmation, they are not as efficient in finalizing the confirmation process as POMDP.
5.1.2. Impact of Number of Agents on Performance
This section provides a detailed analysis of the performance of various TSAs as the number of agents increases in the network, with a focus on a network with 200 transactions and a lambda value of 1 (representing transactions per second). The number of agents is set to 2, 10, and 20 to test the algorithms’ ability to scale and handle a larger number of transactions and agent activity. The key metrics assessed include orphan transactions, pending tips, confirmed transactions, lazy tip selections, and unchanged transactions, offering a comprehensive view of how each TSA responds to increased network size and load. As shown in
Figure 6, across different network sizes, POMDP consistently outperforms other TSAs in transaction confirmation, scalability, and efficiency.
In the 2-agent setup, it confirms 23 transactions with only one orphan, while other algorithms, including G-IOTA, Unweighted, Weighted, Hybrid TSAs, E-IOTA, and Random, struggle to confirm any transactions. With 10 agents, POMDP maintains its lead, confirming 14 transactions and orphaning only one, whereas most other TSAs continue to exhibit poor performance, failing to adapt to the increased network activity. At 20 agents, POMDP further solidifies its efficiency by confirming 16 transactions and orphaning 5, outperforming all other algorithms, which display scalability issues, high orphan rates, and numerous pending or lazy tips.
5.1.3. Impact of Arrival Rate on Performance
In this analysis, we examine how different lambda values, representing the transaction throughput (transactions per second), impact TSA performance in a network with 200 total transactions and 5 agents. The metrics considered include orphan transactions, pending tips, confirmed transactions, unchanged transactions, and lazy tip selections. Each lambda value (0.5, 5.0, and 20.0) corresponds to a different level of network load, and the algorithms’ performance is evaluated in terms of fairness, efficiency, and scalability.
Across varying throughput levels (λ), the POMDP algorithm consistently demonstrates superior performance compared to other TSAs, highlighting its scalability and adaptability, as shown in
Figure 7. At low throughput (λ = 0.5), most algorithms, including G-IOTA, Unweighted, Weighted, and Hybrid TSAs, show inactivity, while POMDP confirms 18 transactions with only 2 orphans, and Random TSA displays moderate efficiency. As throughput increases to λ = 5.0, G-IOTA and Hybrid TSA-1 begin to struggle, while POMDP maintains its lead by confirming 16 transactions with 4 orphans, outperforming E-IOTA and Random, which confirm just 1 transaction each. At high throughput (λ = 20.0), performance degradation is widespread, with G-IOTA, Weighted, Unweighted, and both Hybrid TSAs orphaning a large number of transactions and confirming few. POMDP, however, continues to excel, confirming 14 transactions with 7 orphans, and maintaining lower pending and lazy tip rates compared to Random and other algorithms. Overall, POMDP proves to be the most robust and efficient TSA across all throughput levels, consistently outperforming its counterparts under both light and heavy network loads.
5.1.4. Impact of Bias Tip Selection on Performance
This section analyzes the impact of varying alpha values (0.1, 0.5, and 1.0) on the performance of various TSAs in a simulated environment with 500 transactions, 5 agents, and a lambda value of 1. The metrics analyzed include orphan transactions, pending tips, confirmed transactions, lazy tips, and total unchanged IDs. These metrics reveal how each algorithm behaves as the bias toward newer or older transactions is adjusted by the alpha value, as shown in
Figure 8. Across the range of alpha values (0.1, 0.5, and 1.0), most algorithms exhibit varying degrees of efficiency and fairness. POMDP consistently demonstrates strong performance, adapting well to both low and high biases toward newer transactions and maintaining a high number of confirmed transactions with minimal orphan and lazy tips.
Random and Unweighted show gradual improvements in efficiency, although they still struggle with pending tips or laziness. Hybrid TSA-1 and TSA-2 are the most impacted by changes in alpha, with persistent issues of laziness and orphan transactions.
Ultimately, the selection of the alpha parameter plays a critical role in balancing fairness and efficiency in TSA performance, with POMDP emerging as the most robust algorithm across all alpha values, demonstrating consistent results regardless of the transaction bias.
5.1.5. Impact of Network Latency on Performance
In this analysis, a simulation was conducted using a network of 200 transactions, 5 agents, and a throughput rate (lambda) of 1 transaction per second. The goal was to evaluate the performance of different tip selection algorithms (TSAs) under varying network latency conditions. The primary metrics examined included orphan transactions, lazy tip selection count, pending tips, confirmed transactions, and total unchanged IDs. The algorithms tested in the simulation included G-IOTA, E-IOTA, Random TSA, Unweighted TSA, POMDP, Weighted TSA, Hybrid TSA-1, and Hybrid TSA-2. Latency was varied across three levels: 0.1, 1, and 5, to assess the effect of network delay on TSA performance.
Based on the experimental results shown in
Figure 9, POMDP demonstrates superior performance in active transaction processing while maintaining minimal issues across different latency conditions. The left graph, depicting active processing (confirmed and pending transactions combined), shows that POMDP consistently maintains the highest level of transaction activity, processing between 61 and 94 transactions across various latency values (0.1–5), with peak performance at latency 1, where it handles 89 active transactions (22 confirmed and 67 pending). This significantly outperforms other algorithms, with the next best performer, E-IOTA, managing only 37–45 active transactions. Importantly, as shown in the right graph, POMDP achieves this high throughput while maintaining minimal issues (orphan and lazy tip selections), with only 1 orphan transaction consistently across all latency values and no lazy tip selections. This contrasts with algorithms like Hybrid TSA-1, which shows significant issues particularly at low latency (34 combined issue transactions). These results suggest that POMDP effectively balances active transaction processing with network stability, making it particularly suitable for dynamic network conditions.
5.2. Simulation Time Analysis
To evaluate the computational efficiency of various TSAs, we analyzed their simulation time across different lambda values (0.05 to 20).
Figure 10 illustrates the simulation time trends for the G-IOTA, E-IOTA, Random, Unweighted, POMDP, Weighted, Hybrid TSA-1, and Hybrid TSA-2 algorithms. The results highlight the trade-offs between computational efficiency and decision-making complexity in these TSAs.
The findings reveal that the Random and E-IOTA algorithms are the most computationally efficient, with simulation times consistently below 0.6 s for Random TSA and under 3 s for E-IOTA. These lightweight computational approaches are particularly beneficial in IoT environments, where energy efficiency and processing speed are crucial due to limited device capabilities. However, their simplicity may sacrifice fairness and tip selection accuracy, which can be problematic in decentralized IoT networks that require secure and balanced transaction propagation.
By contrast, the POMDP TSA algorithm maintains a moderate simulation time, reaching approximately 2.9 s at λ = 20. While this is slightly higher than some alternatives, the trade-off is justified by POMDP’s intelligent decision-making process, which enhances network fairness, reduces orphan transactions, and minimizes lazy tip selection. These factors are critical for IoT networks, where transaction reliability and load balancing are key to ensuring smooth data transmission across interconnected devices. Furthermore, the simulation results demonstrate that POMDP’s computational requirements remain within a feasible range for IoT devices, making it a viable choice for scenarios where security and fairness outweigh minimal differences in processing time.
A notable trend is Weighted TSA’s behavior, which records the highest simulation time (3.55 s at λ = 0.05) due to the computational overhead of its weighting mechanism. However, as lambda increases, its simulation time decreases, indicating that its computational demand reduces under higher network loads. This suggests that algorithms with pre-processing overhead may not be ideal for low-load IoT environments where efficiency is a priority.
Other algorithms, including G-IOTA, Unweighted TSA, Hybrid TSA-1, and Hybrid TSA-2, maintain relatively stable and moderate simulation times between 2.3 and 2.5 s. This demonstrates that these algorithms strike a balance between computational feasibility and structured tip selection, making them suitable for IoT networks where moderate computing resources are available.
Theoretical Complexity Analysis
Beyond empirical measurements, we analyzed the theoretical computational complexity of each algorithm. For our POMDP-based TSA, the time complexity can be expressed as O(M(H + log N)), where:
- N is the number of transactions in the Tangle;
- H is the planning horizon (depth of the lookahead search);
- M is the number of Monte Carlo simulations performed per decision step;
- T is the number of tips in the Tangle.
This complexity reflects the algorithm's operation: for each decision, M simulations are performed, each requiring H steps of forward planning and logarithmic-time operations on the transaction set.
Table 2 provides a comparison of the computational complexity across all evaluated algorithms:
5.3. Security Considerations
While the primary objective of our proposed POMDP-based TSA is to enhance efficiency and fairness, security is a critical aspect of transaction selection in the IOTA Tangle that must be addressed. Traditional TSA approaches are susceptible to various security threats, including Sybil attacks, tip selection manipulation, and double-spending risks. In this section, we analyze the potential security challenges and discuss how our method mitigates these risks.
One major vulnerability in existing TSA methods is Sybil attacks, where malicious nodes generate multiple identities to increase the probability of their transactions being selected, thus gaining an unfair advantage. Traditional TSAs, such as Random TSA and Unweighted TSA, lack mechanisms to distinguish between legitimate and malicious transactions, making them highly susceptible to such attacks. By contrast, our POMDP-based TSA integrates probabilistic selection mechanisms that dynamically adapt tip selection probabilities, reducing the influence of artificially created Sybil nodes and ensuring a fairer transaction selection process.
Another significant security concern is tip selection manipulation, where an attacker strategically delays certain transactions or prioritizes their own transactions to alter the Tangle’s structure. Attackers leveraging MCMC-based TSAs can exploit weight-based selection probabilities to direct tip selection in their favor. Our method counters this by incorporating exploration–exploitation balancing through upper confidence bounds for trees (UCT), ensuring that tip selection is less predictable and more resistant to adversarial control.
Additionally, double-spending attacks remain a major challenge in decentralized ledger systems. Since IOTA’s structure inherently lacks miners, transactions rely on network-wide consensus for validation. Weighted TSAs and BRW-based selection may fail to prevent adversarial nodes from approving conflicting transactions. Our POMDP-based TSA enhances security by leveraging belief-state modeling, where each transaction’s probability of selection is influenced by previously confirmed transactions, reducing the likelihood of approving double-spending attempts.
Furthermore, we evaluate the resilience of our approach to transaction flooding attacks, where an attacker generates a high volume of low-value transactions to congest the network. Traditional TSA models struggle with such attacks due to the lack of priority-based selection. By optimizing transaction selection using reward-based policies, our POMDP-based TSA prioritizes transactions that contribute to a stable Tangle, effectively mitigating network congestion caused by adversarial spam.
These security enhancements, combined with our method’s efficiency improvements, demonstrate that our POMDP-based TSA not only improves fairness and scalability but also significantly strengthens IOTA’s resistance to adversarial threats. By dynamically adjusting tip selection, it makes the attacks common to existing TSA models more difficult to execute, enhancing overall network resilience and making the algorithm a viable solution for securing decentralized ledger transactions while maintaining efficiency and fairness.
5.4. Comparative Analysis
To evaluate the effectiveness of our POMDP-based TSA compared to existing approaches, we conducted a comprehensive performance analysis across multiple metrics.
Table 3 provides a summary of this comparative evaluation, demonstrating the superior performance of our approach in key areas, including confirmation rates, orphan transaction reduction, and resistance to network congestion.
As shown in Table 3, our POMDP-based TSA outperforms existing approaches across all key metrics. Most notably, it achieves a confirmation rate of 89–94%, compared to 45–60% for Random TSA and only 32–40% for Unweighted TSA. Furthermore, our approach maintains an orphan transaction rate of just 1–5%, significantly lower than that of all other algorithms tested. In addition, our algorithm eliminates lazy tips, which remain a persistent issue in other approaches, with rates as high as 38% in Unweighted TSA.
This difference can be explained by the fundamental decision-making mechanisms underlying each approach. Random TSA fails due to its arbitrary selection strategy, leading to frequent orphan transactions (20–25%) and inefficiencies in tip selection. Unweighted TSA neglects transaction importance, causing delays and the highest rate of lazy tips (32–38%). Weighted TSA, while prioritizing high-weight transactions, suffers from severe tip starvation for newer transactions, resulting in high orphan rates (28–30%).
Hybrid TSA-1 and TSA-2 attempt to optimize selection using dynamic α adjustments, but their linear combination approach lacks the probabilistic reasoning needed to handle uncertainty in the Tangle, explaining their moderate performance (38–45% confirmation rates). Similarly, E-IOTA and G-IOTA introduce additional processing complexity without addressing the fundamental uncertainty in decentralized networks.
By contrast, our POMDP-based TSA achieves superior performance (89–94% confirmation rates, 1–5% orphan rates) by maintaining a probability distribution over possible Tangle states and using Monte Carlo simulations to evaluate long-term outcomes. By incorporating upper confidence bounds for trees (UCT), it balances exploration and exploitation, preventing the confirmation bias that leads to lazy tips in other algorithms.
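The Monte Carlo evaluation of long-term outcomes can be illustrated with a toy rollout. The reward model in `simulate_rollout`, the rollout depth, and all numeric values below are hypothetical stand-ins for our simulator, intended only to show the shape of the estimate:

```python
import random

def simulate_rollout(tip_weight, depth=10, rng=random):
    """Toy rollout: reward accumulates each step a random future arrival
    approves the tip; heavier tips are approved more often."""
    reward = 0.0
    for _ in range(depth):
        if rng.random() < min(1.0, tip_weight / 10.0):
            reward += 1.0
    return reward

def estimate_value(tip_weight, n_rollouts=500, seed=42):
    """Average the rollout reward to estimate a tip's long-term value."""
    rng = random.Random(seed)
    return sum(simulate_rollout(tip_weight, rng=rng)
               for _ in range(n_rollouts)) / n_rollouts
```

Averaging many such rollouts gives the Q-value estimate that the UCT and softmax stages then act on; in this toy model a heavier tip receives a correspondingly higher estimated value.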
Mathematically, POMDP selects a tip based on a softmax-based exponential strategy, which ensures that even low-weight transactions have a nonzero probability of selection:

P(a) = exp(Q(a)/T) / Σ_{a′} exp(Q(a′)/T)

where Q(a) represents the estimated reward of selecting tip a, and T controls the exploration level. This formulation ensures a proper probability distribution over tips, giving even low-weight transactions a non-zero selection probability proportional to their expected contribution to the Tangle. This directly addresses the tip starvation problem present in deterministic approaches, explaining why our method completely eliminates lazy tip selection while maintaining high confirmation rates across diverse network conditions.
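This selection rule can be sketched directly in code. The Q-values and temperature below are illustrative, not values from our experiments:

```python
import math
import random

def softmax_probs(q_values, temperature=1.0):
    """P(a) = exp(Q(a)/T) / Σ_a' exp(Q(a')/T); the maximum Q is
    subtracted before exponentiating for numerical stability,
    which leaves the distribution unchanged."""
    m = max(q_values.values())
    exp_q = {a: math.exp((q - m) / temperature) for a, q in q_values.items()}
    z = sum(exp_q.values())
    return {a: w / z for a, w in exp_q.items()}

def select_tip(q_values, temperature=1.0, rng=random):
    """Sample one tip according to the softmax distribution."""
    probs = softmax_probs(q_values, temperature)
    tips, weights = zip(*probs.items())
    return rng.choices(tips, weights=weights, k=1)[0]

# Illustrative Q-values: the low-reward tip keeps a nonzero selection
# probability, which is what prevents tip starvation.
q = {"old_tip": 0.2, "fresh_tip": 1.0}
probs = softmax_probs(q, temperature=0.5)
```

Raising T flattens the distribution (more exploration), while lowering T concentrates probability on high-reward tips (more exploitation); in either regime every tip retains strictly positive selection probability.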
5.5. Application in IoT Networks and Implementation Challenges
5.5.1. Application in IoT Networks
The increasing adoption of IoT-based decentralized networks requires efficient and fair transaction selection mechanisms. Our POMDP-based TSA provides a structured decision-making process that optimizes tip selection in IOTA Tangle networks, ensuring secure and reliable transaction processing. The key applications include:
Smart Cities: Enhancing secure and fair micro-transactions in smart infrastructure, such as energy trading and connected transport systems.
Industrial IoT (IIoT): Supporting decentralized transaction management in manufacturing and supply chain monitoring.
Blockchain-Based IoT Systems: Improving transaction fairness and security in peer-to-peer IoT networks, where scalability and transaction integrity are essential.
Unlike traditional tip selection algorithms, the POMDP-based TSA minimizes orphan transactions and lazy tip selection, ensuring a more equitable distribution of transaction validation across the IoT ecosystem.
5.5.2. Implementation Challenges
While the POMDP-based TSA is computationally feasible for IoT environments, several challenges must be considered when implementing it in real-world scenarios:
Computational Overhead: Although the POMDP-based TSA is more efficient than complex heuristic approaches, its moderate increase in execution time compared to lightweight TSAs (e.g., Random, G-IOTA) may pose constraints on low-power IoT devices with minimal processing resources.
Energy Efficiency: Battery-operated IoT devices need energy-efficient algorithms. While the POMDP-based TSA provides improved fairness, future research could explore ways to reduce its energy footprint through adaptive computation strategies.
Integration with Existing IoT Architectures: Implementing POMDP-based decision making within current IoT infrastructures may require modifications to protocols and consensus mechanisms, ensuring compatibility with existing blockchain frameworks.
6. Conclusions and Future Work
6.1. Conclusions
This research has successfully introduced and validated a novel POMDP-based tip selection algorithm that significantly advances the state-of-the-art in IOTA Tangle network performance. Through comprehensive simulation experiments and comparative analysis, our algorithm demonstrates exceptional capabilities in managing transaction states and optimizing network efficiency. The empirical results conclusively show that our POMDP-based TSA consistently outperforms existing algorithms across all key performance metrics.
Our approach achieved confirmation rates of 89–94%, dramatically higher than the 50–60% rates of the next best algorithms (E-IOTA and G-IOTA), while traditional approaches like Unweighted TSA struggled to exceed 40%. Particularly noteworthy is our algorithm’s ability to reduce orphan transactions to just 1–5%, compared to 15–35% for other algorithms, representing a major advancement in addressing the fairness problem that has plagued the IOTA Tangle. Furthermore, our approach completely eliminated lazy tip selections, a persistent issue affecting 12–38% of transactions in other algorithms.
The POMDP-based TSA maintained stable performance across varying network conditions, from low to high latency and across different transaction volumes, agent counts, and arrival rates. This stability in diverse environments validates its practical applicability for real-world deployment. The algorithm’s probabilistic decision-making framework, guided by a carefully designed reward function, proved highly effective at balancing exploration and exploitation in the Tangle, leading to more equitable transaction approvals and improved network throughput.
Our security analysis further demonstrated that the POMDP approach offers enhanced resilience against common attack vectors, including Sybil attacks, tip selection manipulation, and transaction flooding, making it not only more efficient but also more secure than existing approaches. Despite its sophisticated decision-making process, our algorithm maintains reasonable computational requirements, with simulation times comparable to other advanced TSAs, ensuring its feasibility for IoT environments.
These findings establish our POMDP-based TSA as a significant advancement in optimizing the IOTA Tangle’s transaction selection process, effectively addressing the critical challenges of fairness, efficiency, and security in this important distributed ledger technology. Future research can build upon this foundation to further enhance the practical implementation of our approach in diverse application domains.
6.2. Future Work
This research establishes a foundation for improving scalability, fairness, and efficiency in IOTA Tangle networks. Future work will focus on adaptive optimization, scalability, security, real-world deployment, and hybrid TSA approaches.
Adaptive Optimization and Scalability: Future efforts will develop self-adjusting POMDP parameters using machine learning and explore distributed implementations to enhance efficiency in large-scale IoT and blockchain networks.
Security and Deployment: Future research will include a security assessment against adversarial attacks, development of resistance mechanisms, and real-world implementation in IOTA networks, ensuring efficiency in resource-constrained environments.
Hybrid TSA and Adaptability: Future studies will investigate hybrid TSAs by integrating POMDP with other consensus mechanisms and enabling context-aware algorithm switching for adaptive performance.
These advancements will further enhance IOTA’s scalability, fairness, and security, strengthening its role in next-generation DAG-based decentralized systems.