Modeling the Decision and Coordination Mechanism of Power Battery Closed-Loop Supply Chain Using Markov Decision Processes
Abstract
1. Introduction
2. Literature Review
2.1. Research Status of Power Battery Closed-Loop Supply Chain Management
2.2. Research on the Application of the Markov Decision Process in Supply Chain Management
2.3. Research Progress in Closed-Loop Supply Chain and Reverse Logistics
3. Markov Decision Process Model
3.1. Fundamentals of the Markov Decision Process
3.2. Modeling the Markov Decision Process of Power Battery Closed-Loop Supply Chain
3.3. Methods for Solving the Model
4. Case Studies
4.1. Case Selection and Data Collection
4.2. Analysis Using Markov Decision Process Models
4.3. Python Implementation of the Model
4.3.1. Environment and Library Configuration
4.3.2. Implementation of the Dynamic Programming Algorithm
Algorithm 1. Python code for the value iteration algorithm.
```python
import numpy as np

def value_iteration(transition_probs, rewards, gamma=0.9, threshold=0.01):
    """
    Value iteration algorithm implementation.
    :param transition_probs: state transition probabilities, transition_probs[s][a][s'].
    :param rewards: reward function, rewards[s][a][s'].
    :param gamma: discount factor.
    :param threshold: convergence threshold.
    :return: optimal policy and state values.
    """
    num_states = len(transition_probs)
    V = np.zeros(num_states)
    policy = np.zeros(num_states, dtype=int)

    while True:
        delta = 0
        for s in range(num_states):
            v = V[s]
            V[s] = max([sum([p * (rewards[s][a][s_prime] + gamma * V[s_prime])
                             for s_prime, p in enumerate(transition_probs[s][a])])
                        for a in range(len(rewards[s]))])
            delta = max(delta, abs(v - V[s]))
        if delta < threshold:
            break

    for s in range(num_states):
        policy[s] = np.argmax([sum([p * (rewards[s][a][s_prime] + gamma * V[s_prime])
                                    for s_prime, p in enumerate(transition_probs[s][a])])
                               for a in range(len(rewards[s]))])
    return policy, V

# Example use (assuming state transition probabilities and rewards are defined)
# policy, V = value_iteration(transition_probs, rewards)
```
Algorithm 2. Python code for the policy iteration algorithm.
```python
def policy_evaluation(policy, transition_probs, rewards, gamma=0.9, threshold=0.01):
    """
    Policy evaluation function.
    """
    num_states = len(transition_probs)
    V = np.zeros(num_states)
    while True:
        delta = 0
        for s in range(num_states):
            v = V[s]
            a = policy[s]
            V[s] = sum([p * (rewards[s][a][s_prime] + gamma * V[s_prime])
                        for s_prime, p in enumerate(transition_probs[s][a])])
            delta = max(delta, abs(v - V[s]))
        if delta < threshold:
            break
    return V

def policy_iteration(transition_probs, rewards, gamma=0.9):
    """
    Policy iteration algorithm implementation.
    """
    num_states = len(transition_probs)
    policy = np.random.choice(len(rewards[0]), size=num_states)
    while True:
        V = policy_evaluation(policy, transition_probs, rewards, gamma)
        policy_stable = True
        for s in range(num_states):
            old_action = policy[s]
            policy[s] = np.argmax([sum([p * (rewards[s][a][s_prime] + gamma * V[s_prime])
                                        for s_prime, p in enumerate(transition_probs[s][a])])
                                   for a in range(len(rewards[s]))])
            if old_action != policy[s]:
                policy_stable = False
        if policy_stable:
            break
    return policy, V

# Example use (assuming state transition probabilities and rewards are defined)
# policy, V = policy_iteration(transition_probs, rewards)
```
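Both algorithms above assume that transition_probs[s][a][s'] holds the probability of moving from state s to s' under action a, and that rewards[s][a][s'] holds the corresponding one-step reward. The following minimal example is purely illustrative (the numbers are not taken from the case study) and only shows the expected data layout:

```python
# Hypothetical 2-state / 2-action MDP illustrating the data structures
# assumed by Algorithms 1 and 2.
transition_probs = [
    [[0.8, 0.2], [0.1, 0.9]],   # state 0: transition rows for action 0 and action 1
    [[0.5, 0.5], [0.3, 0.7]],   # state 1: transition rows for action 0 and action 1
]
rewards = [
    [[5.0, 0.0], [0.0, 1.0]],   # one-step rewards from state 0
    [[2.0, 0.0], [0.0, 4.0]],   # one-step rewards from state 1
]

policy_vi, V_vi = value_iteration(transition_probs, rewards)
policy_pi, V_pi = policy_iteration(transition_probs, rewards)
print("Value iteration policy:", policy_vi)
print("Policy iteration policy:", policy_pi)
```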
4.3.3. Implementation of Reinforcement Learning Algorithm
Algorithm 3. Example of an OpenAI-Gym-based reinforcement learning environment.
```python
import gym
from stable_baselines3 import PPO, DQN

# Create the environment
env = gym.make('YourMDPEnv-v0')  # assuming 'YourMDPEnv-v0' is a custom environment

# Use the PPO algorithm
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)

# Use the DQN algorithm
model = DQN('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)

# Test the model
obs = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, rewards, dones, info = env.step(action)
    env.render()
```
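The environment id 'YourMDPEnv-v0' above is a placeholder for a custom environment. As a rough sketch of what such an environment might look like, the class below models a single inventory state with an order-quantity action under the classic Gym API used in Algorithm 3; the state, action, and cost structure are simplified illustrative assumptions, not the full closed-loop supply chain model of Section 3.2.

```python
import gym
from gym import spaces
import numpy as np

class SimpleCLSCEnv(gym.Env):
    """A deliberately simplified, hypothetical closed-loop supply chain environment.

    The single state variable is the distributor's inventory of new power batteries;
    the action is the order quantity. Demand and costs below are illustrative
    placeholders, not the case-study parameters.
    """

    def __init__(self, max_inventory=100, max_order=10):
        super().__init__()
        self.max_inventory = max_inventory
        self.action_space = spaces.Discrete(max_order + 1)           # order 0..max_order units
        self.observation_space = spaces.Discrete(max_inventory + 1)  # inventory 0..max_inventory
        self.inventory = 0

    def reset(self):
        self.inventory = self.max_inventory // 2
        return self.inventory

    def step(self, action):
        demand = np.random.randint(0, 11)                            # random consumer demand (illustrative)
        self.inventory = min(self.inventory + action, self.max_inventory)
        sales = min(self.inventory, demand)
        self.inventory -= sales
        reward = 10.0 * sales - 1.0 * action - 0.5 * self.inventory  # revenue - ordering - holding cost
        done = False                                                 # infinite-horizon formulation
        return self.inventory, reward, done, {}

    def render(self, mode='human'):
        print("inventory =", self.inventory)

# One way to expose the class under an id such as 'YourMDPEnv-v0' (the id is assumed):
gym.envs.registration.register(id='YourMDPEnv-v0',
                               entry_point=SimpleCLSCEnv,
                               max_episode_steps=200)
```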
4.3.4. Analysis and Validation of Results
1. Calculation of key performance indicators
Algorithm 4. Python code for calculating key performance indicators.
```python
def calculate_performance_metrics(states, rewards, policy, V):
    """
    Calculate and output performance metrics.
    :param states: collection of states.
    :param rewards: reward function (here assumed to give a scalar reward per state-action pair).
    :param policy: policy.
    :param V: state values.
    """
    total_profit = sum([rewards[s][policy[s]] for s in states])
    average_profit = total_profit / len(states)
    efficiency = sum([V[s] for s in states]) / len(states)
    print("Total profit:", total_profit)
    print("Average profit:", average_profit)
    print("Efficiency:", efficiency)

# Example use (assumes states, rewards, policy, V already exist)
# calculate_performance_metrics(states, rewards, policy, V)
```
2. Plotting the learning curve
Algorithm 5. Python code for plotting the learning curve.
```python
import matplotlib.pyplot as plt

def plot_learning_curve(rewards, title="Learning Curve"):
    """
    Plot the learning curve.
    :param rewards: reward obtained in each step or episode.
    :param title: chart title.
    """
    plt.figure(figsize=(10, 5))
    plt.plot(rewards)
    plt.title(title)
    plt.xlabel('Episode')
    plt.ylabel('Reward')
    plt.show()

# Example use (assuming a list of rewards already exists)
# plot_learning_curve(rewards)
```
3. Parameter sensitivity analysis
Algorithm 6. Python code for parameter sensitivity analysis.
```python
def sensitivity_analysis(param_range, env, model_class):
    """
    Perform sensitivity analysis over different parameter values.
    :param param_range: range of parameter values to test.
    :param env: reinforcement learning environment.
    :param model_class: reinforcement learning model class.
    """
    performance_metrics = []
    for param in param_range:
        model = model_class('MlpPolicy', env, gamma=param, verbose=0)
        model.learn(total_timesteps=10000)
        # Evaluate model performance...
        performance = evaluate_model(model, env)  # assumes evaluate_model is defined
        performance_metrics.append(performance)
    # Plot the results of the parameter sensitivity analysis
    plt.figure(figsize=(10, 5))
    plt.plot(param_range, performance_metrics)
    plt.title("Parameter sensitivity analysis")
    plt.xlabel('Parameter value')
    plt.ylabel('Performance indicator')
    plt.show()

# Example use (assuming env and model_class already exist)
# sensitivity_analysis(np.linspace(0.1, 0.9, 9), env, PPO)
```
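The evaluate_model helper referenced in Algorithm 6 is assumed rather than defined in the text. One simple possibility, assuming a classic Gym-style environment and a Stable-Baselines3 model, is to average the undiscounted return over a few evaluation episodes:

```python
def evaluate_model(model, env, n_episodes=5, max_steps=200):
    """Average undiscounted episode return over a few evaluation rollouts (illustrative sketch)."""
    returns = []
    for _ in range(n_episodes):
        obs = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, info = env.step(action)
            total_reward += reward
            if done:
                break
        returns.append(total_reward)
    return sum(returns) / len(returns)
```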
5. Decision-Making and Coordination Mechanisms
5.1. Decision-Making Mechanisms Based on Markovian Decision Process Models
- The decision-making mechanism is decentralized, i.e., each decision maker can make decisions independently without the need to communicate or consult with other decision makers.
- The decision-making mechanism is adaptive, i.e., each decision maker continuously updates its state and strategy in response to environmental changes and feedback, so as to adapt to uncertainty and dynamics.
- The decision mechanism is intelligent, i.e., each decision maker can learn and optimize to find optimal or near-optimal actions that improve its expected long-term reward; a minimal learning sketch follows this list.
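As a minimal sketch of such adaptive, decentralized learning (assuming discrete states and actions; this illustrates the idea rather than reproducing the exact algorithm used in the case study), a single decision maker could update its own action-value table with tabular Q-learning:

```python
import numpy as np

def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One adaptive update for a single decision maker (illustrative sketch).

    Q is that decision maker's own action-value table; no communication with
    other supply chain members is required, which is what makes the mechanism
    decentralized and adaptive.
    """
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q

# Hypothetical usage: 5 inventory states, 3 order quantities
# Q = np.zeros((5, 3))
# Q = q_learning_update(Q, state=2, action=1, reward=4.0, next_state=3)
```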
5.2. Coordination Mechanisms Based on Markov Decision Process Models
- The coordination mechanism is centralized, i.e., a central coordinator is needed to design and implement the coordination mechanism, as well as to communicate or consult with all parties in the supply chain.
- The coordination mechanism is contractual in nature, i.e., it requires a contract or agreement to bind the supply chain parties to their behaviors and responsibilities, as well as to specify the benefits and risks for each party in the supply chain.
- The coordination mechanism is incentive-based, i.e., it needs to provide incentives or penalties to motivate supply chain parties to comply with the contract or agreement, as well as to promote the overall efficiency and utility of the supply chain.
- The specific steps for the implementation of this coordination mechanism are set out below:
5.3. Assessment of the Effectiveness of Coordination Mechanisms and Recommendations for Improvement
- Rate of increase in overall supply chain efficiency: the percentage by which the overall efficiency of the supply chain increases under the coordination mechanism compared with the decision-making mechanism alone.
- Rate of increase in overall supply chain utility: the percentage by which the overall utility of the supply chain increases under the coordination mechanism compared with the decision-making mechanism alone.
- Equity in profit distribution among supply chain parties: This refers to whether the distribution of profits among supply chain parties is in line with their contributions and expectations after the use of the coordination mechanism and whether there is any imbalance or exploitation in profit distribution.
- Contract compliance rate of supply chain parties: whether, after the coordination mechanism is adopted, the supply chain parties make decisions in accordance with the contract or agreement, and whether any violations of the contract or agreement occur; a computational sketch of two of these indicators is given at the end of this section.
- In this paper, we design different coordination mechanisms based on different objective functions, weights, contracts, or agreements, and compare their effects under different parameters and scenarios.
- When determining the objective function, the multiple objectives of the supply chain parties, such as profit, cost, service, and environment, should be considered and weighed and balanced according to the actual situation and priorities.
- In determining the weights, the benefit preferences and risk preferences of all parties in the supply chain should be taken into account and allocated and adjusted according to the actual situation and the principle of fairness.
- When designing contracts or agreements, the incomplete and asymmetric information held by the supply chain parties should be taken into account, and the contracts should be designed and optimized according to the actual situation and incentive principles.
- The truthfulness and good faith of the parties in the supply chain should be taken into account in the execution of the contract or agreement, which should be monitored and enforced in accordance with the actual situation and the principle of constraint.
- Limitations of the study and future prospects
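As an illustration of how two of the assessment indicators listed above could be computed from simulation output, a small sketch follows; the function names and example numbers are our own and purely hypothetical:

```python
def efficiency_improvement_rate(efficiency_before, efficiency_after):
    """Percentage increase in overall supply chain efficiency after coordination."""
    return (efficiency_after - efficiency_before) / efficiency_before * 100.0

def contract_compliance_rate(decisions, contract_decisions):
    """Share of periods in which a party's decision matched the contracted decision."""
    matches = sum(1 for d, c in zip(decisions, contract_decisions) if d == c)
    return matches / len(decisions) * 100.0

# Hypothetical usage:
# print(efficiency_improvement_rate(0.72, 0.81))         # 12.5 (%)
# print(contract_compliance_rate([3, 5, 4], [3, 5, 2]))  # 66.7 (%)
```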
6. Conclusions and Outlook
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Parameter | Meaning | Value | Unit
---|---|---|---
Nmax | Inventory cap on new power batteries for manufacturers and distributors | 100 |
Rmax | Inventory cap on reused power batteries for manufacturers and distributors | 100 |
Wmax | Inventory cap on used power batteries for distributors, recyclers, and reusers | 100 |
Cmax | Maximum remaining capacity of a consumer's power battery | 100 | kWh
Dmax | Maximum consumer demand for new power batteries | 10 |
Pmax | Maximum quantity of new power batteries produced by the manufacturer | 10 |
Bmax | Maximum quantity of reused power batteries purchased by manufacturers and consumers | 10 |
Omax | Maximum quantity of new power batteries ordered by distributors | 10 |
Umax | Maximum quantity of reused power batteries sold by distributors and reusers | 10 |
Smax | Maximum quantity of used power batteries sold by distributors and recyclers | 10 |
Tmax | Maximum quantity of used power batteries purchased by distributors and recyclers | 10 |
Bmax | Maximum quantity of new power batteries purchased by consumers | 10 |
Fmax | Maximum quantity of used power batteries returned by consumers | 10 |
cp | Manufacturer's unit cost of producing a new power battery | 1000 | CNY
cb | Manufacturer's unit price for purchasing a reused power battery | 5000 | CNY
cn | Manufacturer's unit inventory cost of holding new power batteries | 50 | CNY
cr | Manufacturer's unit inventory cost of holding reused power batteries | 50 | CNY
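When implementing the model, the parameters above could be collected into a single configuration object; the sketch below uses the listed values, with variable names of our own choosing and only a subset of the parameters shown:

```python
# Illustrative parameter configuration mirroring the table above
# (names are our own; only a subset of the case-study parameters is shown).
CLSC_PARAMS = {
    "N_max": 100,   # inventory cap on new power batteries
    "R_max": 100,   # inventory cap on reused power batteries
    "W_max": 100,   # inventory cap on used power batteries
    "C_max": 100,   # maximum remaining battery capacity (kWh)
    "D_max": 10,    # maximum consumer demand for new power batteries
    "c_p": 1000,    # unit production cost of a new power battery (CNY)
    "c_b": 5000,    # unit purchase price of a reused power battery (CNY)
    "c_n": 50,      # unit holding cost of a new power battery (CNY)
    "c_r": 50,      # unit holding cost of a reused power battery (CNY)
}
```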