*Editorial* **Special Issue on "Modelling, Monitoring, Control and Optimization for Complex Industrial Processes"**

**Zhiwei Gao**

Faculty of Engineering and Environment, University of Northumbria, Newcastle NE1 8ST, UK; zhiwei.gao@northumbria.ac.uk

Industrial automation systems, such as chemical processes, manufacturing processes, power networks, transportation systems, sustainable energy systems, wireless sensor networks, robotic systems, and biomedical systems, are becoming increasingly complex [1–3] and expensive, and face higher requirements for operating performance, product quality, productivity, and reliability. Stimulated by Industry 4.0, automation industries are keen to improve the reliability and operational performance of complex industrial processes using advanced modelling, monitoring, optimization, and control techniques. Recently, artificial intelligence, data-driven techniques, cyber–physical systems, digital twins, and cloud computing have further stimulated research on and applications of modelling, monitoring, optimization, and control techniques [4–6].

This Special Issue on "Modelling, Monitoring, Control and Optimization for Complex Industrial Processes" (https://www.mdpi.com/journal/processes/special_issues/Complex_Industrial_Processes) aims to provide a forum for researchers and engineers to report their recent results, exchange research ideas, and explore emerging research and application directions in modelling, monitoring, optimization, and advanced control for complex industrial processes. After a rigorous review process, 22 papers were included in this Special Issue; they are categorised in Table 1.

**Table 1.** Categories of the papers included in the Special Issue.


#### **1. Modelling and Parameter Identification for Complex Industrial Processes**

Identifying the parameters of a mechanism model under online working conditions is significant but challenging, because uncertainties arise from the differences between design requirements and the real-time environment. In the paper co-authored by Zhang et al. [7], a reinforcement learning approach was applied to forging machines to obtain real-time model parameters: raw data were used directly, and an online parameter identification algorithm was implemented within one period without labelled samples as a training database. The parameter identification technique proved capable of adapting to a new process without historical data, and its effectiveness was validated on a forging machine process.

Porous structures are difficult to model because of their irregular internal morphologies. Conventional CAD modelling approaches can effectively describe the external geometric and topological information of a model, but fail to represent its internal structures and conformations. In the work by Ren [8], an effective modelling method for 3D irregular porous structures was presented based on a finite element method and thermodynamic analysis; the key idea was to solve isothermal problems in modelling the porosity of porous units. Experiments showed that the proposed technique can generate smooth, approximate porous structures from arbitrary irregular 3D surfaces.

**Citation:** Gao, Z. Special Issue on "Modelling, Monitoring, Control and Optimization for Complex Industrial Processes". *Processes* **2023**, *11*, 207. https://doi.org/10.3390/pr11010207

Received: 3 January 2023 Accepted: 4 January 2023 Published: 9 January 2023

**Copyright:** © 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The discrete element method can analyse interactions among particles and between particles and mechanical components, revealing the influencing factors and operating mechanisms of the mechanical components. In the work by Liu et al. [9], a discrete-element-method-based modelling technique was used to represent the pill particle population in order to optimise the anti-corrosion process of oil and gas wellbore casing annuli. A simulation model was built, establishing a theoretical foundation for further investigation of the pill discharging process and for parameter optimisation of the pill discharging device.

Green growth is the process by which a manufacturing enterprise grows stronger through green strategies and green behaviours, achieving lower consumption of resources and energy, less pollution, and more environmentally friendly and healthy products. In the article by Li et al. [10], a conceptual model of the factors influencing the green growth of manufacturing enterprises was established, and a method was proposed to further reveal the relevant dynamic mechanisms and the essential influencing factors, which were determined using the decision-making trial and evaluation laboratory (DEMATEL) method. Six key influencing factors were finally verified using a wooden flooring manufacturing company as a case study.
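The DEMATEL step mentioned above ranks factors by combining their direct and indirect influences on one another. As a minimal sketch, assuming the standard DEMATEL steps (normalise the direct-relation matrix, compute the total-relation matrix, then the prominence D + R and the relation D − R), the computation looks as follows in Python with NumPy. The rating matrix is hypothetical and is not taken from [10].

```python
import numpy as np


def dematel(direct):
    """Standard DEMATEL on a direct-relation matrix.

    Returns the total-relation matrix T, the prominence D + R and the relation D - R.
    """
    a = np.asarray(direct, dtype=float)
    # Normalise by the largest row or column sum so that N + N^2 + ... converges
    # (assumes the normalised matrix has spectral radius below 1, as for typical ratings).
    s = max(a.sum(axis=1).max(), a.sum(axis=0).max())
    n = a / s
    # Total-relation matrix: T = N (I - N)^(-1), i.e. direct plus all indirect influence
    t = n @ np.linalg.inv(np.eye(a.shape[0]) - n)
    d = t.sum(axis=1)  # total influence each factor exerts
    r = t.sum(axis=0)  # total influence each factor receives
    return t, d + r, d - r


# Hypothetical 0-4 expert ratings: entry (i, j) = influence of factor i on factor j
ratings = [[0, 3, 2],
           [1, 0, 3],
           [2, 1, 0]]
t, prominence, relation = dematel(ratings)
print("prominence D+R:", prominence.round(3))
print("relation   D-R:", relation.round(3))  # positive: net cause; negative: net effect
```

Factors with a large D + R are the most prominent overall, and the sign of D − R separates cause factors from effect factors.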

#### **2. Monitoring, Diagnosis, and Resilience for Complex Industrial Processes**

Fault diagnosis approaches can be categorized as model-based, signal-based, and knowledge-based. Model-based approaches are widely used when a model is available to the designer. Continuous stirred tank reactors are widely used in chemical production processes and exhibit nonlinear dynamics affected by time delays and uncertainties. In the article contributed by Wang et al. [11], continuous stirred tank reactors were represented by a T-S fuzzy model with state delays and disturbances. A fuzzy fault detection filter was designed to detect faults, with the filter gains obtained by solving a convex optimization problem formulated as linear matrix inequalities. The effectiveness of the proposed diagnosis method was demonstrated by simulation studies.

Quantitative knowledge-based approaches rely on data-driven and machine-learning techniques, and are therefore also called data-driven approaches. In the paper co-authored by Zhang et al. [12], a novel fault diagnosis and classification optimization method was proposed by fusing a sine cosine algorithm, a support vector machine, and transfer learning. Extensive simulation studies showed that the proposed algorithm outperformed five existing diagnosis algorithms, with higher precision and faster response. In addition, through transfer learning, the algorithm can run effectively with less fault data.
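For readers unfamiliar with the sine cosine algorithm (SCA), the sketch below shows its usual position update, in which each candidate moves towards or around the best solution found so far along a sine or cosine branch with a step size that shrinks over the iterations. In [12], SCA was fused with a support vector machine and transfer learning; here a simple quadratic `surrogate_error` is a hypothetical placeholder for an SVM validation error over two hyperparameters, and all settings (population size, iterations, a = 2) are illustrative assumptions.

```python
import numpy as np


def sine_cosine_algorithm(objective, lower, upper, n_agents=20, n_iter=200, a=2.0, seed=0):
    """Minimise `objective` over a box with the sine cosine algorithm (SCA)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = rng.uniform(lower, upper, size=(n_agents, lower.size))
    fitness = np.array([objective(xi) for xi in x])
    best_x, best_f = x[fitness.argmin()].copy(), float(fitness.min())
    for t in range(n_iter):
        r1 = a - t * a / n_iter                      # shrinking amplitude: exploration -> exploitation
        r2 = rng.uniform(0.0, 2.0 * np.pi, x.shape)  # phase of the move
        r3 = rng.uniform(0.0, 2.0, x.shape)          # random weight on the best point
        r4 = rng.uniform(0.0, 1.0, x.shape)          # picks the sine or the cosine branch
        wave = np.where(r4 < 0.5, np.sin(r2), np.cos(r2))
        x = np.clip(x + r1 * wave * np.abs(r3 * best_x - x), lower, upper)
        fitness = np.array([objective(xi) for xi in x])
        if fitness.min() < best_f:                   # keep the best solution found so far
            best_x, best_f = x[fitness.argmin()].copy(), float(fitness.min())
    return best_x, best_f


# Hypothetical stand-in for an SVM validation error over (log10 C, log10 gamma)
def surrogate_error(z):
    return float((z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2)


best, err = sine_cosine_algorithm(surrogate_error, lower=[-3, -5], upper=[3, 1])
print(best, err)
```

In a real fault-classification setting, `surrogate_error` would be replaced by a cross-validated classifier error, which is far more expensive to evaluate; that cost is why population size and iteration count matter.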

Multi-agent systems, in which multiple agents communicate through a predefined protocol to operate collectively, have received considerable attention. A fault in one agent may degrade the performance of its neighbouring agents and even of the entire network, so it is important to detect agent faults as early as possible. In the paper co-authored by Lu et al. [13], fault diagnosis for multi-agent systems was investigated: a neural-network-based state prediction model was built by offline training on historical data, and the residuals between the actual and predicted outputs were checked to detect potential faults. The effectiveness of the diagnosis algorithm was demonstrated by a real experiment on a leader–follower inverted pendulum system.
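The residual-checking step described above can be sketched as follows. This is a generic illustration rather than the method of [13]: the neural-network predictor is replaced by a hypothetical array of predicted outputs, and the threshold rule (mean plus k standard deviations of fault-free residual magnitudes) is an assumed, commonly used choice.

```python
import numpy as np


def residual_threshold(fault_free_residuals, k=3.0):
    """Detection threshold from residuals recorded under known fault-free operation."""
    r = np.abs(np.asarray(fault_free_residuals, dtype=float))
    return float(r.mean() + k * r.std())


def detect_faults(y_measured, y_predicted, threshold):
    """Flag samples whose residual |y - y_hat| exceeds the threshold."""
    residual = np.abs(np.asarray(y_measured, float) - np.asarray(y_predicted, float))
    return residual > threshold


# Hypothetical data: the prediction tracks the agent output until a bias fault appears at sample 6
rng = np.random.default_rng(0)
y_pred = np.sin(np.linspace(0.0, 3.0, 10))
y_meas = y_pred + rng.normal(0.0, 0.01, 10)
y_meas[6:] += 0.5                                  # injected sensor bias
threshold = residual_threshold(rng.normal(0.0, 0.01, 500))
flags = detect_faults(y_meas, y_pred, threshold)
print(flags)
```

A larger k lowers the false-alarm rate at the price of slower detection of small faults.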

Networked dynamic systems may suffer security threats from malicious attacks, which can destabilize the systems and disrupt communications between them. This motivates the study of resilience in networked systems subject to cyber-attacks. In the article contributed by Tan et al. [14], resilient control of networked nonlinear dynamic systems with dynamic triggering mechanisms under malicious aperiodic denial-of-service attacks was examined. A resilient dynamically triggered controller was designed to alleviate the effects of cyber-attacks and reduce the usage of communication resources. The proposed approaches were validated on the well-known nonlinear Chua circuit.

#### **3. Control Applications for Complex Industrial Systems**

Chaos is a complex nonlinear phenomenon in nature, and chaotic systems have been widely applied in a variety of practical fields, such as secure communications, industrial processes, and ecosystems. Some chaotic behaviours are harmful and should be suppressed. In the article contributed by Liang et al. [15], a tracking controller was designed for hyperchaotic complex systems, and the feasibility of the design was verified from two perspectives: mathematical proofs and simulation experiments.

H8 transformer-less inverters can be used to eliminate earth leakage current, and model predictive control has become a popular control technique in industrial applications. In the paper contributed by Zaid et al. [16], a model predictive control method was used to improve the performance of H8 transformer-less inverters supplied by a photovoltaic energy source. A hardware-in-the-loop test was implemented on a TMS320F28379D LaunchPadXL DSP kit to demonstrate the effectiveness of the proposed approach.

Temperature control is widely used for dividing-wall distillation columns; it has good dynamic characteristics but cannot track steady-state values well because of its limited accuracy in estimating the controlled product purities. Motivated by this, in the paper contributed by Yuan et al. [17], an improved temperature control approach was proposed with the aid of product quality estimation and a genetic algorithm. Simulation studies demonstrated that the proposed control scheme reduces steady-state deviations in the maintained product purities and achieves better dynamic characteristics, making it a useful tool for temperature inferential control of dividing-wall distillation columns.

Permanent magnet synchronous motors have wide industrial applications. However, it is usually challenging to establish an accurate mathematical model of such a motor, and complex control algorithms can be difficult to implement as embedded code. Motivated by this, in the paper co-authored by Jiang et al. [18], a characteristic model of a permanent magnet synchronous motor was built, and a speed control scheme was proposed by integrating linear golden-section adaptive control with integral compensation. Simulation and experimental results showed that the proposed algorithm improved speed control accuracy by a factor of 3.8 compared with traditional proportional-integral-derivative control algorithms.

Electric vehicles are a green mode of transportation and are expected to replace fossil-fuelled vehicles. However, charging stations for electric vehicle batteries may impose a high energy demand on the utility grid. It is therefore of interest to investigate standalone electric vehicle charging stations powered by photovoltaic sources to support the utility grid. In the paper contributed by Atawi [19], an isolated electric vehicle charging station model based on a photovoltaic energy source was built, composed of a photovoltaic panel, a boost converter, energy storage system batteries, DC/DC charging converters, and an electric vehicle battery. The control system consisted of a maximum power point tracking controller, an electric vehicle charging controller, and a storage converter controller, all essentially PI controllers, with a single-chip PIC18F4550 microcontroller used for control implementation. Simulations and experiments demonstrated that the controllers provide good response speeds and satisfactory tracking of their references.

Steam generators are critical devices in nuclear power plants, and their control performance is paramount for normal operation. It is therefore of interest to develop optimal control for the steam generator level process. In the article by Kong et al. [20], a systematic data-driven optimization methodology was proposed that optimises control system parameters directly from control performance measurements. Simulations demonstrated that the simplex search method was effective in optimising controller parameters and improving control system performance in steam generator level processes.

#### **4. Optimization Applications for Complex Systems**

Wind energy plays a leading role in renewable energy industries. To reduce workloads, improve efficiency, and provide better evaluation and judgment, inspection robots have been introduced into wind farms. Path planning is a prerequisite for intelligent robotic inspection. In the article by Chen et al. [21], a new path-planning algorithm was proposed based on a chaotic neural network and a genetic algorithm. The algorithm was verified by planning a patrol-robot path through the actual locations of 30 wind farms, showing that it generates a shorter inspection path than some existing algorithms.

Boosting the material removal rate and the waste reuse rate is significant in the rough processing stage of an irregularly shaped three-dimensional stone product. In the paper contributed by Shao et al. [22], circular saw disc cutting was investigated for cutting a convex polyhedron out of a box-shaped blank, with reference to a targeted product. This problem is better solved by geometrical methods than by purely mathematical ones, and an automatic block cutting strategy was proposed using a series of geometrical optimization approaches. The effectiveness of the proposed method was demonstrated by simulation studies on both MATLAB and the Vericut platform.

Precise process planning is important for producing an open-die-forged part with the desired final geometry as well as economically. In the paper contributed by Reinisch et al. [23], a multi-objective, optimization-based pass schedule design was proposed by combining fast process models with a double deep Q-learning algorithm. The resulting pass schedules achieve the desired ingot geometry with a minimal number of passes. The methods were validated by a forging experiment, showing the ability of the double deep Q-learning algorithm to achieve an optimal pass schedule in real open-die forging processes.

The transformation to Supply Chain 4.0, aided by the lean value stream mapping tool, gives companies new opportunities to gain competitiveness. In the work by Kihel et al. [24], a new process design integrating 4.0 technologies was presented, taking the multinational supply chains of Automotive Wiring Equipment Morocco as a case study. Using the lean value stream mapping 4.0 tool, all product and information flows in the value chain from suppliers to customers were optimized, improving economic, social, and environmental performance.

Real-time optimization is a strategy that optimizes an economic objective function subject to constraints, so that operation is kept at its optimum point even in the presence of nonlinear behaviours and disturbances. In the work by Delou et al. [25], small-scale real-time optimization was investigated for a real industrial case, a Natural Gas Processing Unit. A novel approach using a sequential-modular simulator within an optimization framework was proposed to improve efficiency, and it achieved improved stability and increased profit.

Process optimization aims to optimize a set of parameters subject to constraints to achieve optimal processing time and production. In the article by Chen et al. [26], process optimization for an automated yogurt and flavour-filling machine was discussed under two scenarios: multiple filling points (Case I) and a single filling point (Case II). Mathematical models were developed for each case with the corresponding optimisation objectives. Tests with real data revealed that Case II processed a set of customer orders faster than Case I.

Inter-channel advertising and service cooperation are important research topics in channel convergence, a key issue in online-to-offline supply chains. In the paper co-authored by Zhang et al. [27], the impacts of time delay and bidirectional free riding on inter-channel service and advertising cooperation strategies were discussed. A differential game model between brand owners and retailers was established, incorporating the delay effect and bidirectional free riding, and differential game theory was used to find the optimal advertising and service decisions of the brand owners and retailers. The service strategy, advertising strategy, and brand goodwill of the online-to-offline supply chain members were shown to be optimal under centralized decision-making.

Artificial-intelligence-based music generation has attracted much attention. In the work by Min et al. [28], a competitive music generation algorithm was developed by blending a transformer deep-learning model with generative adversarial networks. The transformer and generative-adversarial-network-based model was shown to capture the relationships among notes in long music sequences and to learn the rules of music composition well, and an optimized version of the model improved the accuracy of the generated notes.

**Funding:** This research received no external funding.

**Acknowledgments:** The guest editor would like to thank all authors who contributed to this Special Issue and all reviewers for their voluntary contributions. Special thanks go to the *Processes* journal office for its administrative support of this Special Issue.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

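**Real-Time Parameter Identification for Forging Machine Using Reinforcement Learning**

**Dapeng Zhang <sup>1</sup>, Lifeng Du <sup>2</sup> and Zhiwei Gao <sup>3,\*</sup>**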


**Abstract:** It is a challenge to identify the parameters of a mechanism model under real-time operating conditions disrupted by uncertain disturbances, owing to the deviation between the design requirements and the operational environment. In this paper, a novel approach based on reinforcement learning is proposed for forging machines to obtain the optimal model parameters by applying the raw data directly instead of using an observation window. The approach is an online parameter identification algorithm that runs within one period without needing labelled samples as a training database. It is highly robust against disturbances with unknown distributions in a dynamic process and is especially capable of adapting to a new process without historical data. The effectiveness of the algorithm is demonstrated and validated by a simulation of acquiring the parameter values of a forging machine.

**Keywords:** parameter acquisition; mechanism model; reinforcement learning; forging machine

**Citation:** Zhang, D.; Du, L.; Gao, Z. Real-Time Parameter Identification for Forging Machine Using Reinforcement Learning. *Processes* **2021**, *9*, 1848. https://doi.org/10.3390/pr9101848

Academic Editors: Rodolfo Haber and Jae-Yoon Jung

Received: 6 August 2021 Accepted: 14 October 2021 Published: 18 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Complex engineering systems have high requirements for system reliability and for control and production performance. A variety of technologies have been developed to support the monitoring, optimization, and control of complex industrial processes such as chemical processes, manufacturing systems, and power and energy systems [1–3]. The forging process, which enhances the mechanical properties of parts by compressing their microstructure [4], is widely applied in the fields of mining equipment, thermal, hydro, and wind power generation equipment, nuclear power equipment, petroleum, and so on. As the key equipment, a forging machine must provide a precise pressing speed with a huge force to meet the technological requirements of forging pieces. Control of the forging machine is therefore the guarantee of high forging quality. Control algorithms have progressed from conventional PID-based algorithms [5] to advanced model-based algorithms, including sliding mode control [6,7], back-stepping control [8], and feedback linearization [9], in order to obtain higher performance. However, the effectiveness of these control algorithms strongly depends on the accuracy of the mechanism model. In [10,11], fuzzy-based control was proposed using fuzzy rules instead of the mechanism model, but it cannot meet high-precision requirements. It is worth pointing out that equivalent models, including regression models [12], neural networks [13], support vector machines [14], and others [15], are alternatives to the mechanism model. These equivalent models overcome the difficulty of mechanical analysis, but at the cost of the model's extensibility and physical meaning. To date, the mechanism model remains a feasible basis for precision control of the forging machine.

The mechanism knowledge of the forging machine is well established on the basis of principles such as fluid mechanics, dynamics, and machinery technology. For example, the dynamic behaviors of the forging machine were analyzed according to the mechanism model [16]. For a mechanism model with a known structure, the focus is to determine the parameters, which is often done by offline identification and online correction. For a forging machine in particular, most parameters come from the design handbook of forging machines [17], in which the parameter values are recorded under a pre-set environment; the others are estimated from the states of the forging machine measured by various sensors. A number of offline identification methods, such as the least squares method, maximum likelihood, Bayesian estimation, a posteriori estimation, and minimizing maximum entropy, were surveyed in reviews [18–20]. Reference [21] proposed minimizing the entropy of a kernel estimate constructed from the residuals, as an alternative to maximum likelihood estimation. In [22], a system parameter estimation method based on deconvolution of the system output process and an explicit Levenberg optimization method was presented. Reference [23] presented a new derivative-free search method for finding models of acceptable data fit in a multidimensional parameter space, using geometrical constructs known as Voronoi cells to guide the search in the parameter space. Reference [24] described a method for estimating the Nakagami distribution parameters by the method of moments, in which the distribution moments were replaced by their estimates. To track varying working parameters, online estimation techniques were developed to improve model accuracy. Recursive parameter estimation was introduced for the linear model [25], the bilinear system [26], and the ARMA system [27]. In [28], an estimated noise transfer function was used to filter the input–output data of the Hammerstein system, and, by combining the key-term separation principle with filtering theory, a recursive least squares algorithm and a filtering-based recursive least squares algorithm were presented.
Reference [29] proposed a parameter estimation algorithm using simultaneous perturbation stochastic approximation (SPSA), which updates the parameters using only two measurements of an evaluation function per iteration, regardless of the parameter dimension. Reference [30] collected time-series data from an experimental paradigm involving repeated training and investigated the effect of various clustering methods on parameter estimation. Reference [31] estimated the servo press force using a novel dual-particle-filter-based algorithm, achieving a maximum relative error of 3.6% in force estimation.
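The property attributed to SPSA in [29], two evaluations per iteration regardless of the number of parameters, comes from perturbing all parameters at once with a random ±1 vector. The sketch below shows the standard SPSA recursion on a hypothetical least-squares loss; the gain sequences a/(k + 1 + A)^0.602 and c/(k + 1)^0.101 are commonly recommended defaults, assumed here rather than taken from [29], and the data are synthetic.

```python
import numpy as np


def spsa_minimise(loss, theta0, n_iter=500, a=0.2, c=0.1, A=50.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimise `loss` with simultaneous perturbation stochastic approximation (SPSA)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iter):
        ak = a / (k + 1 + A) ** alpha                     # decaying step size
        ck = c / (k + 1) ** gamma                         # decaying perturbation size
        delta = rng.choice([-1.0, 1.0], size=theta.size)  # perturb every parameter at once
        # Two loss evaluations per iteration, independent of the number of parameters
        y_plus = loss(theta + ck * delta)
        y_minus = loss(theta - ck * delta)
        g_hat = (y_plus - y_minus) / (2.0 * ck * delta)   # simultaneous-perturbation gradient estimate
        theta -= ak * g_hat
    return theta


# Hypothetical identification task: three parameters of a linear-in-parameters model
rng = np.random.default_rng(1)
phi = rng.normal(size=(200, 3))
true_theta = np.array([1.5, -0.7, 0.3])
y = phi @ true_theta + rng.normal(0.0, 0.01, 200)


def mse(theta):
    return float(np.mean((y - phi @ theta) ** 2))


theta_hat = spsa_minimise(mse, np.zeros(3))
print(theta_hat)
```

Because `loss` is only ever called, never differentiated, the same routine works when the evaluation function is a black-box simulation or a measured performance index.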

As a foundation, parameter identification requires a large amount of valid historical data. Unfortunately, a forging machine often works on batch processes whose parameters differ from batch to batch and may even be impossible to know for new forging pieces. The parameters of the mechanism model of a forging machine therefore need to be determined from as few data as possible. From the perspective of data effectiveness, the classical parameter identification methods, whether offline estimation or online correction, are based on the least squares concept under the assumption that the data follow a normal distribution. They need an appropriate window to observe the data, because the statistical characteristics are hidden in the collected data. However, differences in forging material quality and pressure variations caused by changes in pipe diameter and flow rate lead to disturbances that give the data noise an unknown distribution. It is therefore a challenge to determine the parameters of a forging machine model online to meet the needs of a complex environment.
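To make the classical baseline concrete, the following sketch implements recursive least squares (RLS) with an exponential forgetting factor λ, a standard form of the online correction discussed above. The forgetting factor plays the role of the observation window: λ close to 1 averages over many samples, which suits Gaussian noise but adapts slowly when the parameters change between batches. The regressors, parameter values, and batch change are hypothetical.

```python
import numpy as np


class RecursiveLeastSquares:
    """RLS estimator for y(k) = phi(k)^T theta + e(k) with forgetting factor lam."""

    def __init__(self, n_params, lam=0.98, p0=1e3):
        self.theta = np.zeros(n_params)
        self.P = p0 * np.eye(n_params)   # large initial covariance = little prior confidence
        self.lam = lam

    def update(self, phi, y):
        phi = np.asarray(phi, dtype=float)
        Pphi = self.P @ phi
        gain = Pphi / (self.lam + phi @ Pphi)
        error = y - phi @ self.theta             # a-priori prediction error
        self.theta = self.theta + gain * error
        self.P = (self.P - np.outer(gain, Pphi)) / self.lam
        return self.theta


# Hypothetical batch change: the true parameters jump halfway through the data
rng = np.random.default_rng(0)
rls = RecursiveLeastSquares(n_params=2, lam=0.95)
for k in range(400):
    theta_true = np.array([2.0, -1.0]) if k < 200 else np.array([1.0, 0.5])
    phi = rng.normal(size=2)
    rls.update(phi, phi @ theta_true + rng.normal(0.0, 0.05))
print(rls.theta)
```

With λ = 0.95 the effective memory is roughly 1/(1 − λ) = 20 samples, so the estimate settles on the second parameter set well before the data end; a λ closer to 1 would smooth noise better but track the jump more slowly.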

Reinforcement learning (RL), motivated by psychology, statistics, neuroscience, and computer science, is about learning from interaction how to behave in order to achieve a design goal [32–34]. It removes the limitation of training samples by learning directly from raw data online. Through the learning process, an optimal action is obtained in response to the states. Because it acts on the currently sensed states, RL does not require an assumed prior distribution of the noise. Through episodic training, the action overcomes the overfitting difficulty and becomes robust as the disturbance is gradually eliminated. If the parameters are taken as the actions, they can be determined by RL without relying on the assumptions, or suffering the disadvantages, of the methods above. For a forging machine, this is a feasible approach to finding the optimal values of the model parameters under new conditions with disturbances. There are some mature algorithms in the RL family, such as Q-learning [35], actor–critic [36], and deep reinforcement learning [37]. In this study, owing to its simplicity, the Q-learning algorithm is proposed to determine the model parameters under the working condition. The contributions of this paper can be summarized as follows:

–


The rest of this paper is organized as follows. Section 2 presents the model of the pressing-down phase of the forging machine, showing the state variables and the parameters. Section 3 describes the RL procedure and presents the proposed approach. In Section 4, the model parameters are identified using the proposed approach and compared with two classical methods. Finally, conclusions are drawn in Section 5.

#### **2. The Model of the Pressing-Down in Forging Machine**

Constant-speed isothermal forging of semisolid metals is an important forging technique, especially for processing lightweight alloys in the aerospace industry. The typical structure of the forging machine is illustrated in Figure 1; its model was built in our previous work [38] and is repeated here for completeness.

**Figure 1.** Typical structure of the forging machine [38].

Neglecting the auxiliary attachments, the behavior of the forging machine in the pressing-down phase is governed by the oil pipe-line, the proportional servo valve, and the hydraulic cylinder.

#### *2.1. The Oil Pipe-Line*

The pressing speed in the pressing-down phase is always slow to meet the process requirements, so the oil flow is laminar. Taking the oil column in the pipe as the object, the pressure balance equation takes the form of Formula (1).

$$
\rho \mathcal{S}\_1 l \frac{d(q\_1/\mathcal{S}\_1)}{dt} = (p\_1 - p\_s)\mathcal{S}\_1 - \frac{128\mu l}{\pi d^2} q\_1 \mathcal{S}\_1 \tag{1}
$$

Let $R = -\frac{32\mu}{\rho}$ (with $S\_1 = \pi d^2/4$), so Formula (1) becomes

$$\frac{1}{S\_1}\frac{dq\_1}{dt} = \frac{p\_1 - p\_s}{\rho l} + \frac{R}{S\_1}q\_1\tag{2}$$

The difference between the input and output flows equals the sum of the volume changes due to oil compression and pipe swelling. The oil continuity equation is therefore

$$q\_2 - q\_1 = \frac{S\_1 l}{K} \frac{d(p\_1 - p\_s)}{dt} \tag{3}$$

where $q\_1$ and $q\_2$ are the oil flow in the pipe and the output oil flow of the proportional servo valve, $p\_1$ and $p\_s$ are the input pressure of the proportional servo valve and the output pressure of the constant-rate pump, $S\_1$ and $l$ are the sectional area and the length of the oil pipe, $K$ is the effective bulk modulus of the oil, and $\rho$, $\mu$, and $d$ are the oil density, the oil dynamic viscosity, and the inner diameter of the pipe.

#### *2.2. Proportional Servo Valve*

In performance, the proportional servo valve lies between a servo valve and a proportional valve. It eliminates the dead band by means of a hydraulic pilot stage. The proportional servo valve is widely applied in ultra-low-speed hydraulic machines to control the oil flow to the hydraulic cylinder, and is described by

$$\frac{1}{\omega\_n^2} \frac{d^2 q\_2}{dt^2} + \frac{2\xi}{\omega\_n} \frac{dq\_2}{dt} + q\_2 = K\_q A \tag{4}$$

where $\xi$ and $\omega\_n$ are the damping ratio and the natural frequency of the proportional servo valve, respectively; $K\_q = K\_n\sqrt{\frac{p\_1 - p\_2}{\Delta p\_n}}$ is the flow gain, in which $K\_n$ is the rated flow gain and the square-root factor compensates for the difference between the actual pressure drop $p\_1 - p\_2$ and the rated pressure drop $\Delta p\_n$; and $A$ is the opening of the proportional servo valve.

#### *2.3. The Hydraulic Cylinder*

The pipe-line between the proportional servo valve and the hydraulic cylinder is neglected because of its short length. The oil continuity equation of the hydraulic cylinder takes the form

$$q\_2 = S\_2 v + \lambda\_c p\_2 + \frac{V\_c}{K} \frac{dp\_2}{dt} \tag{5}$$

where $S\_2$ is the plunger's sectional area in the outlet cavity of the hydraulic cylinder, $v$ is the moving speed of the plunger, $\lambda\_c$ is the leakage coefficient of the hydraulic cylinder, $p\_2$ is the output pressure of the proportional servo valve, and $V\_c$ is the oil volume of the upper cavity of the hydraulic cylinder, $V\_c = V\_0 + vS$.

The dynamic equation of the plunger is obtained from a force analysis:

$$p\_2 S\_2 + mg = m\frac{dv}{dt} + Bv + F + p\_3 S\_2\tag{6}$$

where $m$ is the mass of the slide block, $g$ is the gravitational acceleration, $B$ is the viscous damping coefficient, $F$ is the load resistance, and $p\_3$ is the holding pressure of the slide block. According to the design of the forging machine, the holding force of the slide block equals its gravity:

$$p\_3 S\_2 = mg \tag{7}$$

Substituting Formula (7) into Formula (6) simplifies it to Formula (8):

$$p\_2 S\_2 = m \frac{dv}{dt} + Bv + F \tag{8}$$
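Formulas (5) and (8) together form a second-order subsystem from the valve flow *q*<sub>2</sub> to the plunger speed *v*. Setting both derivatives to zero shows how leakage lowers the steady speed for a given flow: *v*<sub>ss</sub> = (*q*<sub>2</sub> − *λ*<sub>c</sub>*F*/*S*<sub>2</sub>)/(*S*<sub>2</sub> + *λ*<sub>c</sub>*B*/*S*<sub>2</sub>). The sketch below checks this by forward-Euler integration. It assumes a constant oil volume *V*<sub>c</sub> and a constant flow *q*<sub>2</sub>, and all numerical values are hypothetical, normalized placeholders, not the design values of Table 1.

```python
def cylinder_deriv(p2, v, q2, K, Vc, S2, lam_c, m, B, F):
    """Right-hand sides of Formulas (5) and (8), with V_c treated as constant."""
    dp2 = K / Vc * (q2 - S2 * v - lam_c * p2)  # Formula (5) solved for dp2/dt
    dv = (p2 * S2 - B * v - F) / m            # Formula (8) solved for dv/dt
    return dp2, dv


def steady_velocity(q2, S2, lam_c, B, F):
    """Equilibrium of Formulas (5) and (8) for a constant flow q2."""
    return (q2 - lam_c * F / S2) / (S2 + lam_c * B / S2)


def simulate_velocity(q2, params, t_end=5.0, dt=1e-3):
    """Integrate Formulas (5) and (8) from rest and return v(t_end)."""
    p2, v = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        dp2, dv = cylinder_deriv(p2, v, q2, **params)
        p2, v = p2 + dt * dp2, v + dt * dv  # forward Euler step
    return v


# Hypothetical, normalized parameters (not the design values of Table 1)
params = dict(K=100.0, Vc=1.0, S2=1.0, lam_c=0.01, m=1.0, B=15.0, F=0.1)
print(simulate_velocity(1.0, params),
      steady_velocity(1.0, params["S2"], params["lam_c"], params["B"], params["F"]))
```

With *λ*<sub>c</sub> = 0 the steady speed would be *q*<sub>2</sub>/*S*<sub>2</sub>; any leakage reduces it, which is why *λ*<sub>c</sub> appears as an uncertainty to be identified later.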

#### *2.4. The Model of the System as a Whole*

Let *x*<sub>1</sub> = *q*<sub>1</sub>, *x*<sub>2</sub> = *p*<sub>1</sub> − *p*<sub>s</sub>, *x*<sub>3</sub> = *dq*<sub>2</sub>/*dt*, *x*<sub>4</sub> = *q*<sub>2</sub>, *x*<sub>5</sub> = *p*<sub>2</sub>, and *x*<sub>6</sub> = *v*. By integrating the subsystems together, the global forging machine model can be described in the state–space form

$$
\dot{\mathbf{x}} = f(\mathbf{x}) + \mathbf{g}(\mathbf{x})u \tag{9}
$$

where **x** = [*x*<sub>1</sub>, *x*<sub>2</sub>, *x*<sub>3</sub>, *x*<sub>4</sub>, *x*<sub>5</sub>, *x*<sub>6</sub>]<sup>T</sup>, *u* = *A*,

$$
f(\mathbf{x}) = \begin{bmatrix} \frac{R}{S\_1} & \frac{S\_1}{\rho l} & 0 & 0 & 0 & 0 \\ -\frac{K}{S\_1 l} & 0 & 0 & \frac{K}{S\_1 l} & 0 & 0 \\ 0 & 0 & -2\xi\omega\_n & -\omega\_n^2 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & \frac{K}{V\_c} & -\frac{K\lambda\_c}{V\_c} & -\frac{K S\_2}{V\_c} \\ 0 & 0 & 0 & 0 & \frac{S\_2}{m} & -\frac{B}{m} \end{bmatrix} \mathbf{x} + \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ -\frac{F}{m} \end{bmatrix}, \quad \mathbf{g}(\mathbf{x}) = \begin{bmatrix} 0 \\ 0 \\ \omega\_n^2 K\_n \sqrt{\frac{x\_2 - x\_5 + p\_s}{\Delta p\_n}} \\ 0 \\ 0 \\ 0 \end{bmatrix}
$$

**Remark 1.** *In the model, most parameters, such as the length and sectional area of the oil pipe, the mass of the slide block, and the rated flow gain, can be set according to the design. Parameters whose values are influenced by the surroundings or working conditions, however, make the model inaccurate.*

#### **3. The Proposed Method**

#### *3.1. Reinforcement Learning*

The basic framework of reinforcement learning is shown in Figure 2. At each time step *k*, the agent observes the state *x*(*k*) ∈ *X*, takes an action *u*(*k*) ∈ *U*, and receives a reward R(*x*(*k* + 1), *x*(*k*), *u*(*k*)) ∈ ℝ.

**Figure 2.** The basic frame of reinforcement learning.

The expected long-run return is described by the state–action value function V(*x*, *u*), the return obtained by first taking an arbitrary action *u* ∈ *U* in a state *x* ∈ *X* and then acting according to a control policy *π*. The value function V<sub>π</sub>(*x*(*k*), *u*(*k*)) at time *k* is therefore defined as

$$\mathcal{V}\_{\pi}(\mathbf{x}(k),\,\mathbf{u}(k)) = \sum\_{t=k}^{\infty} \gamma^{t-k} \mathcal{R}(\mathbf{x}(t+1),\mathbf{x}(t),\,\mathbf{u}(t)) \tag{10}$$

where *γ* ∈ [0, 1] is the discount factor.

The value function V<sub>π</sub>(*x*(*k* + 1), *u*(*k*)) at time *k* + 1 is defined as

$$\mathcal{V}\_{\pi}(\mathbf{x}(k+1),\,\mathbf{u}(k)) = \sum\_{t=k+1}^{\infty} \gamma^{t-k-1} \mathcal{R}(\mathbf{x}(t+1),\mathbf{x}(t),\,\mathbf{u}(t)) \tag{11}$$

According to dynamic programming (the Bellman equation),

$$\mathcal{V}\_{\pi}(\mathbf{x}(k),\mathbf{u}(k)) = \mathcal{R}(\mathbf{x}(k+1),\mathbf{x}(k),\mathbf{u}(k)) + \gamma\,\mathcal{V}\_{\pi}(\mathbf{x}(k+1),\mathbf{u}(k)) \tag{12}$$
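For a finite episode, the discounted sums in Formulas (10) and (11) can be evaluated directly, which makes the recursion (12) easy to check. The sketch below computes the return backward with Formula (12) and compares it with the direct sum of Formula (10). The reward sequence is an arbitrary illustration, and the return after the last step is assumed to be zero.

```python
def returns_by_recursion(rewards, gamma):
    """V[k] = R[k] + gamma * V[k+1] (Formula (12)), with V = 0 after the episode."""
    values = [0.0] * (len(rewards) + 1)
    for k in range(len(rewards) - 1, -1, -1):
        values[k] = rewards[k] + gamma * values[k + 1]
    return values[:-1]


def return_by_sum(rewards, gamma, k):
    """Direct discounted sum of Formula (10), truncated at the episode end."""
    return sum(gamma ** (t - k) * rewards[t] for t in range(k, len(rewards)))


rewards = [1.0, 2.0, 3.0]
print(returns_by_recursion(rewards, 0.5))                  # [2.75, 3.5, 3.0]
print([return_by_sum(rewards, 0.5, k) for k in range(3)])  # [2.75, 3.5, 3.0]
```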

Unfortunately, the value functions V<sub>π</sub>(*x*(*k*), *u*(*k*)) and V<sub>π</sub>(*x*(*k* + 1), *u*(*k*)) cannot be computed because the rewards after time *k* + 1 are unknown. To remove this obstacle, the Q-functions *Q*(*x*(*k*), *u*(*k*)) and *Q*(*x*(*k* + 1), *u*(*k*)) are used in place of V<sub>π</sub>(*x*(*k*), *u*(*k*)) and V<sub>π</sub>(*x*(*k* + 1), *u*(*k*)), respectively.

Let

$$\delta = \mathcal{R}(\mathbf{x}(k+1), \mathbf{x}(k), \mathbf{u}(k)) + \gamma \mathcal{Q}(\mathbf{x}(k+1), \mathbf{u}(k)) - \mathcal{Q}(\mathbf{x}(k), \mathbf{u}(k)) \tag{13}$$

The action *u*(*k*) is optimized by driving *δ* toward zero.

Q-learning (the Q-algorithm) is an important member of the reinforcement learning family; its basic steps are given in Procedure 1 [30].

#### **Procedure 1.**

*Initialize Q*(*x*(*k*), *u*(*k*)) *arbitrarily.*

*Repeat (for each episode):*

*Initialize x*(*k*).

*Repeat (for each step of the episode):*

*Choose u*(*k*) *from x*(*k*) *using a policy derived from Q (e.g., ε-greedy).*

*Take action u*(*k*) *and observe R*(*k*) *and x*(*k* + 1).

$$Q(x(k),u(k)) \leftarrow Q(x(k),u(k)) + \alpha\left[R(k) + \gamma \max\_{u(k+1)} Q(x(k+1),u(k+1)) - Q(x(k),u(k))\right]$$

*x*(*k*) ← *x*(*k* + 1)

*until x*(*k*) *is terminal.*
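Procedure 1 is tabular Q-learning. A minimal runnable sketch on a hypothetical five-state corridor is shown below: the agent moves left or right and is rewarded only on reaching the right end. The environment, hyperparameters, and names are illustrative assumptions. Note that here *ε* is the probability of a random (exploratory) action, the usual convention for Procedure 1, whereas Formula (20) later uses *ε* as the probability of the greedy choice.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning (Procedure 1) on a hypothetical 1-D corridor."""
    rng = random.Random(seed)
    moves = (-1, +1)                           # action 0: left, action 1: right
    terminal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]  # initialize Q arbitrarily (zeros)
    for _ in range(episodes):
        x = 0                                  # initialize x(k)
        while x != terminal:
            if rng.random() < epsilon:         # explore
                a = rng.randrange(2)
            else:                              # exploit, breaking ties at random
                best = max(Q[x])
                a = rng.choice([i for i in range(2) if Q[x][i] == best])
            x_next = min(max(x + moves[a], 0), terminal)
            r = 1.0 if x_next == terminal else 0.0
            # no bootstrapping from the terminal state
            target = r if x_next == terminal else r + gamma * max(Q[x_next])
            Q[x][a] += alpha * (target - Q[x][a])
            x = x_next                         # x(k) <- x(k+1)
    return Q


Q = q_learning()
print(["right" if q[1] > q[0] else "left" for q in Q[:-1]])
```

Random tie-breaking matters here: with all-zero initial values, always picking the first action would keep the agent at the left wall for a very long time.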

**Remark 2.** *Procedure 1 uses only state information. The optimal action can be obtained online from just two consecutive states, x*(*k*) *and x*(*k* + 1), *while maximizing the value function. This makes online control possible, because the approach removes the need for a sliding window of past data.*

#### *3.2. The Proposed Approach*

The scheme of the proposed approach is shown in Figure 3.

**Figure 3.** The scheme of proposed approach.

A model with an undetermined parameter vector *p* (*p* ∈ ℝ<sup>m</sup>) runs in parallel with the forging machine under the same controller. The model's state variables at samples *k* and *k* + 1 are recorded as *x*(*k*) and *x*(*k* + 1), which are connected by a delay link *z*<sup>−1</sup>. The undetermined parameter *p* is regarded as the action of the Q-algorithm. Therefore, the Q-algorithm following Procedure 1 is applied to determine *p* from *x*(*k*) and *x*(*k* + 1), and the optimal parameter *p*<sup>∗</sup> is obtained when the algorithm converges.

To explain how the Q-algorithm acquires the model parameters, its key concepts are described as follows.

#### (i). Action space, reward, and value function

The action space consists of the candidate values of the undetermined parameter *p*. The design values of these parameters are often inconsistent with the actual working conditions, which degrades model accuracy. The goal is to determine values that match the current surroundings.

The forging machine's velocity is set to a constant pressing speed, or to a given speed curve, within a certain temperature range according to the properties of the forging material. The reward R(*k*) is therefore chosen as the reciprocal of the change in the absolute error between the measured speed and the set speed at adjacent sampling times *k* and *k* + 1:

$$\mathcal{R}(k) = \frac{1}{\left|\,|v(k) - v\_{\text{set}}(k)| - |v(k+1) - v\_{\text{set}}(k+1)|\,\right|} \tag{14}$$

where *v*(*k*) and *v*<sub>set</sub>(*k*) are the measured speed and the preset speed at sample *k*, and *v*(*k* + 1) and *v*<sub>set</sub>(*k* + 1) are the measured speed and the preset speed at sample *k* + 1. Here, *v* is used instead of *x*<sub>6</sub>, the sixth component of the state vector *x*, only to emphasize its physical meaning.
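Formula (14) can be computed directly from two consecutive speed measurements. The sketch below is a direct transcription. The small floor `eps`, which avoids division by zero when the error does not change between samples, is an assumption added here because Formula (14) is undefined in that case; the speed values in the example are hypothetical.

```python
def reward(v_k, v_set_k, v_k1, v_set_k1, eps=1e-9):
    """Formula (14): reciprocal of the change in absolute speed error."""
    change = abs(abs(v_k - v_set_k) - abs(v_k1 - v_set_k1))
    return 1.0 / max(change, eps)  # eps: assumed guard, not part of Formula (14)


# The speed error shrinks from 0.010 to 0.005 between samples k and k + 1
print(reward(0.040, 0.05, 0.045, 0.05))  # approximately 200
```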

Let *s* = [*x*; *u*]; the value functions V(*s*(*k*), *p*(*k*)) and V(*s*(*k* + 1), *p*(*k* + 1)) at samples *k* and *k* + 1 are then defined by Formulas (15) and (16):

$$\mathcal{V}(s(k), p(k)) = \sum\_{i=k}^{\infty} \mathcal{R}(i) \tag{15}$$

$$\mathcal{V}(s(k+1), p(k+1)) = \sum\_{i=k+1}^{\infty} \mathcal{R}(i) \tag{16}$$

#### (ii). Q-function

Because the rewards after sample *k* are unknown, the value functions V(*s*(*k*), *p*(*k*)) and V(*s*(*k* + 1), *p*(*k* + 1)) cannot be computed and are replaced by a Q-function, as in the Q-algorithm. The early Q-function for discrete spaces was a look-up table with states as rows and actions as columns. When the states or actions are continuous, discretizing them leads to the curse of dimensionality: the complexity of the algorithm grows exponentially and storage becomes insufficient. Therefore, a parameterized function is used to fit the Q-function:

$$\mathcal{Q}(s(k), p(k)) = f(s(k), p(k), \theta) \tag{17}$$

where *f* is a parameterized mapping and *θ* its parameters. Augmenting the state with the parameter, i.e., redefining *s* = [*s*; *p*], the unknown mapping is replaced by a linear-in-parameters approximator:

$$\hat{\mathcal{Q}}(s) = \sum\_{i=1}^{n} \phi\_i(s)\theta\_i \tag{18}$$

where the basis function *φ*<sub>i</sub>(*s*) is usually chosen as a Gaussian radial basis function because of its simplicity:

$$\phi\_i(s) = e^{-\frac{\|s - s\_i\|^2}{2\sigma\_i^2}} \tag{19}$$

in which *s*<sub>i</sub> is the center and *σ*<sub>i</sub> is the width of the *i*-th radial basis function.
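Formulas (18) and (19) define a weighted sum of Gaussian radial basis functions. A minimal sketch in plain Python follows. The centers, widths, and weights are hypothetical illustration values, and *s* is assumed to be the augmented vector [*x*; *u*; *p*] flattened into a list.

```python
import math


def rbf_features(s, centers, widths):
    """phi_i(s) = exp(-||s - s_i||^2 / (2 sigma_i^2)), Formula (19)."""
    feats = []
    for c, sigma in zip(centers, widths):
        sq_dist = sum((a - b) ** 2 for a, b in zip(s, c))
        feats.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return feats


def q_hat(s, theta, centers, widths):
    """Q_hat(s) = sum_i phi_i(s) * theta_i, Formula (18)."""
    return sum(f * t for f, t in zip(rbf_features(s, centers, widths), theta))


centers = [[0.0, 0.0], [1.0, 0.0]]  # hypothetical kernel centers s_i
widths = [1.0, 1.0]                 # hypothetical widths sigma_i
print(rbf_features([0.0, 0.0], centers, widths))       # [1.0, exp(-0.5)]
print(q_hat([0.0, 0.0], [2.0, 0.0], centers, widths))  # 2.0
```

Because *Q̂* is linear in *θ*, its gradient with respect to *θ*<sub>i</sub> is simply *φ*<sub>i</sub>(*s*), which keeps the weight update cheap.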

#### (iii). Exploitation and exploration

There are two ways to determine the action in RL. Exploitation selects the best action according to the Q-function learned from the rewards received. Exploration selects actions randomly to escape the local optima into which pure exploitation can fall. The *ε*-greedy algorithm is a compromise between the two: the agent selects the action that maximizes the Q-function with probability *ε*, which is usually chosen to be large; otherwise, with probability 1 − *ε*, it selects an action randomly from the action space, which ensures that unexplored regions are visited. The *ε*-greedy rule is

$$p(k+1) = \begin{cases} \operatorname\*{argmax}\_{p(k)} \mathcal{Q}(s(k), p(k), \theta), & Pr < \varepsilon \\ \operatorname{rand}(\mathcal{U}), & Pr \ge \varepsilon \end{cases} \tag{20}$$

where *p*(*k*) and *p*(*k* + 1) are the acquired parameters at samples *k* and *k* + 1, respectively, *Pr* is a random number drawn uniformly from [0, 1] to select the action, and *U* is the action set.
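Formula (20) uses *ε* as the probability of the greedy choice, the opposite of the more common convention. A sketch over a discrete candidate set follows. Discretizing *p* into a finite list and passing the Q-function as a callable `q_of` are assumptions made for illustration, and the candidate grid and Q-values are hypothetical.

```python
import random


def select_parameter(candidates, q_of, epsilon, rng=random):
    """Formula (20): greedy with probability epsilon, random otherwise."""
    if rng.random() < epsilon:            # Pr < epsilon: exploit
        return max(candidates, key=q_of)  # argmax_p Q(s(k), p, theta)
    return rng.choice(candidates)         # Pr >= epsilon: rand(U)


candidates = [10.0, 15.0, 20.0, 25.0, 30.0]  # hypothetical grid for B


def q_of(p):
    """Hypothetical Q-values peaking at p = 15."""
    return -abs(p - 15.0)


rng = random.Random(1)
picks = [select_parameter(candidates, q_of, 0.9, rng) for _ in range(1000)]
# fraction of greedy picks; its expected value is 0.9 + 0.1 * 0.2 = 0.92
print(picks.count(15.0) / len(picks))
```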

#### (iv). The Process of the Method

The proposed algorithm is summarized in Procedure 2. Its inputs are the states *x*(*k*) and *x*(*k* + 1) and the control *u*(*k*), whose physical meanings are given in Section 2, and its output is the parameter *p*.

#### **Procedure 2.**

*Step 1: Given a state x*(*k*) *and the control u*(*k*), *construct s according to s* = [*x*; *u*].

*Step 2: Select parameters p*(*k*) *randomly.*

*Step 3: Observe the next state x*(*k* + 1).

*Step 4: Receive the immediate reward R*(*k*) *according to Formula (14).*

*Step 5: Select p*(*k* + 1) *according to Formula (20).*

*Step 6: Compute Q*(*s*(*k*), *p*(*k*), *θ*) *and Q*(*s*(*k* + 1), *p*(*k* + 1), *θ*) *according to Formula (18), based on the model of Formula (9).*

*Step 7: Compute the temporal-difference error δ*(*k*) *according to*

*δ* = *R*(*k*) + *γQ*(*s*(*k* + 1), *p*(*k* + 1), *θ*) − *Q*(*s*(*k*), *p*(*k*), *θ*)

*Step 8: Update Q*(*s*(*k*), *p*(*k*), *θ*) *according to*

*Q*(*s*(*k*), *p*(*k*), *θ*) ← *Q*(*s*(*k*), *p*(*k*), *θ*) + *αδ*

*Step 9: x*(*k*) ← *x*(*k* + 1), *u*(*k*) ← *u*(*k* + 1), *and p*(*k*) ← *p*(*k* + 1).

*Step 10: Repeat Steps 3 to 9 until convergence. The output p is the converged value of p*(*k*), *for which p*(*k*) = *p*(*k* + 1) = *p*.
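With the linear approximator of Formula (18), Step 8 cannot simply overwrite a table entry. One common way to realize it is to move the weights *θ* along the gradient of *Q̂*, which for Formula (18) is just the feature vector *φ*(*s*). The sketch below implements Steps 6 to 8 in that semi-gradient form. Realizing Step 8 through *θ* is an assumption, not something the procedure states; the feature vectors would come from basis functions such as the Gaussian kernels of Formula (19), and the numbers in the example are hypothetical.

```python
def td_step(theta, phi_k, phi_k1, r_k, alpha, gamma):
    """Steps 6-8 of Procedure 2 with a linear Q_hat = sum(phi_i * theta_i).

    phi_k, phi_k1: feature vectors of [s(k); p(k)] and [s(k+1); p(k+1)].
    Returns the updated weights and the temporal-difference error delta.
    """
    q_k = sum(w * f for w, f in zip(theta, phi_k))    # Step 6: Q(s(k), p(k), theta)
    q_k1 = sum(w * f for w, f in zip(theta, phi_k1))  # Step 6: Q(s(k+1), p(k+1), theta)
    delta = r_k + gamma * q_k1 - q_k                  # Step 7
    # Step 8 (assumed semi-gradient form): dQ_hat/dtheta_i = phi_i(s(k))
    new_theta = [w + alpha * delta * f for w, f in zip(theta, phi_k)]
    return new_theta, delta


theta, delta = td_step([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], r_k=1.0, alpha=0.5, gamma=0.9)
print(theta, delta)  # [0.5, 0.0] 1.0
```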

#### (v). Convergence

Convergence proofs for the Q-algorithm can be found in [35,36].

#### **4. Case Studies**

A forging machine usually remains in good condition early in its life. At this stage, after careful commissioning, the parameter values coincide with the design values, except for the viscous damping coefficient *B*, which is easily influenced by temperature and working conditions. Over time, leakage becomes the main uncertainty of the forging machine. A small amount of leakage is acceptable as long as it does not affect the working process; however, the machine must be repaired if leakage becomes severe. Therefore, we chose the viscous damping coefficient *B* and the leakage coefficient *λ*<sub>c</sub> as the parameters to identify. These two parameters cannot be measured, so their values cannot be verified in practice. We therefore verified the proposed method by simulation.

#### *4.1. Data Source*

The state–space model (9) was used to simulate a forging machine. The model parameter values, set according to the design conditions, are listed in Table 1.


**Table 1.** Parameter values under the design conditions.

A controller is necessary for a forging machine to guarantee the quality of the pressing process, so a PID controller was used to simulate this situation. We chose a PID controller because our focus is on verifying the proposed method rather than on the control method; a PID controller is sufficient to provide the states and control signals needed by the proposed approach. The data series were generated by solving model (9) with MATLAB's ODE45, an explicit Runge–Kutta (4,5) solver based on the Dormand–Prince pair, which uses embedded fourth- and fifth-order formulas to estimate the local error and adapt the step size. To emulate real data, noise with either a uniform or a Gaussian distribution was added to these continuous sequences. The set speed was varied from 0.02 to 0.08, consistent with the requirements of a typical pressing process. A typical control process, consisting of a transient process and a steady process, is shown in Figure 4.

**Figure 4.** A typical control process (the set speed = 0.05).

All subsequent simulations were carried out in MATLAB R2011b on a computer with an Intel® Core™ 2 Duo E7300 CPU @ 2.66 GHz.
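The data generation described in Section 4.1 can be illustrated on a reduced model. The sketch below is not the paper's MATLAB setup: it keeps only Formulas (5) and (8), treats the valve flow *q*<sub>2</sub> as the controller output (ignoring the pipe and valve dynamics), uses a fixed-step forward-Euler integrator instead of ODE45, and uses hypothetical normalized parameters and PID gains. The derivative gain defaults to zero, so by default it acts as a PI controller. Noise is added only to the recorded speed, with a uniform or Gaussian distribution as in the text.

```python
import random


def simulate_pressing(v_set, noise=None, level=0.0, t_end=5.0, dt=0.01,
                      kp=1.0, ki=5.0, kd=0.0, seed=0):
    """Return (time, recorded speed, control) samples of a PID-controlled plunger."""
    # Hypothetical normalized parameters, not the design values of Table 1
    k_over_vc, s2, lam_c, m, b, f = 100.0, 1.0, 0.01, 1.0, 15.0, 0.0
    rng = random.Random(seed)
    p2 = v = integral = 0.0
    e_prev = v_set
    records = []
    for k in range(int(round(t_end / dt))):
        e = v_set - v
        integral += e * dt
        q2 = kp * e + ki * integral + kd * (e - e_prev) / dt  # held over the step
        e_prev = e
        dp2 = k_over_vc * (q2 - s2 * v - lam_c * p2)  # Formula (5), V_c constant
        dv = (s2 * p2 - b * v - f) / m                # Formula (8)
        p2, v = p2 + dt * dp2, v + dt * dv            # forward Euler step
        if noise == "uniform":
            measured = v + rng.uniform(-level, level)
        elif noise == "gaussian":
            measured = v + rng.gauss(0.0, level)
        else:
            measured = v
        records.append(((k + 1) * dt, measured, q2))
    return records


clean = simulate_pressing(0.05)
noisy = simulate_pressing(0.05, noise="gaussian", level=1e-3)
print(clean[-1][1], noisy[-1][1])
```

The controller acts on the true speed, so the noise only corrupts the recorded series, which mirrors the idea of adding noise to otherwise continuous simulated sequences.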

#### *4.2. Acquisition of the Viscous Damping Coefficient*

According to experiments, the viscous damping coefficient *B* of this model usually lies between 10 and 30. The value 15 was therefore chosen as the predetermined value and targeted by the proposed approach following Procedure 2. The episode training process is shown in Figure 5, where the upper subgraph corresponds to uniformly distributed noise and the lower subgraph to Gaussian noise. Training time generally depends on both the nature of the object and the computer's performance; to remove the influence of computer performance, we used the number of episodes as the measure of training time.


**Figure 5.** The episode training process of viscous damping coefficient (B was predetermined as 15).


Figure 5 shows a trial phase at the beginning of training, because there is no prior information about *B*. After about 3000 episodes of trial, the best historical values of *B* found while seeking the best reward are 20 for the upper subgraph and 15.0626 for the lower subgraph. After about 10,000 episodes, a better value of 14.5000 appears in the upper subgraph, whereas the value of 15.0626 in the lower subgraph remains unchanged until training ends.

To further test the proposed method, the viscous damping coefficient *B* was changed from 15 to 20. Figure 6 shows the episode training processes under uniform noise (upper subgraph) and Gaussian noise (lower subgraph); they are similar to those in Figure 5, and the trial phase again lasts about 3000 episodes.

**Figure 6.** The episode training process of viscous damping coefficient (B was predetermined as 20).

To quantify the accuracy of parameter acquisition, the relative error *δ* between the estimated value *B̂* and the predetermined value *B*<sub>r</sub> is defined as

$$\delta = \left(\hat{B} - B\_r\right) / B\_r \tag{21}$$

and the results are shown in Table 2.


**Table 2.** The results of viscous damping coefficient without leakage.

Figures 5 and 6 show that relative errors no greater than 5% were obtained for both noise distributions.
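As a worked check of Formula (21), the sketch below evaluates the relative error for the estimates of *B* reported for Figure 5: the intermediate value 20 and the later values 15.0626 and 14.5000, all against *B*<sub>r</sub> = 15.

```python
def relative_error(b_hat, b_r):
    """Formula (21): signed relative error of an estimate b_hat of b_r."""
    return (b_hat - b_r) / b_r


for b_hat in (20.0, 15.0626, 14.5):
    print(f"B_hat = {b_hat}: delta = {relative_error(b_hat, 15.0):+.2%}")
# B_hat = 20.0: delta = +33.33%
# B_hat = 15.0626: delta = +0.42%
# B_hat = 14.5: delta = -3.33%
```

The early trial value of 20 is far off, while both later estimates fall within the 5% bound stated above.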

Further tests were performed under oil leakage to verify the effectiveness of the proposed approach. In a forging machine, leakage tends to saturate and is limited to a small value, so the leakage coefficient *λ*<sub>c</sub> was set to the constants 0.01 and 0.02. The episode training processes are shown in Figures 7–10. Figures 7 and 8 show the training processes for acquiring the viscous damping coefficient with targets of 15 and 20, respectively, at a leakage coefficient of 0.01; Figures 9 and 10 show the same for a leakage coefficient of 0.02. These figures show that the proposed approach converges after training, and the final results are listed in Table 3. Table 3 shows that the estimated viscous damping coefficient approaches the predetermined value *B*<sub>r</sub> for both leakage coefficients and both noise distributions, with a maximum relative error below 2%. The training time differs between cases: about 6000 episodes in Figure 7, about 4000 episodes in Figure 9, and about 3000 episodes in Figure 10. The noise distribution can also affect the training speed, as shown in Figure 8.

**Figure 7.** The episode training process with leakage of 0.01 (*B*<sub>r</sub> = 15).

**Figure 8.** The episode training process with leakage of 0.01 (*B*<sub>r</sub> = 20).

**Figure 9.** The episode training process with leakage of 0.02 (*B*<sub>r</sub> = 15).

**Figure 10.** The episode training process with leakage of 0.02 (*B*<sub>r</sub> = 20).


**Table 3.** The results of viscous damping coefficient under leakage.

#### *4.3. Acquisition of the Leakage Coefficient*

Leakage, represented by the leakage coefficient *λ*<sub>c</sub> in the model, becomes the main uncertainty as the forging machine ages. The leakage coefficient was predetermined as the constants 0.01 and 0.02, and the corresponding learning processes, each under uniform and Gaussian noise, are shown in Figures 11 and 12, respectively. The training time depends on the noise distribution in Figure 11 and is about 5000 episodes in Figure 12.

 **Figure 11.** The learning process of leakage coefficient (*λ<sup>c</sup>* was predetermined as 0.01). 

 **Figure 12.** The episode training process of leakage coefficient (*λ<sup>c</sup>* was predetermined as 0.02).

The values of the leakage coefficient *λ̂*<sub>c</sub> are acquired when the curve becomes stable. Here, the absolute error *E*, defined as

$$E = \left| \lambda\_c - \hat{\lambda}\_c \right| \tag{22}$$

is used instead of the relative error, because the leakage coefficient is so small that, as the denominator of Formula (21), it would give a misleading relative error. The results are listed in Table 4, which shows that the absolute errors do not exceed 0.0015 for either noise distribution.

**Table 4.** The results of leakage coefficient.


#### *4.4. Acquisition of the Viscous Damping Coefficient and the Leakage Coefficient*

To test a higher-dimensional parameter space, the viscous damping coefficient and the leakage coefficient were acquired concurrently, with *B* and *λ*<sub>c</sub> predetermined as 18 and 0.01, respectively. The learning processes under uniform and Gaussian noise are shown in Figures 13 and 14, respectively, and the results are listed in Table 5, which shows that both parameters are estimated well concurrently under both noise conditions. In all cases, training took fewer than 5000 episodes.

**Figure 13.** The episode training process of viscous damping coefficient and leakage coefficient subject to noise of uniform distribution.

**Figure 14.** The episode training process of viscous damping coefficient and leakage coefficient subject to noise of Gaussian distribution.

**Table 5.** The results of the viscous damping coefficient and the leakage coefficient concurrently.


#### *4.5. Comparison with Other Methods*

A widely used back-propagation (BP) neural network approach and the sliding window correlation method were chosen for comparison with the proposed approach. A data series of 160 samples, produced by the model under the controller, served as the data source for determining the parameters. This series includes a transient process of 50 samples and a steady process of 110 samples, with a viscous damping coefficient *B* of 15.

The BP network has strong nonlinear approximation ability and performs well on recursive estimation problems, but this requires the length of the input time series to match the order of the system. Here, we focused on identifying the viscous damping coefficient *B* within a single period. After several attempts, a 7-20-1 BP network was chosen, with seven inputs (the six states and one control of the model in Section 2) and one output, the viscous damping coefficient *B*. It was trained with the back-propagation algorithm on a training set of 2000 samples from different cases in which the set speed varied from 0.02 to 0.08, with a learning rate of 0.001. The trained BP network was then used to estimate the viscous damping coefficient, and the results are shown in Figure 15.

The values of the viscous damping coefficient from sample 1 to sample 160 estimated by the BP network and by the proposed approach are shown by the black and red curves, respectively. The BP network approaches the true viscous damping coefficient in the steady process but performs poorly in the transient process. The proposed approach performs well throughout the whole process, reaching 15.0625 against the target of 15.0000.

**Figure 15.** The comparison between BP network and the proposed approach.

The sliding window correlation method, a conventional parameter identification method for data series, was applied to estimate the viscous damping coefficient by minimizing the sum of squared errors within each observation window. Because disturbances within a window can change its statistical properties, window lengths of 2, 5, 10, and 50 were compared; the results are shown in Figure 16.

**Figure 16.** The comparison between the sliding window method and the proposed approach.

Figure 16 shows that the sliding window correlation method and the proposed approach have similar accuracy from sample 1 to sample 160. However, the result of the sliding window correlation method fluctuates with the window length: the shorter the window, the more sensitive the result, and vice versa. In contrast, the proposed approach remains stable owing to its episode training.
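The regression used inside each window is not spelled out, so the following is a hedged sketch of one natural choice for the sliding window method compared above: rearranging Formula (8) as *p*<sub>2</sub>*S*<sub>2</sub> − *m dv*/*dt* − *F* = *Bv* and fitting *B* by least squares over each window, which gives *B̂* = Σ*vy*/Σ*v*<sup>2</sup>. The synthetic, noise-free speed profile and all parameter values below are hypothetical; with measured data, *dv*/*dt* would have to be estimated, for example by finite differences.

```python
import math


def sliding_window_b(v, dvdt, p2, window, m, s2, f):
    """Least-squares estimate of B in Formula (8) over each sliding window."""
    estimates = []
    for end in range(window, len(v) + 1):
        vs = v[end - window:end]
        # y = p2*S2 - m*dv/dt - F, which equals B*v under Formula (8)
        ys = [p * s2 - m * a - f
              for p, a in zip(p2[end - window:end], dvdt[end - window:end])]
        den = sum(x * x for x in vs)
        estimates.append(sum(x * y for x, y in zip(vs, ys)) / den if den > 0 else float("nan"))
    return estimates


# Hypothetical noise-free data: 160 samples generated with B = 15
t = [0.01 * (k + 1) for k in range(160)]
v = [0.05 * (1.0 - math.exp(-tk)) for tk in t]
dvdt = [0.05 * math.exp(-tk) for tk in t]
p2 = [(1.0 * a + 15.0 * x + 0.1) / 1.0 for a, x in zip(dvdt, v)]
est = sliding_window_b(v, dvdt, p2, window=5, m=1.0, s2=1.0, f=0.1)
print(len(est), min(est), max(est))  # 156 estimates, all close to 15
```

On noise-free data every window recovers *B* exactly; adding noise to *v* or *p*<sub>2</sub> makes short windows fluctuate much more than long ones, which matches the window-length sensitivity described above.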

The advantages and disadvantages of the three methods are summarized in Table 6.


**Table 6.** The comparisons of three methods.

The proposed approach obtains an accurate viscous damping coefficient in both the steady state and the transient state using data from only one period. To the best of our knowledge, no other approach identifies model parameters from so little information, which benefits online control. However, because of its long training time, it is limited to slow processes such as that of the forging machine, although improvements such as eligibility traces and heuristic search have been made. A hardware implementation of the proposed approach is an attractive direction for broader industrial processes.

#### **5. Conclusions**

In this paper, reinforcement learning has been applied to identify optimal parameter values online directly from raw data within one period. Compared with the BP network approach, the proposed technique achieves good accuracy throughout the whole process. Compared with the sliding window correlation method, it has similar accuracy but better resistance to noise. The proposed approach has thus been shown to be effective for online parameter identification in a simulation of the real-time process of a forging machine.

**Author Contributions:** Conceptualization and methodology, D.Z. and Z.G.; formal analysis, L.D.; writing—original draft preparation, D.Z.; writing—review and editing, Z.G.; All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Program of Science and Technology Commissioner and the National Natural Science Foundation of China, grant number 61673074.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors would like to acknowledge the research support from the School of Electrical Engineering and Automation at Tianjin University, and the E&E faculty at the University of Northumbria at Newcastle.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Nomenclature**

#### **Symbol Meanings**




#### **References**

