1. Introduction
Fluid film bearings (FFBs) are the result of a complex set of interrelated hydrodynamic, thermal, tribological and mechanical phenomena. The choice of an FFB’s design parameters determines the course of these processes and, finally, the bearing’s operational properties. If efficient calculation tools are available, the selection of optimal ratios of the design parameters and the improvement of the bearing’s properties can be carried out, for example, by solving optimization problems [1,2,3,4]. Additionally, the characteristics of FFBs are often improved by the use of new design solutions [5,6], advanced lubricants [7,8], etc.
The use of active control techniques is an alternative trend in improving FFBs. There are a variety of ways to implement active FFBs. The bearing’s geometric parameters [9,10,11], the lubricant’s parameters [12,13], as well as the parameters of lubricant supply to the friction zone [14,15] have been made adjustable in various studies. The latter approach can generally be referred to as the active lubrication technique. It can be applied to various types of bearings, including rigid [9] and tilting-pad [16] ones, both radial and thrust [17,18]. An important advantage of active lubrication over the other named approaches is that in many cases, no fundamental changes to the bearing design are required; only the hydraulic (or pneumatic [19]) lubrication system undergoes changes.
The initial purpose of introducing active lubrication to FFBs was, in most cases, to reduce vibration and ensure rotor stability. Such results are shown in numerous papers, e.g., by Santos et al. [16], Nicoletti et al. [20], Rehman et al. [21], and Li et al. [22]. At the same time, active lubrication inevitably affects not only the rotor motion, but also other parameters of the FFB due to the previously noted complex relationships between physical phenomena, which were also noted in [23]. In addition to the obvious effect on the lubricant consumption, it also modifies the power consumption for pumping if external pressurization is used; the friction in the bearings, both in steady state and during startup and rundown; the lubricant temperature and the corresponding deformations; and the life of the rotor-bearing system’s components. Moreover, the last two points are also related to the friction occurring in the FFB. Thus, the energy parameters can also become a subject of control in active FFBs in addition to the system dynamics.
Multiple studies demonstrate various ways to reduce friction in FFBs: the use of less viscous lubricants [24] and the improvement of their properties [7]; texturing the bearing’s surface [25]; adding compliant areas [26]. However, friction and other energy parameters are still rarely considered to be controllable parameters or even possible objectives in controlled systems, including active bearings. Murashima et al. [27] considered friction as a factor to be directly adjusted in a tribological system. Engel et al. [28] presented a sliding bearing with adjustable friction controlled by ultrasonic oscillations. Regarding actively lubricated FFBs, the relationship between friction parameters and control techniques, namely adjustment of the shaft position, was discovered and substantiated in [29,30]. This provides the basis for the study and development of complex control strategies considering both the kinematic and the energy parameters of the rotor-bearing system. Such an approach requires the use of multi-objective control techniques that are also able to deal with nonlinear systems like FFBs.
Typical stabilization problems in actively lubricated FFBs are usually solved using conventional control methods: PID controllers and their variations [19,31], LQG [32], and adaptive P control [33]. Some researchers have used fuzzy logic [34] and model-predictive controllers [22] for this purpose. The latter, although optimal in essence, implies linearization of the system, and thus can hardly be applied to solve the control problems under consideration.
However, previous studies have shown promising results in the application of controllers based on reinforcement learning. These belong to agent-based methods and have been successfully implemented to resolve various control problems [35,36,37], including in FFBs. In particular, there is a practice of applying the deep Q-network (DQN) method to synthesize controllers in this area. In [38], a DQN controller was developed to minimize friction in an actively lubricated conical bearing. In [39], a DQN controller was used to restrict undesirable shaft movements in a conical bearing with an adjustable bushing. In [40], a DQN-based controller was implemented for a magnetorheological bearing to reduce the amplitudes of rotor vibrations at critical frequencies. Although the mentioned studies represent single-criterion controllers with a single control channel, the principle of reinforcement learning allows the number of control criteria, as well as control loops, to be expanded arbitrarily.
The aim of this work is to study the possibility of implementing multi-criteria optimal control for actively lubricated hybrid journal bearings (ALHBs) using reinforcement learning. One distinctive feature of the study is the implementation of several control loops that require coordinated operation within the framework of a controller based on reinforcement learning. Another distinctive feature is the multi-criteria formulation of the optimal control problem, which takes into account not only the rotor dynamics, but also a number of energy characteristics of the bearing, such as the power losses to overcome viscous friction and to pump the lubricant through the bearing. Such a formulation of the problem has not previously been considered and solved for active FFBs, so the present study establishes the basics of a methodology for creating this kind of controller and offers a discussion of the problems and prospects of the described approach.
2. Models and Methods
2.1. Basic Numerical Model
An actively lubricated hybrid bearing (ALHB) is the main object of this study. The ALHB is a journal bearing with four externally pressurized lubrication channels ending with hydrostatic pockets (grooves) located orthogonally along the bearing centerline. The lubricant supply pressure in each channel can be adjusted independently by electrohydraulic servovalves. Thus, the hydrostatic force acting on the journal can be adjusted in both magnitude and direction.
Figure 1 illustrates the bearing design and operation principles. Considering the mentioned purpose of this work, the rotor-bearing system’s design should ensure the representativeness of the results. Thus, a relatively heavy rotor was considered in this study, because such a design ensures a more pronounced reduction in viscous friction in an ALHB [29]. The main parameters of the rotor-bearing system considered in this study are shown in Table 1.
The bearing was modeled using a conventional approach based on the numerical solution of the modified Reynolds equation coupled with the flow balance equation for modeling the hydrostatic effect, and with the Lagrange equations for modeling the rotor dynamics. The Reynolds number for the bearing with the given parameters is Re ≈ 250, so the lubricant flow can be considered fully laminar.
The modified Reynolds equation was used to calculate the hydrodynamic pressure distribution in the ALHB [41]:

$$\frac{\partial}{\partial x}\left(\frac{h^{3}}{12\mu}\frac{\partial p}{\partial x}\right)+\frac{\partial}{\partial z}\left(\frac{h^{3}}{12\mu}\frac{\partial p}{\partial z}\right)=\frac{U}{2}\frac{\partial h}{\partial x}+V,\tag{1}$$

where $x$ and $z$ are the Cartesian coordinates of the bearing surface, $p$ is the pressure, $\mu$ is the lubricant viscosity, $U$ and $V$ are the journal’s circumferential and lateral velocities, and $h$ is the bearing clearance function. The first term, $(U/2)(\partial h/\partial x)$, on the right side of Equation (1) describes the pressure caused by the rotation of the journal surface; its optimization mainly ensures the reduction in bearing friction considered in this work [29]. The second term, $V$, describes the pressure caused by squeezing of the lubricant film and makes a smaller contribution to the friction reduction effect.
The flow balance equation was used to consider the throttling effect of the capillary restrictors and calculate the corresponding reduction in the lubricant supply pressure at their inputs and outputs [29,42]:

$$Q=\sum_{i=1}^{N_{H}}Q_{Hi},\tag{2}$$

where $Q$ is the lubricant flow through the bearing, $Q_{Hi}$ is the lubricant flow through a certain restrictor, determined by the capillary law from the pressure drop between the restrictor’s input and the pressure $p_{Hi}$ in the corresponding hydrostatic pocket, and $N_{H}=4$ is the number of restrictors in the bearing. The flow through each restrictor is additionally corrected by a coefficient $k_{T}$ that takes into account the reduction in the flow rate due to the turbulence in a restrictor (Equation (3)) [43,44]; it depends on the Reynolds number Re characterizing the lubricant flow in an injector and on the limit Reynolds number Re*.
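For illustration, a minimal sketch of this flow balance computation is given below, assuming the Hagen–Poiseuille capillary law for the restrictors and a simple power-law form for the turbulence correction $k_T$; the restrictor geometry, lubricant properties, pocket pressures, and the exact form of the correction are illustrative assumptions, not the paper’s exact relations.

```python
import numpy as np

def restrictor_flow(p_in, p_pocket, d=1.0e-3, l=20e-3, mu=0.02, re_lim=2300.0, rho=870.0):
    """Capillary restrictor flow (Hagen-Poiseuille law) with a simple
    turbulence correction k_T; the geometry, fluid properties, and the
    correction law are illustrative assumptions."""
    q = np.pi * d**4 * (p_in - p_pocket) / (128.0 * mu * l)  # laminar capillary flow, m^3/s
    v = 4.0 * q / (np.pi * d**2)                             # mean velocity in the restrictor
    re = rho * abs(v) * d / mu                               # Reynolds number of the injector flow
    k_t = 1.0 if re <= re_lim else (re_lim / re) ** 0.25     # assumed turbulence penalty
    return k_t * q

# Flow balance: the total flow through the bearing is the sum over N_H = 4 restrictors.
p_pockets = np.array([0.40e6, 0.50e6, 0.45e6, 0.50e6])  # hypothetical pocket pressures, Pa
Q = sum(restrictor_flow(1.0e6, p) for p in p_pockets)
print(f"total lubricant flow Q = {Q * 6e4:.2f} L/min")
```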
The inlet pressure for each hydrostatic pocket is adjustable in the range from zero to the maximum supply pressure due to the operation of the corresponding servovalve. The pressure is assumed to be almost uniform over the entire area of a hydrostatic pocket due to the appropriate choice of its depth and its limited area compared to the full bearing surface area (<7%).
The bearing radial clearance function is as follows:

$$h(\alpha)=h_{0}-X\cos\alpha-Y\sin\alpha,\tag{4}$$

where $h_{0}$ is the radial clearance, $X$ and $Y$ are the coordinates of the geometric center of the journal, and $\alpha$ is the angular bearing coordinate.
Equation (1) is solved numerically by the finite differences method [38] together with Equations (2)–(4), resulting in the pressure distribution in the ALHB. Cavitation is taken into account based on the Gümbel hypothesis [38,45]. Numerical integration of the pressure distribution results in the following bearing forces:

$$F_{X}=-\int_{0}^{L}\!\int_{0}^{2\pi}p\cos\alpha\,R\,d\alpha\,dz,\qquad F_{Y}=-\int_{0}^{L}\!\int_{0}^{2\pi}p\sin\alpha\,R\,d\alpha\,dz,\tag{5}$$

where $R$ is the journal radius and $L$ is the bearing length.
The viscous friction torque in the ALHB is calculated using Equation (6) [38]:

$$M_{fr}=\int_{0}^{L}\!\int_{0}^{2\pi}\left(\frac{\mu U}{h}+\frac{h}{2R}\frac{\partial p}{\partial\alpha}\right)R^{2}\,d\alpha\,dz.\tag{6}$$
The corresponding power losses for overcoming the viscous friction, $N_{fr}$, as well as for the lubricant pumping through the bearing, $N_{p}$, are as follows:

$$N_{fr}=M_{fr}\,\omega,\tag{7}$$

$$N_{p}=\sum_{i=1}^{N_{H}}p_{0i}\,Q_{Hi},\tag{8}$$

where $\omega$ is the rotor angular velocity and $p_{0i}$ is the lubricant supply pressure at the input of the $i$-th restrictor.
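The following minimal Python sketch illustrates this computational chain, from a finite-difference relaxation of Equation (1) with Gümbel-type cavitation to the integration of the forces, friction torque, and friction power loss; the geometry, viscosity, speed, and discretization are illustrative assumptions, and the hydrostatic pockets are omitted for brevity.

```python
import numpy as np

def solve_pressure(X, Y, dX, dY, R=0.04, L=0.08, h0=100e-6, mu=0.02,
                   omega=157.0, na=72, nz=24, iters=20000, tol=1e-8):
    """Minimal finite-difference sketch of Equation (1): Gauss-Seidel-style
    relaxation with Gumbel cavitation (negative pressures discarded).
    Geometry and discretization are illustrative assumptions."""
    alpha = np.linspace(0.0, 2*np.pi, na, endpoint=False)
    da, dz = alpha[1] - alpha[0], L / (nz - 1)
    dx = R * da                                  # circumferential coordinate x = R*alpha
    h = h0 - X*np.cos(alpha) - Y*np.sin(alpha)   # clearance function, Eq. (4)
    dhdx = (X*np.sin(alpha) - Y*np.cos(alpha)) / R
    U = omega * R                                # journal surface speed
    V = -dX*np.cos(alpha) - dY*np.sin(alpha)     # squeeze term from lateral velocities
    rhs = 0.5*U*dhdx + V                         # right side of Eq. (1)
    G = h**3 / (12.0 * mu)
    p = np.zeros((na, nz))                       # p = 0 at the axial edges (ambient)
    for _ in range(iters):
        p_old = p.copy()
        for j in range(1, nz - 1):
            ip, im = np.roll(p[:, j], -1), np.roll(p[:, j], 1)   # periodic in alpha
            gp, gm = 0.5*(G + np.roll(G, -1)), 0.5*(G + np.roll(G, 1))
            num = (gp*ip + gm*im)/dx**2 + G*(p[:, j+1] + p[:, j-1])/dz**2 - rhs
            p[:, j] = num / ((gp + gm)/dx**2 + 2.0*G/dz**2)
        p = np.maximum(p, 0.0)                   # Gumbel hypothesis
        if np.max(np.abs(p - p_old)) < tol:
            break
    return alpha, p, h

# Bearing forces, friction torque, and friction power loss (Eqs. (5)-(7) sketch):
alpha, p, h = solve_pressure(X=20e-6, Y=-30e-6, dX=0.0, dY=0.0)
R, L, mu, omega = 0.04, 0.08, 0.02, 157.0
da, dz = 2*np.pi/p.shape[0], L/(p.shape[1] - 1)
Fx = -np.sum(p * np.cos(alpha)[:, None]) * R * da * dz
Fy = -np.sum(p * np.sin(alpha)[:, None]) * R * da * dz
dpdx = np.gradient(p, axis=0) / (R * da)
tau = mu * omega * R / h[:, None] + 0.5 * h[:, None] * dpdx  # wall shear stress
M_fr = np.sum(tau) * R * R * da * dz             # friction torque, Eq. (6)
N_fr = M_fr * omega                              # friction power loss, Eq. (7)
print(f"Fx = {Fx:.1f} N, Fy = {Fy:.1f} N, N_fr = {N_fr:.2f} W")
```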
The rotor was represented by its single-mass model, assuming the absence of significant misalignments of the journal. The rotor motion equations, considering the gravity force, the bearing forces, and the imbalance forces, are as follows [46,47]:

$$m\ddot{X}=F_{X}+\Delta\,\omega^{2}\cos(\omega t)+F_{ext,X},\qquad m\ddot{Y}=F_{Y}+\Delta\,\omega^{2}\sin(\omega t)-mg+F_{ext,Y},\tag{9}$$

where $t$ is time, $\Delta$ is the imbalance, $g$ is the free fall acceleration, $m$ is the rotor mass, and $F_{ext,X}$, $F_{ext,Y}$ are other external forces applied to the shaft.
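A sketch of the time integration of Equation (9) is shown below; the bearing force function stands for either the numerical or the surrogate ALHB model, and the mass, imbalance, and speed values are illustrative assumptions.

```python
import numpy as np

def rotor_rhs(t, state, bearing_force, m=50.0, D=1e-4, omega=157.0, g=9.81):
    """Right side of the single-mass rotor Equation (9); bearing_force(X, Y, dX, dY)
    returns (Fx, Fy) from the ALHB model. Mass m and imbalance D are illustrative."""
    X, Y, dX, dY = state
    Fx, Fy = bearing_force(X, Y, dX, dY)
    ddX = (Fx + D * omega**2 * np.cos(omega * t)) / m
    ddY = (Fy + D * omega**2 * np.sin(omega * t)) / m - g
    return np.array([dX, dY, ddX, ddY])

def rk4_step(t, state, dt, rhs, **kw):
    """Classical fourth-order Runge-Kutta step for the shaft motion."""
    k1 = rhs(t, state, **kw)
    k2 = rhs(t + dt/2, state + dt/2 * k1, **kw)
    k3 = rhs(t + dt/2, state + dt/2 * k2, **kw)
    k4 = rhs(t + dt, state + dt * k3, **kw)
    return state + dt/6 * (k1 + 2*k2 + 2*k3 + k4)

# Example: one integration step with a placeholder force model balancing gravity.
state = np.array([0.0, -30e-6, 0.0, 0.0])  # X, Y, dX, dY
state = rk4_step(0.0, state, 1e-5, rotor_rhs,
                 bearing_force=lambda X, Y, dX, dY: (0.0, 490.5))
```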
A relatively simple proportional controller, previously described in [30,33], was used to establish the relation between the control signals and the pressure changes in the lubricant supply channels and, further, to generate the corresponding data:

$$u_{b}=K_{P}\,\varepsilon,\qquad \varepsilon=q_{s}-q,\tag{10}$$

where $u_{b}$ is the vector of basic control signals, $K_{P}$ is the proportional gain, $\varepsilon$ is the vector of control errors, $q=(X,Y)$ is the vector of the shaft center coordinates, and $q_{s}$ is the setpoint. The signals supplied to each pair of opposite servovalves are formed differentially:

$$u=u_{0}\left(1\pm k_{u}u_{b}\right),\tag{11}$$

where $u$ is the vector of the control signals to the electrohydraulic servovalves, with the “+” sign applied to the servovalve on the error side and the “−” sign to the opposite one; $u_{0}$ is the basic signal level, resulting in the basic pressure $p_{0}$ at a servovalve’s output (and at a restrictor’s input); and $k_{u}$ is the ratio of the possible pressure rise due to the control signals. Thus, a certain error value increases the control signal at the corresponding servovalve and decreases it at the opposite one, providing their differential operation and increasing the control impact on the shaft.

Finally, the output pressure at a servovalve is as follows:

$$p_{0i}=k_{sv}u_{i},\tag{12}$$

where $k_{sv}$ is the servovalves’ transfer coefficient, i.e., the voltage-to-pressure ratio.
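A compact sketch of this control scheme (Equations (10)–(12)) may look as follows; the gain values, the clipping of the basic signal, and the linear signal-to-pressure scaling are illustrative assumptions.

```python
import numpy as np

def servo_pressures(X, Y, Xs=0.0, Ys=0.0, Kp=5e4, u0=1.0, ku=0.5, ksv=1.0e6):
    """Sketch of the proportional controller of Equations (10)-(12): the position
    error along each axis drives the corresponding opposite servovalve pair
    differentially. All gains and scalings are illustrative assumptions."""
    err = np.array([Xs - X, Ys - Y])           # control errors, Eq. (10)
    ub = np.clip(Kp * err, -1.0, 1.0)          # basic control signals
    u = u0 * np.array([1 + ku*ub[0],           # +X channel
                       1 - ku*ub[0],           # -X channel (differential pair)
                       1 + ku*ub[1],           # +Y channel
                       1 - ku*ub[1]])          # -Y channel, Eq. (11)
    return ksv * u                             # servovalve output pressures, Eq. (12)

print(servo_pressures(X=10e-6, Y=-5e-6))
```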
2.2. Model Verification
The developed model of the ALHB was verified in several stages, in terms of calculating the pressure distribution, rotor loci, friction losses and lubricant flow rates. At each stage, the model predictions were compared to the experimental results published in other authors’ works.
Pressure distributions due to the hydrodynamic and hydrostatic effects were compared to the experimental results published in [48,49], respectively (Figure 2). In both cases, the reference data on the pressure are presented for the bearing centerline.
The comparison for the plain hydrodynamic bearing is presented in Figure 2a. The modeled plain bearing has a diameter of 50 mm, a length of 20.5 mm, and a clearance of 73.2 μm, and is lubricated via a single groove supplying the oil at pressures of 0.15 and 0.3 MPa [48].
The comparison for the hybrid bearing with six hydrostatic pockets is presented in Figure 2b. The modeled hybrid bearing has a diameter of 62 mm, a width of 48.5 mm, and a clearance of 60 μm, with six jets supplying the oil under a pressure of 1.32 MPa [49]. The experimental test was carried out at an eccentricity of 0.5 and a rotor speed of 4800 rpm.
In both cases, the comparison shows good agreement between the simulation and the experimental data. The agreement for the hydrostatic effect is somewhat worse due to the unknown length of the restrictors used by the authors, as well as to the measurement errors of the pressure sensor integrated into the rotating shaft. Moreover, the mentioned discrepancies are largely leveled out by the further processing of the pressure data when calculating the rotor movement and other bearing parameters, as shown below.
Verification of the system’s model in terms of the rotor motion was carried out via a comparison to the experimental results presented by Yi et al. in [50]. The comparison was made for a hydrostatic fluid film bearing with a diameter of 60 mm, a width of 25 mm, and a radial clearance of 52 μm with nine injectors. The lubricant used was water at a temperature of 25 °C; the supply pressure was 0.1, 0.5, and 1 MPa.
The results presented in Figure 3 also show good agreement. The discrepancy is less than 3 μm, that is, <2% of the bearing clearance (red dashed line in Figure 3), and of the order of the typical error of proximity sensors.
Additionally, since the issues of the lubricant flow and the viscous friction in the fluid film are also significant concerns in this study, the model’s ability to accurately predict these parameters has also been checked.
The data on the hydrodynamic friction (M. Fillon in [51]) and the lubricant flow rate (Yi et al. in [50]) were also obtained experimentally. Their comparison to the model’s predictions is shown in Figure 4. In [51], the authors present a study of the friction torque. The study was carried out for a bearing with a length of 80 mm, a diameter of 100 mm, a radial clearance of 171 μm, and a rotor speed of 2000 rpm. The friction torque was measured under a static load of 2 kN and at different oil temperatures.
As with the other parameters considered, the predicted values are in good agreement with the measured ones, with a deviation of no more than 2%. The verification results make it possible to consider the developed numerical model of the ALHB adequate for the purposes of the study and the obtained results sufficiently reliable.
2.3. The Study Pipeline and the Control Problem
Reinforcement learning methods, including the DQN method used in this article, are agent-based methods and require the use of simulation models of the control objects [52]. The agent learns the optimal control strategy during multiple runs of the training scenario with further evaluation of the results. The iterative and stochastic nature of such a process, coupled with the quadratic complexity of the grid methods used for modeling ALHBs, leads to the need for huge amounts of computation. Therefore, data-driven ANN-based models of the ALHB were used instead of numerical models to improve the computational speed. The methodology for creating such surrogate models is based on the results for plain bearings presented in [41]. Since that work considered a purely hydrodynamic bearing, the dataset describing the ALHB was supplemented with the control signal values associated with the lubricant pressure changes in the bearing’s hydrostatic pockets. The process of developing a surrogate ANN model of the ALHB is described in more detail in Section 2.4.
Next, the resulting surrogate model is used together with the rotor model to train the DQN controller. The DQN controller finds the optimal policy that meets the given goals. Goal setting is implemented by assigning a system of rewards and penalties for “right” and “wrong” actions of the DQN agent. Formulation of the reward function is one of the subjects of this study.
Important features of the DQN learning technique are described in more detail in Section 2.6. The general pipeline of the development of a DQN controller for the ALHB is schematically presented in Figure 5.
As noted above, the goal setting for the control problem solved in this study is based on the previously found relations between the controlled shaft position in the ALHB and its energy parameters, such as the power losses due to friction in the bearing. At the same time, as noted in [29], the gain from reducing the power consumption for pumping lubricant through the bearing may be even more significant than the similar effect from reducing the viscous friction. Therefore, the combination of these factors is taken into account in this study as the control objectives. The relation between the ALHB energy parameters and the shaft equilibrium position set by a controller was calculated using the verified model and is shown in Figure 6.
Both dependencies are characterized by the presence of a single minimum, with the losses increasing in all directions away from it. More eccentric shaft positions are less energy-efficient and more dangerous, as they reduce the fluid film thickness. This makes it possible to synthesize a controller for the ALHB that ensures optimization of the considered energy parameters of the system and also prevents the system from transitioning to potentially dangerous states with a small fluid film thickness. Formally, the required control policy can be represented as follows:

$$\begin{cases}k_{fr}N_{fr}+k_{p}N_{p}\rightarrow\min,\\ e_{max}\le e_{lim},\end{cases}\tag{13}$$

where $e_{max}$ is the largest observed value of the shaft eccentricity; $e_{lim}$ is the maximum permissible value of the shaft eccentricity (the inverse of the minimum film thickness); and $k_{fr}$ and $k_{p}$ are balancing (weight) coefficients to shift the emphasis, if necessary, to one of the power parameters.
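In scalarized form, the policy of Equation (13) can be evaluated, for example, as follows; the infinite penalty used to represent the eccentricity constraint is an assumption.

```python
def control_objective(N_fr, N_p, e_max, e_lim=0.7, k_fr=1.0, k_p=1.0):
    """Weighted power objective with an eccentricity constraint (Eq. (13) sketch).
    Weights k_fr and k_p follow the paper's notation; the infinite penalty for a
    constraint violation is an assumed scalarization."""
    if e_max > e_lim:
        return float("inf")   # inadmissible state: fluid film too thin
    return k_fr * N_fr + k_p * N_p
```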
2.4. Surrogate ALHB Model
The numerical ALHB model was used to generate a dataset and build a faster surrogate bearing model consisting of 9 ANNs, according to the principles described in [41]. The whole bearing clearance was divided into three areas according to eccentricity (0–0.5, 0.5–0.7, and 0.7–0.9), and a separate ANN was trained for each area to predict a certain set of parameters: the bearing forces ($F_{X}$, $F_{Y}$), the power characteristics ($N_{fr}$, $N_{p}$), and the lubricant flow rate ($Q$). The input parameters of the ANNs were the values of the control action along the axes ($u_{X}$, $u_{Y}$), the position of the rotor in the bearing ($X$, $Y$), and its velocities ($\dot{X}$, $\dot{Y}$). As a result, a dataset with corresponding input and output parameters was collected for each eccentricity range. Each bearing sub-area was covered by a uniform grid in polar coordinates, resulting in 40 data points by angle and 7 (for the 0–0.5 subrange) or 5 (for the other two subranges) data points by eccentricity. Additionally, the data at each point mentioned above were collected considering the shaft velocity ($\dot{X}$, $\dot{Y}$) values varying in a range from −0.03 to 0.03 with 28 points, and the control action ($u_{X}$, $u_{Y}$) values varying in a range from −1 to 1 with 6 points. The resulting data vector included the input variables ($u_{X}$, $u_{Y}$, $X$, $Y$, $\dot{X}$, $\dot{Y}$) and the corresponding predicted parameters ($F_{X}$, $F_{Y}$, $N_{fr}$, $N_{p}$, $Q$). The resulting dataset included 9.35 million training samples.
Fully connected ANNs with one hidden layer of 64 neurons were used to approximate the dataset representing the ALHB model. The maximal prediction error of the trained ANNs compared to the verified numerical model outputs was estimated using the MEAN metric and amounted to 3%, 7%, and 13% for the bearing forces in the eccentricity ranges of 0–0.5, 0.5–0.7, and 0.7–0.9, respectively, and <1% for the other parameters in all ranges.
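A minimal sketch of training one such surrogate ANN is shown below, assuming scikit-learn’s MLPRegressor with one hidden layer of 64 neurons; the dataset file name, the column layout, and the train/test split are hypothetical.

```python
# Minimal sketch of training one surrogate ANN; the file name and column
# layout [ux, uy, X, Y, dX, dY, Fx, Fy, Nfr, Np, Q] are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

data = np.load("alhb_dataset_e00_05.npy")    # hypothetical dataset for one eccentricity range
inputs, outputs = data[:, :6], data[:, 6:8]  # e.g., predict the bearing forces Fx, Fy

X_train, X_test, y_train, y_test = train_test_split(inputs, outputs,
                                                    test_size=0.1, random_state=0)

ann = MLPRegressor(hidden_layer_sizes=(64,), activation="relu",
                   solver="adam", max_iter=200)
ann.fit(X_train, y_train)

rel_err = np.abs(ann.predict(X_test) - y_test) / (np.abs(y_test) + 1e-9)
print(f"max relative error: {rel_err.max():.2%}")
```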
After the training and testing processes were complete, the initial ALHB numerical model was replaced by the obtained set of ANN-based models in the simulation environment, which is described in more detail in Section 2.5.
2.5. Full Simulation Model
The simulation model of the rotor-bearing system was developed in the Simulink tool of the MatLab R2020b software. The Simscape Multibody Toolbox was used to simulate the rigid body dynamics, modeling a shaft on two ALHBs operating in parallel. The Reinforcement Learning Toolbox was used to implement the DQN controller training. The structure of the simulation model in Simulink is shown in Figure 7.
The ALHBs can be represented by their numerical or ANN-based surrogate models predicting the set of parameters ($F_{X}$, $F_{Y}$, $N_{fr}$, $N_{p}$, $Q$). The control signals $u_{X}$ and $u_{Y}$ generated by a controller are the inputs of the ALHB models. Applying them results in a change in the shaft position in the bearing and the corresponding changes in the other observed variables.
The transition to surrogate models speeds up the calculation by more than 20 times, which makes it possible to test more variants of hyperparameters and conditions during training. Training a DQN controller thus usually takes 2–6 h, depending on the algorithm’s convergence rate in each case.
An important feature of the DQN algorithm is that it provides only discrete outputs, and their number significantly affects the training time. Therefore, the proposed controller used only three outputs, generating signals to increase, decrease, or leave the control signal level unchanged, respectively. The final value of each control signal in this case is obtained by accumulating the generated increments as a cumulative sum.
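A minimal sketch of this accumulation scheme is given below; the mapping of the three discrete actions and the clipping range of the accumulated signal are assumptions consistent with the control action range used for the dataset.

```python
import numpy as np

class AccumulatedAction:
    """Maps the three discrete DQN outputs (decrease / keep / increase) to a
    continuous control signal by accumulating increments; the clipping range
    is an assumption matching the control action range of the dataset."""
    def __init__(self, du=0.1, lo=-1.0, hi=1.0):
        self.du, self.lo, self.hi, self.u = du, lo, hi, 0.0

    def apply(self, action):                  # action in {0, 1, 2}
        self.u += (action - 1) * self.du      # -du, 0, or +du
        self.u = float(np.clip(self.u, self.lo, self.hi))
        return self.u
```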
2.6. DQN-Based Control
The DQN method is a type of Q-learning that uses ANNs to find and implement the desired controller behavior policy. During the training process, the DQN agent obtains information about the state of the system $s_{t}$ at each time step $t$. Then, it generates a control signal $a_{t}$, in response to which the reward $r_{t}$ is calculated based on the observation parameters. Thus, the critic ANN $Q(s,a;\theta)$ is trained to predict the future reward. The error between the trained function $Q(s,a;\theta)$ and the optimal function $Q^{*}(s,a)$ is estimated with the Bellman equation and should be minimized during the training process [53]:

$$L(\theta)=\mathbb{E}\left[\left(r_{t}+\gamma\max_{a'}Q\left(s_{t+1},a';\theta^{-}\right)-Q\left(s_{t},a_{t};\theta\right)\right)^{2}\right],\tag{14}$$

where $\gamma$ is the discount factor and $\theta^{-}$ are the parameters of the target network.
In this study, a convolutional ANN with the structure shown in Figure 8 was used as the critic. The critic ANN consists of three hidden layers with 14, 18, and 18 neurons, respectively. The input parameters used during the training process are the observation parameters from the given time step and the control parameters from the previous time step. The critic’s output determines the control signals vector $u$. There are two parallel control loops (along the $X$ and $Y$ axes).
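For reference, a minimal PyTorch sketch of one critic update minimizing the Bellman error of Equation (14) is given below; the observation size, the fully connected layer structure, and the target network handling are assumptions, with the layer widths following Figure 8.

```python
import copy
import torch
import torch.nn as nn

# Critic with three hidden layers of 14, 18, and 18 neurons (Figure 8);
# the observation size (6) and the 3 discrete actions are assumptions.
critic = nn.Sequential(nn.Linear(6, 14), nn.ReLU(),
                       nn.Linear(14, 18), nn.ReLU(),
                       nn.Linear(18, 18), nn.ReLU(),
                       nn.Linear(18, 3))
target = copy.deepcopy(critic)  # target network, smoothed toward the critic in practice
optimizer = torch.optim.Adam(critic.parameters(), lr=1e-3)

def dqn_update(s, a, r, s_next, done, gamma=0.95):
    """One gradient step on a minibatch (s, a, r, s_next, done) minimizing
    the squared Bellman error of Equation (14)."""
    with torch.no_grad():
        q_next = target(s_next).max(dim=1).values
        y = r + gamma * (1.0 - done) * q_next            # Bellman target
    q = critic(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s_t, a_t)
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```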
For the control system, a condition was set to interrupt the calculation upon an emergency event, namely a collision of the rotor with the bearing, specified as follows:

$$e_{max}\ge 1,\tag{15}$$

i.e., the shaft eccentricity reaches the value at which the fluid film thickness becomes zero.
Several different reward functions were tested during the study to meet the aim described by Equation (13) and to overcome some shortcomings of the trained controllers. The obtained results and their discussion are presented in Section 3.
3. Results and Discussion
Previous studies [38,39] have shown that the discrete DQN algorithm used for FFB control tasks performs well with a discrete reward function. Therefore, the following reward function was developed based on and taking into account Equations (13)–(15):

$$r_{t}=\begin{cases}+1,&\left(N_{fr}+N_{p}\right)\le P_{lim}\ \text{and}\ e_{max}\le e_{lim},\\ -1,&\text{otherwise},\end{cases}\tag{16}$$

where $P_{lim}$ is the maximum permissible value of the total power losses ($N_{fr}+N_{p}$) in the current state of the system, and $e_{lim}$ is the maximum value of the shaft eccentricity assessed as safe, i.e., ensuring a sufficient fluid film thickness.
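A sketch of this reward in code form could be as follows; the ±1 reward levels are an assumed convention for the discrete scheme described above.

```python
def reward_v1(N_fr, N_p, e_max, P_lim=2.5, e_lim=0.7):
    """Discrete reward corresponding to Equation (16); the +/-1 levels are
    an assumed convention for the discrete scheme."""
    ok = (N_fr + N_p) <= P_lim and e_max <= e_lim
    return 1.0 if ok else -1.0
```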
The following values were set for the hyperparameters of the DQN learning agent: the LearnRate was 0.001, the Optimizer was Adam, the TargetSmoothFactor was 1 × 10⁻⁴, the DiscountFactor was 0.95, the MiniBatchSize was 250, the control frequency was 10 signals per second, and the control signal increment was set to 0.1. The following values were set for the parameters in Equation (16): $P_{lim}$ = 2.5 W and $e_{lim}$ = 0.7.
The DQN agent was trained using a perfectly balanced rotor model without external forces applied, to ensure an unambiguous assessment of the power parameters for a certain system state. Since real rotors have a non-zero imbalance, the current coordinates of the shaft center were transmitted to the prepared DQN controller during its testing in a form averaged over several periods of oscillation. These coordinates approximately characterized the position of the center of the shaft’s orbit in the bearing. On the contrary, the direct coordinate values without preliminary averaging were used to check the fulfillment of the eccentricity condition, assessing the most distant shaft position from the bearing center.
The DQN agent training process is presented in Figure A1 in Appendix A. The results of testing the trained controller on the simulation model are presented in Figure 9.
Testing of the trained system shows that the controller moves the shaft from the initial equilibrium point of the passive system to a new position. The total power consumption for overcoming viscous friction and pumping the lubricant decreased from 3.35 W for the initial system state to, on average, 2.25 W for the adjusted one. However, oscillations of the control signal were observed in the system, which also led to oscillations of the shaft around its steady position, causing fluctuations in the power parameters. The reason for the occurrence of the oscillations may presumably be a combination of the reward function settings in terms of power losses ($P_{lim}$) and the chosen step of changing the control signals, ∆u = 0.1. The response of the system with such a configuration, in terms of the shaft position, to a minimal change in the control signal ∆u turned out to be too significant. The DQN controller moves the shaft to a more optimal position to fulfill the condition in Equation (16) regarding minimizing the power losses. However, excessive displacement and subsequent correction occur in this case; the system then returns to the previous state and transitions into a self-oscillating mode.
Since self-oscillations are an undesirable operating mode of the system, measures were taken to eliminate the possibility of their occurrence. Changing the reward function from discrete to continuous did not provide the desired results; oscillations in the control action persisted. Changes to the training hyperparameters also failed to eliminate the oscillations. Incorporating the control actions from the previous step into the reward function turned out to be the most effective solution. The modified reward function took the following form:
$$r_{t}=k_{r}\,r_{p}-k_{u}\left(\left|\Delta u_{X}\right|+\left|\Delta u_{Y}\right|\right),\tag{17}$$

where $r_{p}$ is the discrete reward according to Equation (16); $k_{r}$ is the reward coefficient; and $k_{u}$ is the coefficient of influence of the control action on the reward, which is necessary to balance the reward value taking into account the levels of the control signals.
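A corresponding sketch of the modified reward is given below; treating the control-action term as a penalty on the per-step signal increments is an assumption based on the description above.

```python
def reward_v2(N_fr, N_p, e_max, du_x, du_y, P_lim=2.0, e_lim=0.7, kr=2.0, ku=2.0):
    """Modified reward in the spirit of Equation (17): the base discrete reward
    is scaled by kr, and the control increments of the previous step are
    penalized via ku, discouraging the self-oscillating mode. The exact
    combination of terms is an assumption."""
    base = 1.0 if ((N_fr + N_p) <= P_lim and e_max <= e_lim) else -1.0
    return kr * base - ku * (abs(du_x) + abs(du_y))
```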
The control signal increment was reduced to 0.05 to increase the precision of the control process. The control frequency was also increased to 20 signals per second. The other hyperparameters of the DQN agent were set as follows: the LearnRate was 0.001, the Optimizer was Adam, the TargetSmoothFactor was 1 × 10⁻⁴, the DiscountFactor was 0.9, and the MiniBatchSize was 250.
The value of the permissible power limit $P_{lim}$ was reduced to 2 W to test the stability of control at a smaller permissible deviation from the minimum. The value of $e_{lim}$ was maintained at 0.7. The coefficients in Equation (17) had the following values: $k_{r}$ was set equal to 2 to ensure an increase in the reward level when the penalty was charged through only one of the control loops; $k_{u}$ was set to 2 to increase the weight of the control action components while reducing the step of changing the control signals to 0.05.
The DQN agent training process with the modified parameters is presented in Figure A2 in Appendix A. The results of testing the trained controller on the simulation model are presented in Figure 10.
The test results show that taking into account the value of the control signal in the reward function eliminated oscillations and provided stable control signal generation. The shaft was also moved to a new stable position, and the total reduction in power losses turned out to be more significant than in the previous case, reaching 1.64 W, i.e., an almost twofold decrease (by 49%).
As can be seen from Figure 10a, the main contribution to this effect was made by the reduction in the power losses for lubricant pumping. In this case, a comparison of the steady-state shaft position in Figure 10d with the diagram data in Figure 6 shows that this position is at some distance from the minima of each of the parameters. The optimal ratio found in this way can be rebalanced by changing the weight coefficients for the parameters $N_{fr}$ and $N_{p}$ in Equation (17), as follows from Equation (13).
Further testing of the trained DQN controller involved checking its ability to maintain a given value of the minimum fluid film thickness in the presence of rotor imbalance. No additional external lateral forces besides the rotor’s own weight were applied. The results of testing using the simulation model are presented in Figure 11. The imbalance values [1 × 10⁻⁴, 2 × 10⁻⁴, 2.5 × 10⁻⁴, 3.2 × 10⁻⁴] kg·m were selected so that the first three of them did not cause the eccentricity to exceed the specified limit $e_{lim}$ = 0.7, while the last one went beyond this limit and elicited a response from the DQN controller.
The graphs in Figure 11 show that, as the imbalance increased, the rotor orbit remained stable in all cases, and the conditional center of the orbits was located near the point set by the controller with minimized power losses. Exceeding the established limit of the shaft eccentricity elicited a response from the controller. As a result, the amplitude of the oscillations decreased, and a new stable orbit was established close to the permissible limits.
Analyzing the process of developing the controller and the results obtained allowed us to draw some conclusions about the use of DQN as a tool for synthesizing optimal control for the ALHB.
First of all, the resulting controllers ensured system stability in all cases, although controller-induced oscillations were observed in the first version (Figure 9). These oscillations, although unacceptable, had a relatively small magnitude and could be compensated by modifying the reward function, as well as by reducing the step of the control signal ∆u. Testing the controller with the unbalanced rotor demonstrated the preservation of system stability, including during transient processes, as shown in Figure 11b. This is due to the stability of the considered configuration of the rotor-bearing system with passive bearings. The use of the synthesized DQN controller ensured the fulfillment of the specified goals and at the same time did not have a destabilizing effect; that is, the operation of the control loops turned out to be quite consistent. The ability of the presented controllers to stabilize rotor-bearing systems that are initially in an unstable state is not obvious. It can be assumed that this also depends on the correct choice of reward functions and may be the subject of further research.
Another logical consequence is the conclusion that the reward function can be easily modified and expanded with almost no restrictions. In this study, the solution to the problem of synthesizing the multi-objective controller for such a complex object as the ALHB showed the absence of fundamental differences from single-objective systems, like those previously implemented in [38,39]. The development process also showed that correction of individual undesirable effects could be achieved by modifying the reward function and the method’s hyperparameters, although this may require quite in-depth analysis and knowledge of the operation of the system. At the same time, the final control policy itself is determined automatically during the training process, without the direct participation of the developer, and based only on the rules specified by reward functions. This has the potential to reduce development time for complex controllers and may be a way to solve problems that are difficult to solve in traditional ways. It should also be taken into account that the dependencies inside the synthesized controller, as for any ANN, are implicit and suffer from the impossibility of a strict physical interpretation. Therefore, the selection of relevant methods for testing the trained system will also play a significant role in the development practice.
Another advantage of the considered approach is the fact that the controller is synthesized using nonlinear models of the bearings. In this way, the resulting control policies take into account the simulated nonlinear effects, reducing the risk of unexpected responses in real systems. Although DQN control does not directly relate to model-predictive techniques, the agent training occurs during interaction with nonlinear models. This feature distinguishes it for the better from, for example, the ALHB model-predictive controller presented in [30], where preliminary linearization of the system was required, with the inevitable loss of some information about its behavior.
Regarding the case of the rotor system on ALHBs considered in this study, the problem posed can be considered solved. The features of active lubrication noted in previous studies [22,29] made it possible to optimize the lubricant supply and achieve a reduction in the power losses associated with friction and lubricant pumping, with all the corresponding advantages. It should be noted that some of the objectives considered in this problem, namely the shaft position and the power parameters, did not conflict with each other in the Pareto sense. At the same time, the parameters $N_{fr}$ and $N_{p}$, on the contrary, partially conflict with each other, though this did not become an obstacle to the controller finding a single stable solution. Thus, successfully processing even conflicting objectives seems to be a matter of correctly formulating the control problem.
Despite the listed advantages of the considered approach, a number of existing and possible problems can also be noted.
First of all, it should be noted that a significant amount of computation is required for systems such as rotor-bearing systems with FFBs. Despite the measures taken to speed up the calculations, such as the use of the ALHB surrogate models [41], training for each option could take 2–6 h, depending on the current algorithm’s convergence rate. Thus, there is insufficient flexibility in the learning process. If there are implicit inconsistencies in the problem statement and unacceptable results are obtained, testing each of the reward functions or combinations of hyperparameters can take a significant amount of time, especially for complex systems. This circumstance increases the requirements for the initially correct formulation of the learning rules represented by the reward functions.
Another potential problem is the tendency of ANN-based models toward overfitting. It is necessary to strictly analyze the training scenarios and correlate them with the accepted reward function in order to avoid excessive determinism in the behavior of the trained controller. An overfitted controller risks “memorizing” a single scenario and merely reproducing it. Additionally, unexpected variations in the environment of a real system can lead to unpredictable controller responses in such cases. Thus, the requirement noted above regarding the selection of relevant methods for testing a system’s robustness also becomes applicable here.
At the same time, even in the presence of the listed problems and risks, reinforcement learning methods seem to be quite a powerful tool for synthesizing controllers, including for complex and nonlinear systems such as FFBs. If the requirements for a system allow the use of controllers without an explicit physical interpretation of control laws, then controllers can be synthesized for it in this way, including multi-objective ones and those using several independent control loops.
The above issues of robustness, optimization of hyperparameters, and formulation of reward functions and constraints remain subjects in need of further research in the field of application of reinforcement learning for the synthesis of controllers for systems such as active FFBs. The results of such studies can provide answers to corresponding questions and contribute to the integration of the considered approaches and solutions into engineering practice in the field of rotary machines.