1. Introduction
Digitalization of the economy in general [1] and of industry in particular [2] is a top national priority of the Russian Federation. Digitalization of technological processes is associated with their advancement [3]. Currently, oil refining processes are developed by improving both the technology [4,5] and the control systems and control principles of these processes [6]. Here, technological development covers everything related to technology: advances in apparatus design and replacement of equipment, reagents, etc. Improvement of control systems and principles means the creation of new control algorithms and of automated control systems that are fundamentally new in structure and functionality. The development of primary oil refining processes is mainly due to the introduction of so-called advanced process control (APC) systems, which have already been proven to bring substantial profits to oil refineries [3].
However, the development of secondary oil refining processes is driven directly by improvements in technology [7,8]. Examples include the use of moving-bed catalytic reactors instead of fixed-bed reactors and the development of new types of catalysts that increase conversion and process efficiency in chemical reactors [5]. Meanwhile, improving the control systems and control principles of secondary oil refining processes is not considered a priority. There are several reasons for this: (1) The significant profit from technological advances overshadows the profit from control system advances. (2) Because the techniques are new, little experience has been accumulated in automating these processes, so decisions on control system improvements may be considered hasty and insufficiently substantiated. (3) The processes have low flexibility, and most of their stages can be perceived as a black box whose contents cannot be changed. (This follows from the nature of reactor processes. As a rule, control actions are applied at the reactor inlet and their effect appears only at the outlet of the apparatus. Any change therefore comes at the cost of a temporary loss of quality. Changes are made intuitively, because no control action can be applied while the substance is inside the apparatus, although many factors influence it: coke formation, reduction of catalyst activity, etc. From the control point of view, the apparatus is thus a black box, since the state of the substances inside the unit cannot be monitored.) (4) The chemical processes are complex and difficult to characterize. (5) The equipment needed to study these processes is expensive, etc. Nevertheless, even with these issues taken into account, the use of APC algorithms together with technological developments can be expected to increase the efficiency of secondary oil refining processes and to bring additional profit to oil refineries [9,10]. Although advanced control systems are based on mathematical models, it is difficult to obtain accurate mathematical models describing a process in the petroleum or a related field [11]. This applies to both kinetic and empirical models. For kinetic models, it is difficult to obtain a complete list of the reactions in the process. For empirical models, information about the process is insufficient, which complicates the accumulation of the data needed to build them. In this regard, work aimed at improving the information component of the system is relevant.
Data on the hydrocarbon components contained in naphtha are used to monitor the catalytic reforming process, assess product quality, and control composition. The extended hydrocarbon composition can be obtained by chromatography. If chromatography is used to identify compounds, the retention time should be independent of the sample amount and the chromatographic peaks should be symmetrical to ensure correct identification. The extended hydrocarbon composition is also used as input for mathematical modeling of the process. It should be kept in mind that chromatographic data cannot be obtained in real time: they are usually produced in the laboratory, with human participation, over a period of at least two hours. Soft-sensing technology is used in various industries and technological facilities. The applications and the algorithmic and mathematical bases of these sensors are very diverse; they rely mainly on neural networks, regression methods, and composition prediction. Tian et al. (2021) [12] present a soft sensor applied in the monitoring system of a typical 330 MW CHP plant. Their approach uses the turbine’s Flugel formula as a static model, the turbine’s heat balance characteristic to correct the model coefficient, and the butterfly valve characteristic for dynamic compensation. Niño-Adan and colleagues (2021) [13] discuss a soft sensor for class prediction of the percentage of pentanes in butane at a debutanizer column. It includes an autoML approach that selects among different normalization and feature-weighting preprocessing techniques and various well-known machine learning (ML) algorithms. Winkler et al. (2021) [14] present a soft sensor for real-time process monitoring of multidimensional fractionation in tubular centrifuges. Reference [15] describes a soft sensor for an industrial distillation column; its authors, Hsiao et al. (2021), propose a soft sensor development methodology that combines first-principle simulations with transfer learning.
One of the elements of advanced control systems is the virtual sensor [16]. Virtual sensors calculate parameter values using statistical dependencies (e.g., a polynomial), a neural network, or other mathematical tools that capture the correlation between variables [17,18]. This method involves accumulating a large volume of data and processing it using various approaches [19], including those mentioned earlier. For a catalytic reformer, various variables can act as determining parameters for the virtual sensor. However, in some cases, creating and implementing a virtual sensor for a process variable is very difficult or even impossible, because a large historical data sample for that variable does not exist or the data are difficult to synchronize. In particular, creating a virtual soft sensor of the feedstock composition is a challenging task. The reason is the mismatch between the company’s capability to measure the individual hydrocarbon composition in a number of industrial processes and the data requirements of the virtual sensor. At the same time, real-time data on the individual hydrocarbon composition of the feedstock are an effective tool for optimizing the technological processes that take place in a catalytic reforming unit. The need for such optimization is driven by strict environmental protection requirements [20] and by current trends in the development of the global energy sector [21,22].
It is important to reduce the uncertainty arising from infrequent composition control in processes such as catalytic reforming, where the individual and group composition of the feedstock determines the target performance of the unit and the catalyst lifespan. Such uncertainty in the feedstock composition can complicate the application of mathematical models in the loop of an advanced control system or as an advisor to the operator [23]; in the absence of an advanced control system, it can result in product target indicators fluctuating beyond the specification limits. The naphtha catalytic reforming process has been studied for a long time [24]. During this period, a large number of complex, highly precise mathematical models of the catalytic reforming process, simulating different naphthas at various levels of detail, have been developed [25]. The following lines of research can be highlighted: the effect of changes in feedstock composition at the naphtha catalytic reforming unit [26]; the parameters of the coke combustion process, with the results compared against industrial data [27]; a comprehensive sensitivity analysis of product quality and quantity [28] that does not take into account the impact of changes in the feedstock composition; the influence of the design parameters of a catalytic reforming reactor, the molar flow rate on the hydrodealkylation side, the molar ratio of hydrogen to hydrocarbons, and catalyst deactivation on system performance [29]; and the modes of incoming and outgoing flows in thermally coupled reactors [30].
To introduce the developed mathematical models at existing production facilities, the unit must have a technological level that meets the model’s requirement for the size of the input matrix. The model input matrix can be obtained from the results of analytical control of the individual hydrocarbon composition of the raw materials, but inline control is not available at every refinery. This raises the question of how to provide the mathematical model with up-to-date input information about changes in the composition of the process stream under operating production conditions, and whether the feedstock composition of a catalytic reforming unit can be controlled more frequently at an operating production facility.
A review by Ren and colleagues (2019) [31] of methods for converting individual composition into fractional composition and vice versa identified several approaches. Most of them are built on a multidimensional basis that controls several parameters besides composition, which implies a preparatory stage of model development. Incomplete data and the need to check their correctness lead to the use of data processing and recovery methods. Researchers consider the dependences of mixture properties on compound identification parameters [32,33,34], individual constants, and compound characteristics [35], which form an important and necessary basis for this study.
This paper discusses a method for obtaining a matrix of the carbon number and group composition of the feedstock of a catalytic reforming unit under industrial conditions. The group composition of petroleum fractions in oil refining processes is the most important factor influencing the yield and composition of products, as well as the efficiency of the catalysts. The fuel’s ASTM D86 distillation temperature distribution is divided into equal-volume pseudo-component cuts, each of which is assigned a property volume blending index; aggregating these indices provides an accurate estimate of the global property of the whole petroleum fuel, or of portions thereof. The list of these pseudo-components is the group composition of petroleum fractions [36]. It is assumed that a matrix of the carbon number and group composition of hydrotreated catalytic reforming naphtha close to the experimental one can be found by expressing [37] the desired composition through close fractions of known individual hydrocarbon compositions. The proximity of fractions is evaluated by their associated boiling points. This is possible because the individual components with higher molecular weight that make up the fractions have higher boiling points than the lighter ones.
The retention index is a widely used and recognized type of data for identifying chemical compounds by gas chromatography. Yan et al. (2015) [38] report that a database of retention indices of over 300 aroma compounds, determined on three capillary columns of different polarity, can be used for qualitative identification. Morosini and Ballschmiter (1994) [39] determined the retention indices of 28 polychlorinated biphenyls in capillary gas chromatography with 2,4,6-trichlorophenyl alkyl ethers (TCPE) as RI standards, using the ECD, a 95% dimethyl 5% phenyl polysiloxane phase, and six different temperature programs. In addition, a number of studies have generated systems of retention indices in different ways [40,41,42].
2. Materials and Methods
The development of a model for a virtual soft sensor of the feedstock composition can be divided into two stages: preparatory and computational. The preparatory stage includes the analysis and processing of the obtained data, the choice of a method for obtaining fractions from the individual composition, and the formation of a database of individual components and the associated boiling points of fractions. The preparatory stage is described under the assumption that no information is available on the chromatographic system or the fractional composition control system, so it relies only on the available measurement data. A chromatographic system is defined as the set of hardware and methods that allow chromatography to be performed. The need for these operations at each stage is discussed below.
According to the technological regulations of the enterprise, the individual and group composition is controlled according to the IFP 9301 standard, which recommends the use of gas chromatography with a 100 m long fused-silica capillary column with an inner diameter of 0.25 mm. According to the standard, the capillary column is coated with methylsilicone elastomer or dimethylsiloxane, 0.5 μm thick, and has to be equivalent to at least 6000 theoretical plates/m; a linear retention index (n-alkane) is used to identify the components. The fractional composition is controlled according to the ASTM D86 method.
2.1. Preparatory Stage
Check the presence and repeatability of the distribution law in the IFPi homologous series. If the data obey the distribution law, then composition models based on these laws can be used. Determine the retention time of non-absorbent substance and the possible parameters of the chromatographic system for the identification of compounds [37]. However, reference sources on retention indices provide single values for individual substances without confidence limits for their measurement, which leads to uncertainty in identification [43]. If the report on the control of individual and group composition of raw materials records the given time, then calculate the matrix of minimum ΔRI from all reports for each homologous group by carbon number using Equation (1):

$$\Delta RI = RI_i - RI_{i-1} \qquad (1)$$

where ΔRI is the difference in the retention indices of adjacent compounds in the report, $RI_i$ is the retention index of the i-th compound, and $RI_{i-1}$ is the retention index of the compound preceding the i-th. The chromatographic system identifies a component by its retention index; it is therefore important that, for each compound, the maximum deviation of the retention index from its mean across the reports does not exceed the ΔRI value for the corresponding homologous group in the matrix of minimum ΔRI. If the deviation exceeds the corresponding ΔRI, the data are incorrect and the compound cannot be correctly identified. Moreover, the matrix of minimum ΔRI and the average retention indices can serve as an indicator of chromatographic system performance by automatically checking the deviations in each new composition measurement, since visual assessment of the chromatogram is prone to human error.
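The ΔRI check described above can be illustrated with a minimal sketch. The report layout (tuples of name, group, carbon number, retention index in elution order), the function names, and the numeric values are assumptions made for illustration only. The sketch also assumes that each ΔRI is assigned to the matrix cell of the i-th compound, which the text does not state explicitly.

```python
from collections import defaultdict

def min_delta_ri(reports):
    """Minimum of RI_i - RI_(i-1) per (group, carbon number) cell over all reports."""
    minima = {}
    for report in reports:
        for prev, cur in zip(report, report[1:]):
            cell = (cur[1], cur[2])      # cell of compound i (assumed assignment)
            delta = cur[3] - prev[3]     # Equation (1)
            minima[cell] = min(delta, minima.get(cell, delta))
    return minima

def unreliable_compounds(reports, minima):
    """Names whose maximum deviation from the mean RI exceeds the cell's minimum ΔRI."""
    values = defaultdict(list)
    cells = {}
    for report in reports:
        for name, group, carbon, ri in report:
            values[name].append(ri)
            cells[name] = (group, carbon)
    flagged = []
    for name, ris in values.items():
        mean = sum(ris) / len(ris)
        deviation = max(abs(ri - mean) for ri in ris)
        limit = minima.get(cells[name])  # None if the cell has no ΔRI yet
        if limit is not None and deviation > limit:
            flagged.append(name)
    return flagged

# Hypothetical reports with placeholder compound names and illustrative values.
reports = [
    [("x1", "P", 6, 600.0), ("x2", "A", 6, 650.0), ("x3", "N", 6, 660.0)],
    [("x1", "P", 6, 600.0), ("x2", "A", 6, 665.0), ("x3", "N", 6, 690.0)],
]
minima = min_delta_ri(reports)
print(unreliable_compounds(reports, minima))  # ['x3']
```

In this example the retention index of `x3` deviates from its mean by 15 units, more than the minimum ΔRI of 10 for its cell, so its identification would be treated as unreliable.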
For identified compounds with unknown boiling points, the experimental values are taken from reference sources [35]. Construct a function relating the normal boiling point of a compound to its retention index within one homologous series [33,44]. For unidentified compounds, determine the boiling points from the constructed relation.
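As a rough illustration of this step, the sketch below fits a straight line to boiling point versus retention index for one series and uses it to estimate a missing boiling point. The linear form is an assumption, since the text does not specify the functional form. The calibration points are the C6 to C9 n-alkanes, whose retention indices are 100 times the carbon number by definition. They only demonstrate the fitting, and in practice each homologous series would get its own fit from its identified members. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# n-hexane .. n-nonane: retention index and normal boiling point, deg C
series_ri = [600.0, 700.0, 800.0, 900.0]
series_bp = [68.7, 98.4, 125.7, 150.8]

slope, intercept = fit_line(series_ri, series_bp)

def boiling_point_from_ri(ri):
    """Estimated boiling point for a compound of this series (assumed linear law)."""
    return slope * ri + intercept

estimate = boiling_point_from_ri(750.0)  # lies between the C7 and C8 values
```

A straight line is only a first approximation over a narrow carbon range; a higher-order or logarithmic relation may fit a wider range better.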
Determine actual ASTM boiling point intervals (min and max) for a given period of unit operation. In this case, the period of operation of the unit should be representative (historical data should cover the entire range of variation in the feedstock composition). This will allow for assessment of the range of change in the fractional feedstock composition.
Construct theoretical curves [45] corresponding to the simulated distillation curves of the mixture. Enter the obtained simulated distillation curves into the Hysys/Pro II simulation program, specifying the composition of the mixture and the start of its boiling. Calculate the D86 boiling curve and enter the obtained values into the database as the fractional composition associated with the individual and group composition.
Theoretical curves are derived from the characteristic boiling points of the mixture, which are calculated from the individual hydrocarbon composition of the feedstock. The characteristic boiling points of a mixture are values close to the boiling points of the mixture at the corresponding cumulative fractions. Each of them uniquely characterizes the entire fraction of the mixture taken in the interval between the corresponding cumulative fractions, by weighting the boiling point of each compound in the fraction by the share this compound occupies in that fraction of the given hydrocarbon composition. Cumulative fractions are calculated in accordance with the principle of additivity of the fractions of mixture components. The fraction taken from the individual hydrocarbon composition is considered separated from the rest of the mixture and equated to 100%; the fractions of the individual components in it are recalculated and used as weight coefficients when summing the boiling temperatures of the compounds in the fraction. Thus, we obtain a single temperature that characterizes the fraction through the temperatures of its constituent compounds and is close to the experimental boiling point of the mixture at the corresponding cumulative fraction. The beginning of boiling of the mixture is determined on the basis of the algorithm for finding the experimental boiling points of the mixture. The obtained characteristic boiling points of a mixture of individual hydrocarbon composition are taken as a simulated distillation (SD) curve and converted to an ASTM fractional boiling curve using procedure 3A.3.2 of API-TDB 1997 [46]. We then check whether the obtained ASTM boiling curve falls within the available actual ASTM boiling point ranges.
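The weighted-average calculation of characteristic boiling points can be sketched as follows. The input layout, the function name, and the numbers are illustrative assumptions. The sketch also assumes that a component straddling a cut boundary is split between the two neighboring fractions, which the text does not specify. The conversion to an ASTM curve by the API-TDB procedure is not shown.

```python
def characteristic_boiling_points(components, cuts):
    """Weighted mean boiling point of each cumulative-fraction interval.

    components: (boiling_point, fraction) pairs ordered by boiling point,
                with fractions summing to 100.
    cuts:       increasing cumulative boundaries, e.g. [0, 50, 100].
    """
    result = []
    for lo, hi in zip(cuts, cuts[1:]):
        cumulative = 0.0
        weighted_sum = 0.0
        total = 0.0
        for boiling_point, fraction in components:
            start, end = cumulative, cumulative + fraction
            overlap = min(end, hi) - max(start, lo)  # share inside [lo, hi]
            if overlap > 0:
                weighted_sum += boiling_point * overlap
                total += overlap
            cumulative = end
        # Dividing by the total renormalizes the taken fraction to 100 %.
        result.append(weighted_sum / total if total else None)
    return result

# Hypothetical three-component mixture (boiling point in deg C, fraction in %).
mixture = [(60.0, 25.0), (80.0, 25.0), (100.0, 50.0)]
halves = characteristic_boiling_points(mixture, [0, 50, 100])
thirds = characteristic_boiling_points(mixture, [0, 25, 75, 100])
```

For the two-interval split, the first fraction contains equal shares of the 60 and 80 degree components, so its characteristic temperature is their mean.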
The prepared IFPi and the corresponding boiling points of the fractional composition are recorded in a non-relational database as key-value pairs. The key is the date of chromatography, which links the data of the two compositions, and the value is the report of the individual hydrocarbon composition together with the corresponding boiling curve.
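One possible in-memory shape for such a key-value record is sketched below. The class and field names, the ISO date string used as the key, and all values are hypothetical. The paper does not name a specific database product or schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class CompositionRecord:
    ifp_report: dict   # compound name -> fraction in the mixture, %
    d86_curve: tuple   # associated boiling points, deg C

def add_record(database, analysis_date, ifp_report, d86_curve):
    """Store one report under the chromatography date (the key)."""
    key = analysis_date.isoformat()
    database[key] = CompositionRecord(dict(ifp_report), tuple(d86_curve))
    return key

database = {}
key = add_record(
    database,
    date(2020, 7, 15),                              # hypothetical date
    {"compound_a": 40.0, "compound_b": 60.0},       # placeholder report
    (95.0, 104.0, 112.0, 120.0, 131.0, 150.0, 171.0),  # illustrative 7-point curve
)
record = database[key]
```

Keying by date keeps the two measurements of the same sample together, so a lookup by boiling curve immediately yields the matching composition report.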
2.2. Computational Stage
Compare each point of the measured D86 boiling curve with the point at the same volume fraction of each boiling curve in the prepared database. For the comparison, we use the absolute value of the difference between the measured boiling point and the associated boiling point from the prepared database. A lookup table whose keys are temperature deltas and whose values are chromatography dates, with a length equal to the number of keys in the prepared database, is created in the main memory of the computer.
In the temperature delta lookup table, search for the minimum temperature delta for each boiling point of the hydrocarbon mixture. The result is a list consisting of an ordered sequence of dates and the corresponding boundary cumulative fractions of the hydrocarbon mixture.
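The search in these two steps might look like the following sketch. The function name, the three-point curves, and the dates are illustrative assumptions. Unlike the table described in the text, the sketch keys the temporary table by date rather than by temperature delta, so that two reports with equal deltas do not overwrite each other.

```python
def nearest_reports(measured_curve, volume_fractions, database):
    """For each D86 point, pick the date whose stored curve is closest at that point.

    database maps a chromatography date to its associated boiling curve.
    Returns (date, boundary cumulative fraction) pairs in curve order.
    """
    selection = []
    for j, (t_measured, boundary) in enumerate(zip(measured_curve, volume_fractions)):
        # Temporary table: date -> |measured - associated| at point j
        deltas = {day: abs(t_measured - curve[j]) for day, curve in database.items()}
        best_day = min(deltas, key=deltas.get)  # first date wins on ties
        selection.append((best_day, boundary))
    return selection

# Hypothetical database with two reports and three curve points each.
curves = {
    "2019-03-01": (90.0, 100.0, 110.0),
    "2020-07-15": (95.0, 104.0, 120.0),
}
selection = nearest_reports((94.0, 101.0, 118.0), (10, 50, 90), curves)
```

Each point of the measured curve is matched independently, so different parts of the estimated composition can come from different historical reports.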
The IFPi fractions sequence is determined from the list of dates. To obtain a sequence of fractions, we use the algorithm for obtaining a fraction from IFPi by cumulative fractions of the mixture by referring by date to the IFPi in the prepared IFPi database and the boundary cumulative fraction of the hydrocarbon mixture. We obtain a list of sequences of individual mixture components expressed from the nearest IFPi fractions. The resulting sequence is recorded in the database of estimated compositions for the possibility of performing analysis and statistical assessment of changes in the composition over time.
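A minimal sketch of this assembly step is given below, assuming each report is an ordered list of (name, fraction) pairs summing to 100% and that the selection is a list of (date, boundary cumulative fraction) pairs. The function names and data are hypothetical, and splitting a component at a boundary is an assumption.

```python
def take_fraction(report, lo, hi):
    """Components of a report lying between cumulative fractions lo and hi (%)."""
    cumulative = 0.0
    taken = []
    for name, fraction in report:
        overlap = min(cumulative + fraction, hi) - max(cumulative, lo)
        if overlap > 0:
            taken.append((name, overlap))
        cumulative += fraction
    return taken

def assemble_composition(selection, reports):
    """Concatenate the slices taken from the nearest report for each interval."""
    composition = []
    lo = 0.0
    for day, hi in selection:
        composition.extend(take_fraction(reports[day], lo, hi))
        lo = hi
    return composition

# Hypothetical reports and selection.
reports_by_date = {
    "d1": [("a", 40.0), ("b", 60.0)],
    "d2": [("a", 50.0), ("c", 50.0)],
}
estimated = assemble_composition([("d1", 50.0), ("d2", 100.0)], reports_by_date)
```

Because consecutive intervals share their boundaries, the fractions of the assembled sequence still add up to 100%, which preserves the additivity of the mixture.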
Obtaining the MTHS matrix (MTHS—molecular type and homologous series). We find the scoring matrix of the carbon number and group composition of the mixture. The method used to assess the proximity of the sought individual composition and the experimentally obtained composition requires reducing the IFPi to a matrix form. This covers the cases of repeating the dates at step 2 and possible duplicates of the names of the boundary components of the IFPi fractions. In this case, the values of the fractions of the components, for which the individual composition was incremented, are not repeated for the duplicate names, and do not violate the additivity principle of the mixture.
Figure 1 shows the block diagram of the model for assessing the MTHS composition from the ASTMi boiling curve.
The measured ASTMi boiling curve of size 1 × 7 is fed to the model input. Based on the minimum temperature difference, the model determines the closest associated boiling point for each input ASTMi boiling point. According to the mixing rule, the MTHS matrix of the hydrocarbon mixture composition is calculated on the basis of the nearest boiling points of fractions found in the BPi virtual soft sensor database.
The presented virtual model of the soft sensor can be verified using four available reports of individual and group composition of the hydrocarbon mixture. These reports were created by monitoring the composition of the hydrotreated heavy gasoline fraction of a catalytic reforming unit (CCR) in different months of different years according to IFP 9301.
Let us conduct an experiment with the model, taking one of the four IFPi as unknown and feeding its associated ASTMi boiling curve to the model input. As a result, we obtain the estimated MTHS matrix of the unit feedstock composition taken as unknown. The estimated matrix is compared with the experimental matrix by reducing both to the PIONA (paraffins, iso-paraffins, olefins, naphthenes, aromatics) vector, obtained by summing the respective fractions of compounds belonging to each of the five compound groups.
IFPi are represented by adsorption sequences of various lengths without repeated names, each consisting of a list of individual components with different fractions in the mixture and different boiling points. The varying lengths of the reports and the differing positions of the same compound make it difficult to assess the proximity of compositions in this form. However, the report on the considered raw materials can be reduced to an 11 × 5 matrix. The columns are the homologous series, while the rows are the carbon numbers of a compound or of several compounds of the same group. This approach allows us to assess quantitatively the proximity of compositions by the components of the PIONA vector.
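The reduction to a matrix and then to the PIONA vector can be sketched as below. The carbon number range used for the 11 rows, the single-letter group codes, and the use of absolute differences as the proximity measure are assumptions for illustration. The text fixes only the 11 × 5 shape and the five groups.

```python
GROUPS = ("P", "I", "O", "N", "A")  # paraffins, iso-paraffins, olefins, naphthenes, aromatics

def mths_matrix(composition, carbon_numbers):
    """Sum component fractions into a (carbon number) x (group) matrix."""
    row_of = {carbon: row for row, carbon in enumerate(carbon_numbers)}
    matrix = [[0.0] * len(GROUPS) for _ in carbon_numbers]
    for carbon, group, fraction in composition:
        matrix[row_of[carbon]][GROUPS.index(group)] += fraction
    return matrix

def piona_vector(matrix):
    """Column sums of the matrix: total fraction of each compound group."""
    return [sum(row[k] for row in matrix) for k in range(len(GROUPS))]

def piona_difference(estimated, experimental):
    """Absolute difference per group (assumed proximity measure)."""
    return [abs(a - b) for a, b in zip(piona_vector(estimated), piona_vector(experimental))]

carbons = list(range(4, 15))  # hypothetical choice of 11 carbon numbers
estimated = mths_matrix(
    [(6, "P", 10.0), (6, "N", 20.0), (7, "P", 30.0), (7, "A", 40.0)], carbons)
experimental = mths_matrix(
    [(6, "P", 15.0), (6, "N", 20.0), (7, "P", 30.0), (7, "A", 35.0)], carbons)
difference = piona_difference(estimated, experimental)
```

Repeated entries for the same carbon number and group are simply added to the same cell, so reports of different lengths become directly comparable.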
The accuracy of the data taken is determined by the accuracy of the DCS (distributed control system) and LIMS (laboratory information management system) systems operating on the unit, as well as by the accuracy of the sensor equipment used.
In addition, when describing the experiment, it is worth noting that the enterprise has internal standards that describe the required accuracy of the system operation and the laboratory tests carried out, which indirectly indicates the sufficient reliability of the data obtained in this manner.