Article

Predicting Petroleum SARA Composition from Density, Sulfur Content, Flash Point, and Simulated Distillation Data Using Regression and Artificial Neural Network Techniques

1 LUKOIL Neftohim Burgas, 8104 Burgas, Bulgaria
2 Institute of Biophysics and Biomedical Engineering, Bulgarian Academy of Sciences, Georgi Bonchev 105, 1113 Sofia, Bulgaria
3 Laboratory of Intelligent Systems, University Prof. Dr. Assen Zlatarov, Professor Yakimov 1, 8010 Burgas, Bulgaria
4 Department of Mathematics, University of Chemical Technology and Metallurgy, Kliment Ohridski 8, 1756 Sofia, Bulgaria
5 Department of Health and Pharmaceutical Care, University Prof. Dr. Assen Zlatarov, Professor Yakimov 1, 8010 Burgas, Bulgaria
6 Department Industrial Technologies and Management, University Prof. Dr. Assen Zlatarov, Professor Yakimov 1, 8010 Burgas, Bulgaria
7 Black Oil Solutions, 2401 Alphen aan den Rijn, The Netherlands
* Author to whom correspondence should be addressed.
Processes 2024, 12(8), 1755; https://doi.org/10.3390/pr12081755
Submission received: 2 July 2024 / Revised: 13 August 2024 / Accepted: 19 August 2024 / Published: 20 August 2024
(This article belongs to the Special Issue Technological Processes for Chemical and Related Industries)

Abstract

The saturate, aromatic, resin, and asphaltene content in petroleum (SARA composition) provides valuable information about the chemical nature of oils, oil compatibility, colloidal stability, fouling potential, and other important aspects in petroleum chemistry and processing. For that reason, SARA composition data are important for petroleum engineering research and practice. Unfortunately, the results of SARA composition measurements reported by diverse laboratories are frequently very dissimilar and the development of a method to assign SARA composition from oil bulk properties is a question that deserves attention. Petroleum fluids with great variability of SARA composition were employed in this study to model their SARA fraction contents from their density, flash point, sulfur content, and simulated distillation characteristics. Three data mining techniques: intercriteria analysis, regression, and artificial neural networks (ANNs) were applied. It was found that the ANN models predicted with higher accuracy the contents of resins and asphaltenes, whereas the non-linear regression model predicted most accurately the saturate fraction content but with an accuracy that was lower than that reported in the literature regarding uncertainty of measurement. The aromatic content was poorly predicted by all investigated techniques, although the prediction of aromatic content was within the uncertainty of measurement. The performed study suggests that as well as the investigated properties, additional characteristics need to be explored to account for complex petroleum chemistry in order to improve the accuracy of SARA composition prognosis.

1. Introduction

Petroleum is an extraordinarily complex mixture consisting of alkane, cycloalkane, aromatic hydrocarbons with different molecular weights, and heteroatom organic compounds containing nitrogen, sulfur, oxygen, and metals [1,2]. The exact number of species making up petroleum is still unknown, although some authors suggest that there are millions [3,4,5], while others state that there are thousands [6,7]. Alkane hydrocarbons in petroleum were found to range from traces of C2H6, dissolved in the liquid petroleum, to C78H158 [1]. Cycloalkanes were reported to have between 1 and 11 cyclic rings with a maximum carbon number of 92 [8]. Aromatic hydrocarbons were reported to have between 1 and 18 condensed rings [9]. The content of the elements nitrogen, sulfur, oxygen, and metals in petroleum can vary as follows: 0.05% ≤ N ≤ 3.0%; 0.05% ≤ S ≤ 9%; 0.05% ≤ O ≤ 1.5%; and 0 ppm ≤ metals ≤ 1500 ppm [10].
All petroleum components can be segregated into four groups based on their solubility: saturates, aromatics, resins, and asphaltenes, called SARA fractions [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29]. The saturate fraction is composed of alkane and cycloalkane hydrocarbons and negligible amounts of nitrogen (0.05%), sulfur (0.1% ≤ S ≤ 0.9%), and oxygen (0.1%) [17]. The aromatic fraction contains arene hydrocarbons with between one and six fused aromatic rings with attached cycloalkane rings and alkyl chains of different lengths [30], and variable amounts of nitrogen (0.05% ≤ N ≤ 0.7%), sulfur (0.4% ≤ S ≤ 6.4%), and oxygen (0.1% ≤ O ≤ 2.9%) [17]. The resin fraction, also called polar aromatics, is composed predominantly of compounds containing between one and six condensed aromatic rings and nitrogen (0.7% ≤ N ≤ 1.2%), sulfur (0.7% ≤ S ≤ 6.1%), oxygen (1.9% ≤ O ≤ 2.7%) [17], and metals (0 ppm ≤ metals ≤ 980 ppm) [31]. The asphaltene fraction is composed mostly of aromatic ring structures with numbers of aromatic rings between 1 [32] and 18 [9], and the poly-condensed aromatic structures are pericondensed [12]. The asphaltene contents of nitrogen, sulfur, oxygen, and metals can vary as follows: 0.5% ≤ N ≤ 2.5%; 0.3% ≤ S ≤ 9%; 1% ≤ O ≤ 1.7%; and 0 ppm ≤ metals ≤ 1500 ppm [31].
Petroleum can also be fractionated into cuts based on differences in the volatility of its compounds. The amounts of the different cuts in petroleum can vary as follows: gas (C2–C5) (0% ≤ gas ≤ 8.2%); C5–fraction (0% ≤ C5/70 °C ≤ 20.7%); light naphtha (0% ≤ 70 °C/100 °C ≤ 18.9%); medium naphtha (0.2% ≤ 100 °C/150 °C ≤ 31.2%); heavy naphtha (0.8% ≤ 150 °C/190 °C ≤ 16.2%); kerosene (2.2% ≤ 190 °C/235 °C ≤ 16.6%); light diesel (3.4% ≤ 235 °C/280 °C ≤ 21.3%); heavy diesel (6.5% ≤ 280 °C/343 °C ≤ 26.3%); vacuum gas oil (4.8% ≤ 343 °C/565 °C ≤ 50.0%); and vacuum residue (0% ≤ 565 °C+ ≤ 61%). These data have been obtained from our research laboratory library, which includes hundreds of crude oil assays from all over the world belonging to the five main groups: extra light, light, medium, heavy, and extra heavy crude oils.
The SARA composition, in addition to providing information on the chemical nature of oil, is used for oil classification and evaluation [21,33]. It is also used to assess oil colloidal stability and crude oil compatibility [34,35,36,37,38,39,40,41,42,43,44,45], fouling [46,47,48,49,50,51], flow assurance [52], oil–water emulsion stability [53,54], determination of interactions in crude oils [55,56], product stability [57,58], and oil reactivity in different conversion processes [59,60,61,62,63,64]. It is also employed to predict petroleum properties like density [65], viscosity [66,67], refractive index [68,69,70], boiling temperature [71], and others.
The significance of SARA composition for crude oil characterization explains the research interest in prediction of the contents of SARA fractions from easily available petroleum data and the use of computational methods [72,73,74,75,76,77,78].
Melendez et al. [72] developed eight chemometrics models to predict SARA composition, measured in accordance with ASTM D 4124 [79] requirements, of 50 Colombian crude oils, using Fourier transform infrared coupled to attenuated total reflectance (ATR–FTIR). They classified the crude oils into two groups depending on their asphaltene content: (1) 0.17 wt.% ≤ asphaltenes ≤ 3.8 wt.%; (2) 5.5 wt.% ≤ asphaltenes ≤ 17.9 wt.%. For each SARA component, standard errors of prediction (SEP) for group 1 were 1.9, 1.7, 1.3, and 0.4. For group 2, SEP were 2.5, 1.7, 3.7, and 1.4, respectively. The coefficients of determination (R2) were higher than 0.95. They showed that IR spectroscopy coupled with an ATR cell plus chemometric techniques can provide an alternative way for the quantitative prediction of the SARA composition of petroleum oils whose SARA composition varies in the range 15.0 wt.% ≤ saturates ≤ 66.2 wt.%; 20.3 wt.% ≤ aromatics ≤ 40.2 wt.%; 8.6 wt.% ≤ resins ≤ 39.0 wt.%; 0.17 wt.% ≤ asphaltenes ≤ 17.9 wt.%.
Sanchez-Minero et al. [73] investigated five crude oils with SARA composition fluctuating in the range 20.0 wt.% ≤ saturates ≤ 51.6 wt.%; 11.4 wt.% ≤ aromatics ≤ 31.4 wt.%; 14.3 wt.% ≤ resins ≤ 36.3 wt.%; 2.8 wt.% ≤ asphaltenes ≤ 32.3 wt.%, predicting their SARA composition from nuclear magnetic resonance (1H NMR and 13C NMR) data with R2 higher than 0.989. They employed the ASTM D 3279 method [80] to measure the content of n-heptane insoluble (asphaltenes), and the ASTM D 2007 [81] method to determine the concentration of saturates, aromatics, and resins in the five studied crude oils whose specific gravity varied in the range 0.8612 ≤ specific gravity ≤ 0.9979.
Mohammadi et al. [74] measured the SARA composition of 12 crude oil samples from different Iranian oil fields using the ASTM D 4124 method. ATR-FTIR spectroscopy results were used to obtain correlations for prediction of SARA fraction contents by application of a hybrid of genetic algorithm (GA), support vector machine regression (SVM-R), and genetic algorithm–partial least square regression (GA-PLS-R). The crude oil SARA fraction content varied in the range 37.3 wt.% ≤ saturates ≤ 72.4 wt.%; 15.4 wt.% ≤ aromatics ≤ 42.9 wt.%; 7.0 wt.% ≤ resins ≤ 15.0 wt.%; 0.9 wt.% ≤ asphaltenes ≤ 13.1 wt.%. The non-linear GA-SVM-R model demonstrated higher accuracy in SARA fraction content prediction (R2 ≥ 0.98) than the linear GA-PLS-R (R2 ≥ 0.94).
Florez et al. [75] studied 67 samples of crude oils from different zones of Colombia, using the ASTM D 4124 and ASTM D 2007 standards to measure their SARA fraction contents. The SARA composition of these 67 crude oils varied as follows: 12.4 wt.% ≤ saturates ≤ 41.9 wt.%; 8.3 wt.% ≤ aromatics ≤ 47.2 wt.%; 2.0 wt.% ≤ resins ≤ 37.1 wt.%; 0.4 wt.% ≤ asphaltenes ≤ 17.4 wt.%. The researchers used the Raman spectra of the crude oils and partial least square regression (PLSR) to develop correlations that predicted petroleum SARA fraction contents with R2 ≥ 0.94.
Pantoja et al. [76] applied steady-state and time-resolved fluorescence measurements to 21 crude oils whose SARA composition varied in the range 43 wt.% ≤ saturates ≤ 63 wt.%; 27 wt.% ≤ aromatics ≤ 40 wt.%; 7 wt.% ≤ resins ≤ 14 wt.%; 2 wt.% ≤ asphaltenes ≤ 6 wt.%. The SARA composition of the 21 crude oils studied was measured by liquid chromatography after a preliminary precipitation of asphaltenes with n-pentane. Using total synchronous fluorescence scan (TSFS) spectral map data, partial least squares (PLS) correlations were developed to predict the crude oil SARA composition with R2 ≥ 0.88.
Yarranton [77] proposed an approach to predict the saturate content of crude oil from a simulated distillation (SimDist) assay, the asphaltene content of the oil, and the oil density. The uncertainty of saturate content prediction was reported to compare favorably with the minimum ±3 wt.% uncertainty of the saturate content from a SARA assay. It was pointed out that the proposed method was based on more reliable measurements than the SARA analysis, thereby offering a means for a more standardized determination of saturate content.
Kulkarni et al. [78] utilized SARA composition data of 216 crude oils from six different literature sources to develop SARA fraction content predictions from petroleum density and viscosity by employing artificial neural network (ANN) techniques. The correlation coefficients R of predicted versus measured SARA fraction contents were reported as follows: saturates (R = 0.844); aromatics (R = 0.54); resins (R = 0.782); asphaltenes (R = 0.745). The literature sources used by Kulkarni et al., however, indicated that the SARA composition was measured by different procedures. For example, 44 of the 216 oil SARA compositions that Kulkarni et al. [78] used to develop their ANN model came from the work of Jokuty et al. [82]. Those 44 oil SARA compositions were measured in accordance with the ASTM D 2007 requirements. This means that the asphaltenes measured by Jokuty et al. [82] were C5-asphaltenes, because according to the ASTM D 2007 standard, asphaltenes are precipitated with n-pentane. Another literature source [83] referenced by Kulkarni et al. [78] reported that the asphaltenes of the three crude oils studied were determined by using either the n-pentane or n-heptane precipitation method. Additionally, another six crude oils whose SARA composition [84] was also utilized by Kulkarni et al. [78] to develop their ANN models were analyzed for their asphaltenes by precipitation with petroleum ether, which is neither n-pentane nor n-heptane. Understandably, the different procedures applied to measure asphaltenes may result in different reported values for the same oils, which can make the use of such a database unreliable and lead to inaccurate predictions. Therefore, a single-source database should be considered the preferred basis for modeling SARA composition.
In addition, distillation data, which have been shown to be influenced by the molecular type composition of the crude oil [71], may also be deemed meaningful to include in the ANN models.
Yarranton’s [77] idea of using oil density and simulated distillation to predict oil saturate content can be further developed to predict the contents of the other three groups of petroleum compounds (aromatics, resins, and asphaltenes) using artificial neural networks. To achieve this, we employed the data of 94 petroleum samples taken from [85], whose SARA composition had been measured following a procedure described therein. Using data from a single source avoided the use of disparate SARA composition data reported in the literature for different SARA analyses [24]. Artificial neural network models to predict the four SARA fraction contents and non-linear regression models were developed and compared in this research.
SARA prediction models typically employ regression and metaheuristic methods [72,73,74,75,76,77,78]. However, no comparison between the two techniques has been provided so far. That was our reason for employing linear and non-linear regression along with an ANN as a metaheuristic method to model petroleum SARA composition, comparing their ability to predict SARA composition measured by a single method. The aim of this study is to discuss the results of SARA composition modeling obtained using an ANN and using linear and non-linear regression.

2. Materials and Methods

The set of data for SARA composition, density, sulfur content, flash point, and high-temperature simulated distillation (HTSD) of the 94 petroleum samples used for SARA fraction content modeling is presented in Table S1, while Table 1 indicates the range of variation in these properties. The data were taken from [85], which contains 100 oil SARA compositions as well as the bulk properties of flash point, pour point, viscosity, and surface tension. Unfortunately, not all the petroleum samples had been analyzed for all these properties. The largest subset of this data set of 100 petroleum fluids with complete SARA composition, density, sulfur content, and flash point data consisted of 94 samples, which are shown in Table S1. The reported mass balance of SARA composition was between 99 and 102 wt.%, with most values at 100 wt.%.
Three data mining (DM) or knowledge discovery in databases (KDD) techniques were employed in this study. The first one, intercriteria analysis (ICrA), is built on intuitionistic fuzzy sets and index matrices and was aimed at understanding the analyzed petroleum SARA data and the relations between the oil characteristics shown in Table 1. Two parameters, μ and ν, were calculated in the ICrA evaluation of the oil SARA data by using specialized software freely available as open source from https://intercriteria.net/software/ (accessed on 20 May 2024) and described in detail in [86,87,88,89]. ICrA avoids the term “correlation” between investigated variables and instead applies the terms “positive consonance”, “negative consonance”, and “dissonance”. For μ = 0.75–1.00 and ν = 0–0.25, a region of statistically meaningful positive consonance is obtained, whereas at μ = 0–0.25 and ν = 0.75–1.00, an area of statistically meaningful negative consonance is derived. All other cases are deemed to be dissonance. A comprehensive description of ICrA application in oil refining is given in [90]. ICrA is preferred as a method to search for statistically meaningful relations because it detects both linear and non-linear ones. In contrast, conventional correlation analysis registers the presence only of linear relations.
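The consonance regions described above can be illustrated with a short Python sketch. It assumes the usual pairwise-comparison formulation of ICrA, in which μ is the share of object pairs that two criteria order the same way and ν the share they order oppositely, with ties counted toward neither. The function names `icra_pair` and `classify` are hypothetical, and this is not the software available from intercriteria.net.

```python
from itertools import combinations


def icra_pair(x, y):
    """Return (mu, nu) for two criteria evaluated on the same objects.

    Assumed formulation: mu = share of object pairs ranked in the same
    direction by both criteria, nu = share ranked in opposite directions.
    Pairs with a tie in either criterion count toward neither.
    """
    agree = disagree = pairs = 0
    for (xa, ya), (xb, yb) in combinations(zip(x, y), 2):
        pairs += 1
        dx = (xa > xb) - (xa < xb)  # sign of the difference in criterion x
        dy = (ya > yb) - (ya < yb)  # sign of the difference in criterion y
        if dx != 0 and dy != 0:
            if dx == dy:
                agree += 1
            else:
                disagree += 1
    return agree / pairs, disagree / pairs


def classify(mu, nu):
    """Apply the consonance regions quoted in the text."""
    if mu >= 0.75 and nu <= 0.25:
        return "positive consonance"
    if mu <= 0.25 and nu >= 0.75:
        return "negative consonance"
    return "dissonance"


# Illustrative values only (not data from the study).
mu, nu = icra_pair([1, 2, 3, 4], [1, 3, 2, 4])
print(mu, nu, classify(mu, nu))
```

Because only the ordering of the values enters the calculation, any monotonic relation between two properties, linear or not, yields the same μ and ν.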
The second and third DM techniques used in this work, regression and ANNs, were aimed at predicting oil SARA composition. Two types of regression models were used: linear and non-linear (i.e., a linear combination of powered independent variables). The optimal values of the regression coefficients were identified by means of the non-linear least squares method (LSM). In both cases, the differential evolution (DE) algorithm was used as the optimization tool to search for the best parameter values in compact subsets of the parameter space. The DE algorithm does not require differentiability or even continuity of the optimized function. Moreover, it readily accommodates bounds on the parameter space. All calculations were carried out in CAS Maple 2024 with the Global Optimization Tool (method = diffevol) and verified by a simple Python script based on the differential evolution routine from the scipy.optimize library.
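The least squares fit by differential evolution can be sketched as follows. This is a minimal pure-Python DE (rand/1/bin variant) written only for illustration; it is neither the Maple routine nor the scipy.optimize routine mentioned above. The model form y = a0 + a1·x^p, the synthetic data, the bounds, and the names `model`, `sse`, and `de_minimize` are all assumptions.

```python
import random


def model(params, x):
    """Hypothetical one-variable power model: y = a0 + a1 * x**p."""
    a0, a1, p = params
    return a0 + a1 * x ** p


def sse(params, xs, ys):
    """Sum of squared errors, the least squares objective."""
    return sum((y - model(params, x)) ** 2 for x, y in zip(xs, ys))


def de_minimize(cost, bounds, pop_size=30, f=0.7, cr=0.9,
                generations=200, seed=0):
    """Minimal DE/rand/1/bin; candidates are clipped to the bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(pop_size)]
    costs = [cost(ind) for ind in pop]
    initial_best = min(costs)
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            j_rand = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # greedy selection
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best], initial_best


# Synthetic, noise-free data generated from made-up parameters.
xs = [0.80 + 0.02 * k for k in range(11)]
ys = [model((80.0, -60.0, 2.0), x) for x in xs]
bounds = [(0.0, 200.0), (-200.0, 0.0), (0.5, 3.0)]
best, best_cost, initial_best = de_minimize(lambda p: sse(p, xs, ys), bounds)
print(best, best_cost)
```

The objective is only ever evaluated, never differentiated, which is why the method tolerates non-smooth functions, and the clipping step is one simple way to keep candidates inside the chosen parameter bounds.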
Artificial neural networks use the connectivity of individual neurons to create an architecture for predicting the hydrocarbon group composition of oil. The neural network constructed in this work was a feedforward network: the outputs of each layer were connected to the inputs of the next layer, with no feedback from any layer to any of the network inputs. Its structure was as follows: the first hidden layer, which received the input part of the training sequence, had 123 neurons; the second layer had 42 neurons, the third 16 neurons, the fourth 10 neurons, the fifth 8 neurons, and the sixth (output) layer only 1 neuron. To make the network easier to train and to avoid occupying too much computer memory, larger numbers of neurons were used in the first layers, and the number of neurons was gradually reduced. The output layer had only one neuron, equal to the number of parameters to be predicted. In neural network training, the data are divided into three parts: training, validation, and testing. The training set is employed to train the model, the validation set assists in model selection and hyperparameter tuning, and the test set evaluates the ANN on data not included in the training or validation data. Training is best stopped at the moment the error computed on the validation data starts to increase, and the training results at this point are the ones reported. In determining the architecture of the ANN, a balance must be struck between the accuracy of the neural network, its training speed, and the amount of memory it uses; it is also important to note that a larger number of neurons introduces additional systematic error into the training.
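The layer structure described above can be sketched in plain Python to show how the layers chain together and how many adjustable weights and biases such a network carries. The nine inputs correspond to the three bulk properties and six distillation yields used as model inputs in this work; the tanh hidden activation, the linear output, the random initialization, and all function names are assumptions, since the text does not state them.

```python
import math
import random

# 9 inputs, then the six layers given in the text.
LAYER_SIZES = [9, 123, 42, 16, 10, 8, 1]


def count_parameters(sizes):
    """Weights plus biases of a fully connected feedforward network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_network(sizes, seed=0):
    """Random weights, zero biases (an arbitrary illustrative choice)."""
    rng = random.Random(seed)
    net = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        net.append((weights, biases))
    return net


def forward(net, x):
    """Propagate one input vector layer by layer, with no feedback."""
    a = list(x)
    for k, (weights, biases) in enumerate(net):
        z = [sum(w * ai for w, ai in zip(row, a)) + b
             for row, b in zip(weights, biases)]
        is_output = k == len(net) - 1
        a = z if is_output else [math.tanh(v) for v in z]  # assumed tanh
    return a[0]


net = init_network(LAYER_SIZES)
print(count_parameters(LAYER_SIZES))
print(forward(net, [0.5] * 9))
```

Under these assumptions the network has 7393 adjustable parameters, most of them in the first two layers.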
When determining the structure of the neural network, we were guided by the complexity of the process. A simpler structure, for example 100-50-25-12-6, resulted in a higher mean squared error at the network’s output. By increasing the number of neurons in the critical first layer, we sought the best result. However, further increasing the number of neurons led to an accumulation of the error that accompanied each neuron. In this way, we experimentally found the balance in the numbers of neurons in each layer that resulted in the lowest mean squared error. The following data were provided at the input of the neural network: density at 15 °C, g/cm³; flash point, °C; sulfur, wt.%; and the yields of components boiling up to 180 °C, 200 °C, 250 °C, 300 °C, 400 °C, and 500 °C. At the output of the neural network, we sequentially provided the following: saturates, wt.%; aromatics, wt.%; resins, wt.%; asphaltenes, wt.%. In this way, we trained the neural network to predict the four SARA fraction contents. The ANN SARA composition modeling was accomplished using Matlab 2020 software. The data used by the ANN were divided as follows: 68% for training; 21% for testing; and 11% for validation. The training, validation, and test sets for the four fractions are presented in Tables S2–S5. A random splitting algorithm was employed for the data allocation.
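A random 68/21/11 allocation of the 94 samples can be sketched as below. The rounding rule, the seed, and the function name `split_indices` are assumptions made for illustration; the Matlab routine used in the study may assign samples differently.

```python
import random


def split_indices(n, fractions=(0.68, 0.21, 0.11), seed=0):
    """Shuffle sample indices and cut them into train/test/validation parts.

    The validation part receives whatever remains after rounding the
    training and test sizes, so the three parts always cover all samples.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_test = round(fractions[1] * n)
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    validation = idx[n_train + n_test:]
    return train, test, validation


train, test, validation = split_indices(94)
print(len(train), len(test), len(validation))
```

With this rounding, 94 samples split into 64 for training, 20 for testing, and 10 for validation.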
The regression models were developed by using 89% of the data for training and 11% for validation.
The following statistical parameters, presented as Equations (1)–(5), were used to evaluate the accuracy of the SARA prediction models:
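Absolute deviation (AD):
$$\mathrm{Deviation} = \left| \mathrm{SARA}_{exp} - \mathrm{SARA}_{calc} \right| \qquad (1)$$
Average absolute deviation (AAD):
$$\mathrm{AAD} = \frac{\sum \left| \mathrm{SARA}_{exp} - \mathrm{SARA}_{calc} \right|}{N} \qquad (2)$$
Average bias:
$$\mathrm{Average\ Bias} = \frac{\sum \left( \mathrm{SARA}_{calc} - \mathrm{SARA}_{exp} \right)}{N} \qquad (3)$$
Standard deviation (SD):
$$\mathrm{SD} = \sqrt{\frac{\sum \left( \mathrm{SARA}_{exp} - \mathrm{SARA}_{calc} \right)^{2}}{N - 2}} \qquad (4)$$
Multiple correlation coefficient (R):
$$R = \sqrt{1 - \frac{\sum \left( \mathrm{SARA}_{exp} - \mathrm{SARA}_{calc} \right)^{2}}{\sum \left( \mathrm{SARA}_{exp,i} - \mathrm{SARA}_{exp,average} \right)^{2}}} \qquad (5)$$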
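The five statistics can be computed with a few lines of Python. The sketch assumes Equations (1)–(5) take their conventional forms: a residual standard deviation with N − 2 in the denominator and R as the square root of one minus the ratio of the residual to the total sum of squares. The function names and the sample numbers are illustrative only.

```python
import math


def absolute_deviations(exp, calc):
    """Eq. (1): |SARA_exp - SARA_calc| for each sample."""
    return [abs(e - c) for e, c in zip(exp, calc)]


def aad(exp, calc):
    """Eq. (2): average absolute deviation."""
    return sum(absolute_deviations(exp, calc)) / len(exp)


def average_bias(exp, calc):
    """Eq. (3): mean of (calculated - experimental)."""
    return sum(c - e for e, c in zip(exp, calc)) / len(exp)


def standard_deviation(exp, calc):
    """Eq. (4): residual standard deviation with N - 2 (assumed form)."""
    sse = sum((e - c) ** 2 for e, c in zip(exp, calc))
    return math.sqrt(sse / (len(exp) - 2))


def multiple_r(exp, calc):
    """Eq. (5): sqrt(1 - SSE/SST); clipped at zero if SSE exceeds SST."""
    mean = sum(exp) / len(exp)
    sse = sum((e - c) ** 2 for e, c in zip(exp, calc))
    sst = sum((e - mean) ** 2 for e in exp)
    return math.sqrt(max(0.0, 1.0 - sse / sst))


# Made-up values in wt.%, not taken from the study.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(aad(measured, predicted), average_bias(measured, predicted))
print(standard_deviation(measured, predicted), multiple_r(measured, predicted))
```

A positive average bias under this sign convention means the model overpredicts on average.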

3. Results and Discussion

3.1. Intercriteria Analysis of Petroleum Data

In order to investigate the relations within the oil data from Table S1, ICrA evaluation was performed and the results of this assessment are summarized in Table 2 and Table 3.
Table 2 and Table 3 report the ICrA pairs μ and ν that evaluate the relations between the investigated oil properties. Density is a fundamental property of petroleum and is related to the content of aromatic structures [91]. The higher the density, the higher the content of aromatic structures and, accordingly, the lower the content of saturates. This is illustrated by the negative consonance between density and saturate content (μ = 0.17; ν = 0.80) and the positive consonance with the contents of resins and asphaltenes (μ = 0.84; ν = 0.13; and μ = 0.75; ν = 0.17, respectively). The petroleum components that contain aromatic structures are the aromatic, resin, and asphaltene fractions [91]. Among these, only the aromatic fraction lacked a statistically meaningful relation to density (μ = 0.68; ν = 0.28), in line with the report by Kulkarni et al. [78] that the content of the aromatic fraction is not well predicted from density and viscosity using the ANN technique. One may suggest that density is more influenced by the contents of heavy condensed aromatic compounds contained in the heavier SARA fractions, resins and asphaltenes [92].
The flash point did not have any statistically meaningful relation to any of the studied properties for this data set.
Sulfur content displayed a weak statistically meaningful negative consonance with saturates (μ = 0.23; ν = 0.73), and weak negative consonances with the yield of fractions boiling up to 400 and 500 °C (μ = 0.25; ν = 0.72).
The content of saturates had statistically meaningful negative consonances with the aromatic structure fractions: aromatics (μ = 0.13; ν = 0.83); resins (μ = 0.11; ν = 0.84); and asphaltenes (μ = 0.16; ν = 0.76).
Apart from its relation to the saturate fraction content, the aromatic content had no statistically significant relationship with any other property examined.
The yields of fractions boiling at 300, 400, and 500 °C exhibited negative consonances with resin and asphaltene contents. These findings seem reasonable because it is known that resin and asphaltene fractions are mainly concentrated in the higher-boiling part of the petroleum. It is evident from the data in Table 2 and Table 3 that there are relatively strong positive consonances between the yields of fractions at 300, 400, and 500 °C, which suggests that only one of them should be used in the regression models. The data in Table 2 and Table 3 also show the following:
  • The sulfur, saturate, resin, and asphaltene contents exhibited some strengthening of their relations with the yields of boiling fractions as the boiling temperature increased from 180–200 °C to 300–500 °C. For example, for the relation of resin content to fraction boiling point, μ decreased from 0.25 (boiling temperature of 180 °C) to 0.20 (boiling temperature of 300 °C) while ν increased from 0.68 to 0.75. However, these relations were in general weak, and the changes in μ and ν could not be considered substantial;
  • The relations of density and of saturate, aromatic, and resin contents wavered with increasing boiling point of the oil fractions; for example, for the relation of resin content to fraction boiling point, μ oscillated at about 0.25 and 0.26 at lower boiling points, and at about 0.20 and 0.21 at higher boiling points;
  • The saturate content behaved oppositely to density and to the aromatic, resin, and asphaltene contents; for example, while the saturate content was in a negative consonance with density (μ of 0.17, ν of 0.80), the density exhibited a positive consonance with the contents of resins and asphaltenes (μ of 0.81, ν of 0.13; and μ of 0.75, ν of 0.17, respectively);
  • There was a lack of pronounced proximity in density, aromatic, resin, and asphaltene content values relative to each other.

3.2. Artificial Neural Network Models to Predict Oil SARA Composition

3.2.1. Saturate Content Prediction by ANN

Figure S1 illustrates how the neural network training process was accomplished to develop the ANN saturate content prediction model. It is evident from the data in Figure S1 that the maximum number of epochs allocated for training was 5000, the ANN training took 40 s, the desired mean squared error was set at 1 × 10−6, the achieved mean squared error was 0.000451, and the training method used was that of Levenberg–Marquardt.
Figure 1 includes graphs of the neural network training performance for prediction of saturate content for the three types of data: training, testing, and validation. Different percentages can be used when splitting data from one set into three subsets (for training, for testing, and for validation). Typically, the training subset is the largest. One method of partitioning is to specify in the program code the start and end positions in the input sequence that delimit each subset. With this approach, however, the later samples are likely to differ significantly from the first samples. Therefore, in our case, the partitioning algorithm assigned the individual samples to the subsets at random. The graph in Figure 1 shows that the achieved mean squared error was 0.0036299, which occurred at epoch 6, while the goal set at 1 × 10−6 can be seen as a horizontal line parallel to the abscissa. After epoch 6, the mean squared error started to increase for both testing and validation, implying that no further improvement in the model accuracy should be expected.
Figure 2 presents parity graphs of predicted versus measured (target) saturate contents of the investigated petroleum fluids for the three types of data: training, testing, and validation. The correlation coefficients of the three types of data were found to be R = 0.980 for training, R = 0.929 for testing, R = 0.877 for validation, and overall R = 0.959. The overall data comprised all subsets of data for training, testing, and validation.

3.2.2. Aromatic Content Prediction by ANN

Figure S2 exhibits the implementation of the neural network training process to develop the ANN aromatic content prediction model. The data in Figure S2 reveal that the maximum number of epochs allocated for training was 5000, the ANN training lasted 37 s, the target mean squared error was set at 1 × 10−6, and the attained mean squared error was 0.000182, while the training method used was that of Levenberg–Marquardt.
Figure 3 depicts graphs of the neural network training performance for prediction of aromatic content for the three types of data: training, testing, and validation. The minimum mean squared error was observed at epoch 5 and was 0.0028271, while the goal set at 1 × 10−6 can be seen as a horizontal line parallel to the abscissa. The data in Figure 3 show that during the training process of the ANN, a small minimum was reached at the first epoch. To determine whether this minimum was global or local, the system was set to run six additional epochs, during which a lower minimum was sought. In this way, we aimed to find the global minimum of the mean squared error function, which was reached at epoch 5. As seen in Figure 3, six more epochs were then performed to search for a value lower than the minimum at epoch 5, and no such value was found. The training curve clearly indicates that the training of the neural network continued until the end of the entire process, but no improvement was observed during validation and testing.
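The stopping rule described here, continuing for six epochs after the last minimum and keeping the best epoch, can be sketched as follows. The validation error values, the zero-based epoch numbering, and the function name `early_stopping` are illustrative assumptions rather than the Matlab implementation.

```python
def early_stopping(val_errors, patience=6):
    """Scan per-epoch validation errors and apply a patience rule.

    Returns (best_epoch, best_error, stop_epoch). Training stops at the
    first epoch that lies `patience` epochs past the best one without
    any improvement; otherwise it runs to the last epoch supplied.
    """
    best_epoch, best_error = 0, float("inf")
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_epoch, best_error = epoch, err
        elif epoch - best_epoch >= patience:
            return best_epoch, best_error, epoch
    return best_epoch, best_error, len(val_errors) - 1


# Made-up validation curve: improves up to epoch 5, then degrades.
curve = [0.90, 0.60, 0.45, 0.30, 0.22, 0.20,
         0.21, 0.23, 0.24, 0.26, 0.29, 0.31, 0.35]
print(early_stopping(curve))
```

For the made-up curve above, the rule keeps epoch 5 as the best epoch and stops at epoch 11, six epochs later.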
The graphs of agreement between measured (target) and predicted values attained with ANN for aromatics content are presented in Figure 4 for the three types of data: training, testing, and validation. The correlation coefficients of the three types of data were R = 0.944 for training, R = 0.697 for testing, R = 0.369 for validation, and overall R = 0.862.

3.2.3. Resin Content Prediction by ANN

Figure S3 illustrates the execution of the neural network training process to develop the ANN resin content prediction model. The data in Figure S3 summarize that the maximum number of epochs allocated for training was 5000, the ANN training lasted 57 s, the target mean squared error was set at 1 × 10−6, and the attained mean squared error was 1.57 × 10−5; the training method used was again that of Levenberg–Marquardt.
Figure 5 includes graphs of the neural network training performance for prediction of resin content for all three types of data: training, testing, and validation. The minimum mean squared error was found at epoch 11 with a value of 0.00048523, while the goal set at 1 × 10−6 can be seen as a horizontal line parallel to the abscissa. The data in Figure 5 show that the process followed the same sequence as in the modeling of the saturates and aromatics (Figure 1 and Figure 3), but here, the minimum error was achieved at epoch 11. The system continued to train the ANN for an additional six epochs, searching for a minimum lower than the one already established. As can be seen, the training process here, as with the modeling of the saturates and aromatics (Figure 1 and Figure 3), achieved a very small error in the order of 1 × 10−5.
The correlation plots of measured (target) and ANN-predicted resin content are shown in Figure 6 for the three data types: training, testing, and validation. The correlation coefficients of the three types of data were R = 0.993 for training, R = 0.886 for testing, R = 0.940 for validation, and overall R = 0.955.

3.2.4. Asphaltene Content Prediction by ANN

Figure S4 illustrates the neural network training process used to establish the ANN asphaltene content prediction model. The data in Figure S4 show that the maximum number of epochs allocated for training was 5000, the ANN training took 12 s, the target mean squared error was set at 1 × 10⁻⁶, and the attained mean squared error was 1.82 × 10⁻⁵; the training method used was again that of Levenberg–Marquardt.
Figure 7 shows the neural network training, validation, and testing performance for the prediction of asphaltene content based on sulfur content, density, flash point, and distillation characteristics. The minimum mean squared error was found at epoch 4, with a value of 0.00043376, while the goal of 1 × 10⁻⁶ can be seen as a horizontal line parallel to the abscissa. No fundamental difference could be identified in the training process of the ANN among the models for the four hydrocarbon groups: *saturates*, *aromatics*, *resins*, and *asphaltenes*. This is because the same algorithm was employed to train the ANN models of all four SARA fraction contents. Here, as in Figure 1, Figure 3 and Figure 5, the mean squared error of the network’s training decreased continuously within the studied range of epochs, while the minima for the validation and testing data were observed at different epochs.
The parity diagrams of measured (target) versus ANN-predicted asphaltene content are exhibited in Figure 8 for the three data types: training, testing, and validation. The correlation coefficients of the three types of data were R = 0.986 for training, R = 0.908 for testing, R = 0.901 for validation, and overall R = 0.964.
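The correlation coefficients quoted for the training, testing, and validation subsets can be computed directly from parity data. The sketch below is a minimal Python illustration, assuming that R is the Pearson correlation coefficient between measured and predicted values and that the overall R is computed on the pooled data. The function names and the sample numbers are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_m * var_p)


def r_by_subset(subsets):
    """Compute R for each named subset and for all subsets pooled together.

    `subsets` maps a name to a (measured, predicted) pair of sequences.
    """
    result = {name: pearson_r(m, p) for name, (m, p) in subsets.items()}
    pooled_m = [v for m, _ in subsets.values() for v in m]
    pooled_p = [v for _, p in subsets.values() for v in p]
    result["overall"] = pearson_r(pooled_m, pooled_p)
    return result


# Hypothetical asphaltene contents (wt.%), not data from the paper.
example = {
    "training": ([0.5, 2.0, 4.1, 7.9, 12.0], [0.6, 2.1, 4.0, 8.2, 11.5]),
    "testing": ([1.0, 3.5, 6.0, 9.0], [1.8, 3.0, 7.1, 8.0]),
    "validation": ([0.8, 5.0, 10.0], [1.5, 4.2, 9.0]),
}
for name, r in r_by_subset(example).items():
    print(f"{name}: R = {r:.3f}")
```

Under this pooling assumption, the overall R is not an average of the subset values, so a large training set can dominate it.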

3.3. Regression Models to Fit and Predict Oil SARA Composition

The initial selection of the independent variables (petroleum properties) for the regression equations fitting the measured SARA fraction contents was based on the results of the ICrA evaluation. These variables were density, sulfur content, and the yields of fractions boiling up to 300, 400, and 500 °C. The remaining petroleum property investigated in this work, flash point, was also added to the independent variables to assess whether it could improve the fit and result in a better combination of independent variables. The flash point in combination with the other petroleum characteristics improved the accuracy of the data fit. This finding indicates that the absence of a significant relation between a single variable and the target property in a preliminary statistical evaluation does not necessarily imply that the variable has no effect on the target function when it acts in combination with the other independent variables. Using the CAS Maple 2024 Global Optimization Tool (method = diffevol), linear and non-linear equations were developed. The forms of the linear and non-linear regressions are presented as Equations (6) (fl) and (7) (fn), respectively:
$$f_l(x, y, z, t) = a_1 x + a_2 y + a_3 z + a_4 t + a_5 \qquad (6)$$
$$f_n(x, y, z, t) = a_1 x^{b_1} + a_2 y^{b_2} + a_3 z^{b_3} + a_4 t^{b_4} + a_5 \qquad (7)$$
where:
x = density at 15 °C, g/cm³;
y = sulfur content, wt.%;
z = yield of fraction boiling up to 500 °C, wt.%;
t = flash point, °C;
a1–a5; b1–b4 = regression coefficients.
Table 4 summarizes the values of the obtained regression coefficients for both linear and non-linear models.
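To make the fitting procedure more concrete, the sketch below codes the two model forms of Equations (6) and (7) and fits the non-linear one with a small hand-rolled differential evolution (rand/1/bin) routine in Python. It is not Maple's Global Optimization Tool and does not reproduce the coefficients in Table 4. The data, bounds, control parameters, and all function names are hypothetical, and the inputs are assumed to be strictly positive so that the power terms are well defined.

```python
import random


def f_l(row, a):
    """Linear form, Equation (6); row = (x, y, z, t), a = [a1..a5]."""
    x, y, z, t = row
    return a[0] * x + a[1] * y + a[2] * z + a[3] * t + a[4]


def f_n(row, a, b):
    """Non-linear form, Equation (7); assumes strictly positive inputs."""
    x, y, z, t = row
    return a[0] * x ** b[0] + a[1] * y ** b[1] + a[2] * z ** b[2] + a[3] * t ** b[3] + a[4]


def sse(params, rows, targets):
    """Sum of squared errors of Equation (7) for params = [a1..a5, b1..b4]."""
    a, b = params[:5], params[5:]
    return sum((f_n(r, a, b) - s) ** 2 for r, s in zip(rows, targets))


def differential_evolution(objective, bounds, pop_size=30, generations=200,
                           F=0.7, CR=0.9, seed=0):
    """Minimal rand/1/bin differential evolution with greedy selection."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(p) for p in pop]
    history = [min(cost)]
    for _ in range(generations):
        for i in range(pop_size):
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < CR:
                    v = pop[r1][d] + F * (pop[r2][d] - pop[r3][d])
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            c = objective(trial)
            if c <= cost[i]:  # keep the trial only if it is not worse
                pop[i], cost[i] = trial, c
        history.append(min(cost))
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best], history


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Hypothetical oils: density (g/cm3), sulfur (wt.%), yield to 500 C (wt.%), flash point (C).
    rows = [(data_rng.uniform(0.75, 1.05), data_rng.uniform(0.1, 5.0),
             data_rng.uniform(40.0, 100.0), data_rng.uniform(5.0, 150.0))
            for _ in range(30)]
    true_a, true_b = [-120.0, -2.0, 0.5, -0.05, 130.0], [1.5, 0.8, 1.0, 1.0]
    targets = [f_n(r, true_a, true_b) for r in rows]
    bounds = [(-200.0, 200.0)] * 5 + [(0.1, 2.0)] * 4
    best, best_sse, history = differential_evolution(
        lambda p: sse(p, rows, targets), bounds)
    print(f"initial best SSE = {history[0]:.3g}, final SSE = {best_sse:.3g}")
```

Setting all exponents b1–b4 to 1 reduces Equation (7) to Equation (6), so the linear model is a special case of the non-linear search space.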
Figure 9 shows the parity graphs of the measured and calculated contents of the four SARA fractions from Equations (6) and (7).
Figure 9 indicates that the dispersion of the predicted values is visibly the lowest for the saturate content. The aromatic content prediction shows significant scatter, and the resin and asphaltene contents also display some points that deviate substantially from the measured values.

3.4. Statistical Analysis of ANN and Regression SARA Fraction Content Prediction Models

Table 5 summarizes the values of the statistical parameters estimated by Equations (1)–(5) for the twelve fitted SARA fraction content models developed in this work. The data in Table 5 demonstrate the superiority of the ANN models over the regression ones for all statistical parameters. For both the ANN and the regression models, the least accurate fit was that of the aromatic fraction content. Judged by the correlation coefficient, the ANN models fitted the contents of saturates, resins, and asphaltenes with almost the same accuracy (~0.96), while the accuracy of the regression models varied over a wider range (between 0.86 and 0.93), confirming the superiority of the ANN technique for fitting SARA composition.
It should be noted here that the ability to predict SARA fraction content using regression and ANN techniques should only be evaluated based on the test data. For that reason, the abilities of the ANN, linear regression, and non-linear regression models to accurately predict the SARA fraction contents were compared using the testing data employed in the ANN modeling. Table 6 summarizes the statistical parameters for the testing data for the three different techniques.
The data in Table 6 exhibit the rank of classification of the models in terms of their accuracy of prediction as follows:
Saturates: non-linear regression > ANN > linear regression;
Aromatics: linear regression ≥ non-linear regression > ANN;
Resins: ANN > non-linear regression > linear regression;
Asphaltenes: ANN > linear regression > non-linear regression.
None of the techniques used in this study to model SARA composition outperformed the others for all four SARA fraction contents. Unfortunately, the adequacy of the models in this work could not be assessed directly, due to the lack of information on the uncertainty of the SARA composition measurement. However, if the data reported by Yarranton [77] for SARA analysis uncertainty are used, it can be seen that none of the models were able to predict the saturate content within the measurement uncertainty of 5.2 wt.% reported there. Twice the standard deviation, used as an indicator of the uncertainty of prediction [93], is 10.2 wt.% for the most accurate (non-linear regression) model. The uncertainty of SARA assays obtained from the results reported by different laboratories is in the range 8.1–19.9 wt.% for aromatics; 8.4–25.7 wt.% for resins; and 0.7–5.6 wt.% for asphaltenes [77], suggesting that even the aromatic content, which is poorly predicted, is predicted within the range of reproducibility. Various studies dealing with petroleum property prediction by both regression and ANN models have highlighted the better performance of the ANN models [66,67,94,95,96,97,98]. However, there are also researchers who have reported better [99] or equal [100,101] performance of empirical correlations compared with ANN models. The reason for the better performance of ANN models may lie not only in the architecture of the ANN but also in the larger number of input variables in comparison with regression models. Thus, one may suggest that the accuracy of petroleum property prediction can depend on the database used, its size, and the number of input variables. The next study will focus on the effect of the size of the database and the number of input petroleum properties on the accuracy of SARA fraction content prediction by ANN in comparison with regression models.
Our experience with the measurement of the SARA composition of various oils performed in different labs, even during round-robin testing, has shown that the results sometimes disagree very badly. For that reason, it is very difficult to find a reliable database to use for the purposes of modeling. Employing data from an unreliable database can lead to very poor predictions, which may result not only from an inadequate model but also from an inappropriate database. Some relations found in one database that would be expected to be generic, such as the relation of density to the saturate content reported in [91], can be completely skewed when verified against another database. This is very well illustrated by the data in Figure 10.
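The comparison between prediction uncertainty and measurement uncertainty made in this paragraph can be written out as a short calculation. The Python sketch below assumes that the standard deviation in question is the sample standard deviation of the residuals (measured minus predicted); the exact definition used in the paper is given by its Equations (1)–(5) and may differ. The function names and the sample residual data are hypothetical, while the 10.2 and 5.2 wt.% figures are the ones quoted in the text.

```python
from statistics import stdev


def prediction_uncertainty(measured, predicted):
    """Twice the sample standard deviation of the residuals (assumed definition)."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    return 2.0 * stdev(residuals)


def is_within_measurement_uncertainty(pred_uncertainty, meas_uncertainty):
    """True when the model is at least as precise as the measurement itself."""
    return pred_uncertainty <= meas_uncertainty


# Values quoted in the text for saturates (wt.%).
print(is_within_measurement_uncertainty(10.2, 5.2))  # non-linear regression vs. measurement

# Hypothetical measured/predicted saturate contents (wt.%), for illustration only.
measured = [55.0, 62.0, 40.0, 71.0, 48.0]
predicted = [51.0, 66.0, 43.0, 65.0, 50.0]
print(f"2 x SD of residuals = {prediction_uncertainty(measured, predicted):.1f} wt.%")
```

For the quoted saturate figures the check fails, which is the sense in which none of the models reached the measurement uncertainty for saturates.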
In Figure 10, the model predicting petroleum saturate content is taken from [10] (Equation (8)) and is based on 308 crude oil data analyses from all over the world, acquired using five distinct procedures to measure SARA composition.
Saturate content = 100 100 0.2748 + 5.198 e 4.787 SG 239
It can be seen that this model describes a good part of the data in Figure 10 well. However, it is also evident that part of the data can be described by a similar, parallel line lying about 15 wt.% higher in saturate content. What is interesting here is that the data shown as blue triangles (this work) obey Equation (8) to a greater extent than the data shown as black squares (from reference [102]). The black square data (66 petroleum fluids) were taken from the Environment Canada Laboratory [102]. The same source of data was cited in [85], from which the properties of the 94 petroleum fluids studied in this work were taken. The SARA composition from both sources [85] and [102] should therefore be considered to have been measured using the same procedure. Nevertheless, the majority of the data from [102], as evident from Figure 10, would lie on a line parallel to that of the model based on density, with about 15 wt.% more saturates. The higher saturate content at the same density would mean a different distribution of the compounds with aromatic structures (aromatics, resins, and asphaltenes), with a predominance of the heavier resins and asphaltenes, assuming that the saturates had the same density. Similar phenomena were reported in [103], where straight run and hydrocracked vacuum residues were studied. At the same density, the hydrocracked vacuum residues exhibited between 5 and 10 wt.% higher content of saturates than the straight run vacuum residues. Meanwhile, the hydrocracked vacuum residues demonstrated higher Conradson carbon content at the same density [104], meaning a higher content of resins and asphaltenes, which have higher Conradson carbon content than the aromatic fraction [105]. Thus, it may be concluded that petroleum oils with different saturate contents at the same density could have a distinct distribution of the fractions containing aromatic carbon (aromatics, resins, and asphaltenes).
Based on this reasoning, it could be expected that models developed using data that obey Equation (8) to a greater extent, as is the case for the data used in this research, may fail to predict the SARA composition of the data indicated as black squares in Figure 10, because of the distinct underlying chemistry of the petroleum oils in the two databases.
The ANN and regression models developed in this work were tested against the data from ref. [102], as shown in Table S6. Table 7 summarizes the values of the statistical parameters estimated via Equations (1)–(5) for the twelve SARA fraction content models tested with the data of 66 petroleum oils as shown in Table S6.
The data in Table 7 show that the predictions of SARA fraction content for the data set of 66 petroleum oils in Table S6 were worse than those for the testing data of 19 petroleum oils whose properties are shown in Tables S2–S5. However, if one looks at the data in Table 5, Table 6 and Table 7 related to the statistical parameters of asphaltene content calculated via linear regression (Equation (6), along with the regression coefficients from Table 4), one can see a similar, moderate scatter in all three cases, implying that the density and sulfur content seem to be good descriptors of petroleum asphaltene content. This finding is in line with the significance of sulfur and density in petroleum characterization and classification [105,106]. The prognosis of the other three fractions (saturates, aromatics, and resins) could not be classified as satisfactory. A different underlying chemistry of the 66 petroleum fluids (Table S6) relative to that of the 94 oils (Table S1) used to develop the models in this work could explain the observed discrepancy between the measured and predicted saturate, aromatic, and resin contents. It seems that improvements in modeling will be necessary. These may include separation of the samples based on a certain criterion and the involvement of extra oil properties that are better descriptors of these SARA fractions.
The ICrA evaluation of the data processed in this study indicated that strong relations between the investigated properties (μ ≥ 0.95 or μ ≤ 0.05; υ ≥ 0.95 or υ ≤ 0.05) did not exist, while the ANN and the regression techniques pointed to the presence of partially related properties. The fact that the prediction of SARA composition was not exceptionally accurate is in line with the absence of strongly expressed dependence, as shown by the ICrA evaluation. Therefore, one may expect that the involvement of additional petroleum properties in the SARA composition model may improve the accuracy of prediction by the ANN and the regression techniques.
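For readers less familiar with ICrA, the degrees of agreement (μ) and disagreement (υ) between two criteria can be obtained by comparing how each criterion orders every pair of objects. The Python sketch below assumes the usual pairwise-counting definition, with ties counted as uncertainty so that μ + υ ≤ 1; other ICrA variants treat ties differently. The function name and the sample values are hypothetical and are not data from the study.

```python
from itertools import combinations


def icra_pair(criterion_a, criterion_b):
    """Return (mu, nu) for two criteria evaluated on the same objects.

    mu is the fraction of object pairs ordered the same way by both criteria,
    nu the fraction ordered in opposite ways; tied pairs count toward neither.
    """
    agree = disagree = 0
    pairs = list(combinations(range(len(criterion_a)), 2))
    for i, j in pairs:
        da = criterion_a[i] - criterion_a[j]
        db = criterion_b[i] - criterion_b[j]
        if (da > 0 and db > 0) or (da < 0 and db < 0):
            agree += 1
        elif (da > 0 and db < 0) or (da < 0 and db > 0):
            disagree += 1
    total = len(pairs)
    return agree / total, disagree / total


# Hypothetical oils: density (g/cm3) and saturate content (wt.%).
density = [0.82, 0.86, 0.90, 0.94, 0.98, 1.01]
saturates = [78.0, 66.0, 69.0, 45.0, 38.0, 41.0]
mu, nu = icra_pair(density, saturates)
print(f"mu = {mu:.2f}, nu = {nu:.2f}")
```

In this made-up example, density and saturate content mostly rank the oils in opposite order, so υ is large and μ is small.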

4. Conclusions

Three data mining techniques (intercriteria analysis, regression, and artificial neural networks) were employed to study a database of 94 petroleum fluids containing information on SARA composition, density, sulfur content, flash point, and simulated distillation characteristics. The density, sulfur content, and simulated distillation characteristics were found to have statistically meaningful relations to the saturate, resin, and asphaltene fraction contents. The aromatic fraction content, unlike the other SARA fractions, did not show statistically meaningful relations to the same petroleum properties. This is in line with the lower accuracy of prediction of aromatics attained with both ANN and regression models. The ANN models were superior to the regression models in the process of data fitting. However, with the testing data, it was found that the ANN models best predicted the resin and asphaltene contents, while the saturates were best predicted by the non-linear regression model and the aromatics by the regression models, with the linear and non-linear forms performing similarly. The ANN model forecasted the contents of resins and asphaltenes within the reproducibility of the SARA composition measurement. Aromatics, whose measurement is characterized by very poor reproducibility, were predicted within these limits by the regression models. Although the non-linear regression model gave the best prediction of saturate content, its uncertainty of prediction (10.2 wt.%) exceeded the reported uncertainty of measurement (5.2 wt.%). It was found that the petroleum properties of density, sulfur content, yield of fraction boiling up to 500 °C, and flash point affected the accuracy of SARA fraction content prediction for the studied 94 oils.
When testing the models with an additional 66 petroleum fluids, the prognosis of saturate, aromatic, and resin fraction contents worsened, while that of asphaltene content estimated via linear regression from density and sulfur content maintained an acceptably low dispersion from the measured values for all data examined in this work. Supplementary research where extra petroleum properties are included as independent input variables in the SARA composition models, possibly with separation of the data by a selected criterion, is needed to improve the accuracy of prediction.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pr12081755/s1, Figure S1: Neural network training for prediction of saturates content; Figure S2: Neural network training for prediction of aromatics content; Figure S3: Neural network training for prediction of resins content; Figure S4: Neural network training for prediction of asphaltenes content; Table S1: Petroleum density, flash point, sulfur content, SARA composition data, and high-temperature simulated distillation data from [85]; Table S2: Petroleum density, flash point, sulfur content, high-temperature simulated distillation, and saturate fraction content data separated into training (not marked), validation (marked in green color), and testing (marked with brown color) sets; Table S3: Petroleum density, flash point, sulfur content, high-temperature simulated distillation, and aromatic fraction content data separated into training (not marked), validation (marked in green color), and testing (marked with brown color) sets; Table S4: Petroleum density, flash point, sulfur content, high-temperature simulated distillation, and resin fraction content data separated into training (not marked), validation (marked in green color), and testing (marked with brown color) sets; Table S5: Petroleum density, flash point, sulfur content, high-temperature simulated distillation, and asphaltene fraction content data separated into training (not marked), validation (marked in green color), and testing (marked with brown color) sets; Table S6: Petroleum density, flash point, sulfur content, SARA composition data, and high-temperature simulated distillation data from [102].

Author Contributions

Conceptualization, D.S. and I.S.; methodology, R.D.; software, S.S. and D.D.S.; validation, E.S., S.N. and F.v.d.B.; formal analysis, K.A.; investigation, I.K.; resources, D.Y.; data curation, S.R.; writing—original draft preparation, D.S., I.S. and S.S.; writing—review and editing, D.S., I.S., S.S. and S.N.; visualization, S.S.; supervision, D.S.; project administration, D.S. All authors have read and agreed to the published version of the manuscript.

Funding

The author Svetoslav Nenov expresses thanks for the financial support of European Union–NextGenerationEU, through the National Recovery and Resilience Plan of the Republic of Bulgaria, project № BG-RRP-2.004-0002-C01, “BiOrgaMCT”.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Authors Ivelina Shiskova, Dicho Stratiev, Rosen Dinkov, Iliyan Kolev were employed by LUKOIL Neftohim Burgas. Author Frans van den Berg was employed by Black Oil Solutions. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Schobert, H. Chapter 11: Composition, classification, and properties of petroleum. In Chemistry of Fossil Fuels and Biofuels; Cambridge University Press: Cambridge, UK, 2013.
  2. Viswanathan, B. Chapter 2: Petroleum. In Energy Sources: Fundamentals of Chemical Conversion Processes and Applications; Elsevier: Amsterdam, The Netherlands, 2017.
  3. Redelius, P.; Soenen, H. Relation between bitumen chemistry and performance. Fuel 2015, 140, 34–43.
  4. Vetere, A.; Schrader, W. Mass Spectrometric Coverage of Complex Mixtures: Exploring the Carbon Space of Crude Oil. ChemistrySelect 2017, 2, 849–853.
  5. Farmani, Z.; Schrader, W. A Detailed Look at the Saturate Fractions of Different Crude Oils Using Direct Analysis by Ultrahigh Resolution Mass Spectrometry (UHRMS). Energies 2019, 12, 3455.
  6. Panda, S.K.; Alsewdan, D.A.; Alshammari, M.M.; Qasim Saleem, Q.; Kearney, D.J. Aromatic-selective size exclusion chromatography (ASSEC): How quantitative is it for petroleum aromatics? Fuel 2023, 346, 128361.
  7. Bisht, H.; Reddy, M.; Malvanker, M.; Patil, R.C.; Gupta, A.; Hazarika, B.; Das, A.K. Efficient and Quick Method for Saturates, Aromatics, Resins, and Asphaltenes Analysis of Whole Crude Oil by Thin-Layer Chromatography−Flame Ionization Detector. Energy Fuels 2013, 27, 3006–3013.
  8. Zhou, X.; Shi, Q.; Zhang, Y.; Zhao, S.; Zhang, R.; Chung, K.H.; Xu, C. Analysis of Saturated Hydrocarbons by Redox Reaction with Negative-Ion Electrospray Fourier Transform Ion Cyclotron Resonance Mass Spectrometry. Anal. Chem. 2012, 84, 3192–3199.
  9. Schuler, B.; Fatayer, S.; Meyer, G.; Rogel, E.; Moir, M.; Zhang, Y.; Harper, M.R.; Pomerantz, A.E.; Bake, K.D.; Witt, M.; et al. Heavy oil based mixtures of different origins and treatments studied by AFM. Energy Fuels 2017, 31, 6856–6861.
  10. Shishkova, I.; Stratiev, D.; Kolev, I.V.; Nenov, S.; Nedanovski, D.; Atanassov, K.; Ivanov, V.; Ribagin, S. Challenges in Petroleum Characterization—A Review. Energies 2022, 15, 7765.
  11. Bissada, K.K.; Tan, J.; Szymczyk, E.; Darnell, M.; Mei, M.; Zhou, J. Group-type characterization of crude oil and bitumen. Part I: Enhanced separation and quantification of saturates, aromatics, resins and asphaltenes (SARA). Org. Geochem. 2016, 95, 21–28.
  12. Cho, Y.; Na, J.G.; Nho, N.S.; Kim, S.H.; Kim, S. Application of saturates, aromatics, resins, and asphaltenes crude oil fractionation for detailed chemical characterization of heavy crude oils by Fourier transform ion cyclotron resonance mass spectrometry equipped with atmospheric pressure photoionization. Energy Fuels 2012, 26, 2558–2565.
  13. Shi, Q.; Hou, D.; Chung, K.H.; Xu, C.; Zhao, S.; Zhang, Y. Characterization of heteroatom compounds in a crude oil and its saturates, aromatics, resins, and asphaltenes (SARA) and non-basic nitrogen fractions analyzed by negative-ion electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry. Energy Fuels 2010, 24, 2545–2553.
  14. Liu, P.; Shi, Q.; Chung, K.H.; Zhang, Y.; Pan, N.; Zhao, S.; Xu, C. Molecular characterization of sulfur compounds in Venezuela crude oil and its SARA fractions by electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry. Energy Fuels 2010, 24, 5089–5096.
  15. Klein, G.C.; Angstrom, A.; Rodgers, R.P.; Marshall, A.G. Use of saturates/aromatics/resins/asphaltenes (SARA) fractionation to determine matrix effects in crude oil analysis by electrospray ionization Fourier transform, ion cyclotron resonance mass spectrometry. Energy Fuels 2006, 20, 668–672.
  16. Sakib, N.; Amit Bhasin, A. Measuring polarity-based distributions (SARA) of bitumen using simplified chromatographic techniques. Int. J. Pavement Eng. 2019, 20, 1371–1384.
  17. Schabron, J.F.; Gardner, G.W.; Hart, J.K.; Niss, N.D.; Miyake, G.; Netzel, D.A. The characterization of petroleum residua. In Report to Mobil Research and Development Corp. and DOE; DOE Report DOE/MC/11076-3539; Western Research Institute: Laramie, WY, USA, 1993.
  18. Karevan, A.; Zirrahi, M.; Hassanzadeh, H. Standardized High-Performance Liquid Chromatography to Replace Conventional Methods for Determination of Saturate, Aromatic, Resin, and Asphaltene (SARA) Fractions. ACS Omega 2022, 7, 18897–18903.
  19. Woods, J.; Kung, J.; Kingston, D.; Kotlyar, L.; Sparks, B.; McCracken, T. Canadian Crudes: A Comparative Study of SARA Fractions from a Modified HPLC Separation Technique. Oil Gas Sci. Technol. Rev. IFP 2008, 63, 151–163.
  20. Islas-Flores, C.A.; Buenrostro-Gonzalez, E.; Lira-Galeana, C. Comparisons between Open Column Chromatography and HPLC SARA Fractionations in Petroleum. Energy Fuels 2005, 19, 2080–2088.
  21. Fan, T.; Wang, J.; Buckley, J.S. Evaluation of crude oil by SARA analysis. In Proceedings of the SPE Improved Oil Recovery Conference, Tulsa, OK, USA, 13–17 April 2002. p. SPE 75228.
  22. Fan, T.; Buckley, J.S. Rapid and Accurate SARA Analysis of Medium Gravity Crude Oils. Energy Fuels 2002, 16, 1571–1575.
  23. Kök, M.V.; Varfolomeev, M.A.; Nurgaliev, D.K. Determination of SARA fractions of crude oils by NMR technique. J. Pet. Sci. Eng. 2019, 179, 1–6.
  24. Kharrat, A.M.; Zacharia, J.; Cherian, V.J.; Anyatonwu, A. Issues with Comparing SARA Methodologies. Energy Fuels 2007, 21, 3618–3621.
  25. Santos, J.M.; Vetere, A.; Wisniewski, A.; Eberlin, M.N.; Schrader, W. Modified SARA Method to Unravel the Complexity of Resin Fraction(s) in Crude Oil. Energy Fuels 2020, 34, 16006–16013.
  26. Rezaee, S.; Doherty, R.; Tavakkoli, M.; Vargas, F.M. Improved Chromatographic Technique for Crude Oil Maltenes Fractionation. Energy Fuels 2019, 33, 708–713.
  27. Savonina, E.Y.; Panyukova, D.I. State of the Art and Prospects for the Development of Methods for Determining the Group Hydrocarbon Composition (SARA Composition) of Crude Oil and Petroleum Products. Russ. J. Appl. Chem. 2023, 96, 503–524.
  28. Panyukova, D.I.; Savonina, E.Y.; Ossipov, K. Determination the Hydrocarbon Group-Type Composition of Petroleum Feedstocks and Products through Foreign Experience. J. Anal. Chem. 2024, 79, 366–378.
  29. Kheirollahi, S.; BinDahbag, M.; Bagherzadeh, H.; Zahra Abbasi, Z.; Hassanzadeh, H. Improved determination of saturate, aromatic, resin, and asphaltene (SARA) fractions using automated high-performance liquid chromatography approach. Fuel 2024, 371, 131884.
  30. Marques, J.; Maget, S.; Verstraete, M.J.J. Improvement of ebullated-bed effluent stability at high conversion operation. Energy Fuels 2011, 25, 3867–3874.
  31. Jumina, J.; Kurniawan, Y.S.K.; Siswanta, D.; Purwono, B.; Zulkarnain, A.K.; Winarno, A.; Waluyo, J.; Ahmad, J.S.M. The Origin, Physicochemical Properties, and Removal Technology of Metallic Porphyrins from Crude Oils. Indones. J. Chem. 2021, 21, 4.
  32. Karimi, A.; Qian, K.; Olmstead, W.N.; Freund, H.; Yung, C.; Gray, M.R. Quantitative Evidence for Bridged Structures in Asphaltenes by Thin Film Pyrolysis. Energy Fuels 2011, 25, 3581–3589.
  33. López, L.; Mónaco, S.L. Vanadium, Nickel and Sulfur in Crude Oils and Source Rocks and Their Relationship with Biomarkers: Implications for the Origin of Crude Oils in Venezuelan Basins. Org. Geochem. 2017, 104, 53–68.
  34. Kumar, A.T.; Svrcek, W.Y.; Yarranton, H.W.; Taylor, S.D.; Merino-Garcia, D.; Rahimi, P.M. Measurement and Modeling of Asphaltene Precipitation from Crude Oil Blends. Energy Fuels 2009, 23, 3971–3980.
  35. Ashoori, S.; Sharifi, M.; Masoumi, M.; Salehi, M.M. The Relationship Between SARA Fractions and Crude Oil Stability. Egypt. J. Pet. 2017, 26, 209–213.
  36. Sulaimon, A.A.; Habineswaran, A.; Rajan, L.; Qasim, A.; Christiana, N.P.; Murungi, P.I. Developing New Correlations for Asphaltene Deposition Involving SARA Fractions and Colloidal Instability Index. J. Pet. Sci. Eng. 2023, 220, 111143.
  37. Guzmán, R.; Ancheyta, J.; Trejo, F.; Rodríguez, S. Methods for Determining Asphaltene Stability in Crude Oils. Fuel 2017, 188, 530–543.
  38. Abutaqiya, M.I.L.; Sisco, C.J.; Khemka, Y.; Safa, M.A.; Ghloum, E.F.; Rashed, A.M.; Gharbi, R.; Santhanagopalan, S.; Al-Qahtani, M.; Al-Kandari, E.; et al. Accurate Modeling of Asphaltene Onset Pressure in Crude Oils Under Gas Injection Using Peng−Robinson Equation of State. Energy Fuels 2020, 34, 4055–4070.
  39. Ting, P.D.; Hirasaki, G.J.; Chapman, W.G. Modeling of Asphaltene Phase Behavior with the SAFT Equation of State. Pet. Sci. Technol. 2003, 21, 647–661.
  40. Gonzalez, D.L.; Hirasaki, G.J.; Creek, J.; Chapman, W.G. Modeling of Asphaltene Precipitation Due to Changes in Composition Using the Perturbed Chain Statistical Associating Fluid Theory Equation of State. Energy Fuels 2007, 21, 1231–1242.
  41. Punnapala, S.; Vargas, F.M. Revisiting the PC-SAFT Characterization Procedure for an Improved Asphaltene Precipitation Prediction. Fuel 2013, 108, 417–429.
  42. Tavakkoli, M.; Chen, A.; Vargas, F.M. Rethinking the Modeling Approach for Asphaltene Precipitation Using the PC-SAFT Equation of State. Fluid Phase Equilib. 2016, 416, 120–129.
  43. Wang, J.; Buckley, J.S. Asphaltene Stability in Crude Oil and Aromatic Solvents: The Influence of Oil Composition. Energy Fuels 2003, 17, 1445–1451.
  44. Evdokimov, I.N. The Importance of Asphaltene Content in Petroleum III—New Criteria for Prediction of Incompatibility in Crude Oil Blends. Pet. Sci. Technol. 2010, 28, 1351–1357.
  45. Ali, S.I.; Lalji, S.M.; Awan, Z.; Qasim, M.; Alshahrani, T.; Khan, F.; Ullah, S.; Ashraf, A. Prediction of Asphaltene Stability in Crude Oils Using Machine Learning Algorithms. Chemom. Intell. Lab. Syst. 2023, 235, 104784.
  46. Sinnathambi, C.M.; Nor, N.M. Relationship between SARA Fractions and Crude Oil Fouling. J. Appl. Sci. 2012, 12, 2479–2483.
  47. Asomaning, S.; Watkinson, A.P. Petroleum Stability and Heteroatom Species Effects in Fouling of Heat Exchangers by Asphaltenes. Heat Transfer Eng. 2000, 21, 10–16.
  48. Hong, E.; Watkinson, P. A Study of Asphaltene Solubility and Precipitation. Fuel 2004, 83, 1881–1887.
  49. Saleh, Z.S.; Sheikholeslami, R.; Watkinson, A.P. Blending Effects on Four Crude Oils. In Proceedings of the 6th International Conference on Heat Exchanger Fouling and Cleaning—Challenges and Opportunities, Kloster Irsee, Germany, 5–10 June 2005. ECI Symposium Series, Volume RP2.
  50. Hong, E.; Watkinson, P. Precipitation and Fouling in Heavy Oil–Diluent Blends. Heat Transfer Eng. 2009, 30, 786–793.
  51. van den Berg, F.G.A.; Kapusta, S.D.; Ooms, A.C.; Smith, A.J. Fouling and Compatibility of Crudes as Basis for a New Crude Selection Strategy. Pet. Sci. Technol. 2003, 21, 557–568.
  52. Reyes-Gonzalez, D.; Ramirez-Jaramillo, E.; Manero, O.; Lira-Galeana, C.; del Rio, J.M. Estimation of the SARA Composition of Crude Oils from Bubble Point Pressure Data. Energy Fuels 2016, 30, 6913–6922.
  53. Graham, B.F.; May, E.F.; Trengove, R.D. Emulsion Inhibiting Components in Crude Oils. Energy Fuels 2008, 22, 1093–1099.
  54. Alves, C.A.; Romero Yanes, J.F.; Feitosa, F.X.; de Sant’Ana, H.B. Influence of Asphaltenes and Resins on Water/Model Oil Interfacial Tension and Emulsion Behavior: Comparison of Extracted Fractions from Crude Oils with Different Asphaltene Stability. J. Pet. Sci. Eng. 2021, 208, 109268.
  55. Saputra, I.W.R.; Schechter, D.S. SARA-Based Correlation to Describe the Effect of Polar/Nonpolar Interaction, Salinity, and Temperature for Interfacial Tension of Low-Asphaltene Crude Oils Characteristic of Unconventional Shale Reservoirs. SPE J. 2021, 26, 3681–3693.
  56. Wei, B.; Zou, P.; Shang, J.; Gao, K.; Li, Y.; Sun, L.; Pu, W. Integrative Determination of the Interactions Between SARA Fractions of an Extra-Heavy Crude Oil During Combustion. Fuel 2018, 234, 850–857.
  57. Carbognani, L.; Gonzalez, M.F.; Pereira-Almao, P. Characterization of Athabasca Vacuum Residue and Its Visbroken Products: Stability and Fast Hydrocarbon Group-Type Distributions. Energy Fuels 2007, 21, 1631–1639.
  58. Zhang, N.; Zhao, S.; Sun, X.; Xu, Z.; Xu, C. Storage Stability of the Visbreaking Product from Venezuela Heavy Oil. Energy Fuels 2010, 24, 3970–3976.
  59. Behar, F.; Lorant, F.; Mazeas, L. Elaboration of a New Compositional Kinetic Schema for Oil Cracking. Org. Geochem. 2008, 39, 764–782.
  60. Guo, A.; Zhang, X.; Wang, Z. Simulated Delayed Coking Characteristics of Petroleum Residues and Fractions by Thermogravimetry. Fuel Process. Technol. 2008, 89, 643–650.
  61. Schucker, R.C. Thermogravimetric Determination of the Coking Kinetics of Arab Heavy Vacuum Residuum. Ind. Eng. Chem. Process Des. Dev. 1983, 22, 615–619.
  62. Xu, C.; Gao, J.; Zhao, S.; Lin, S. Correlation between Feedstock SARA Components and FCC Product Yields. Fuel 2005, 84, 669–674.
  63. Félix, G.; Ancheyta, J. Comparison of Hydrocracking Kinetic Models Based on SARA Fractions Obtained in Slurry-Phase Reactor. Fuel 2019, 241, 495–505.
  64. Felix, G.G.; Tirado, A.; Al-Muntaser, A.; Kwofie, M.M.; Varfolomeev, M.A.; Yuan, C.; Ancheyta, J. SARA-Based Kinetic Model for Non-Catalytic Aquathermolysis of Heavy Crude Oil. J. Pet. Sci. Eng. 2022, 216, 110845.
  65. Goel, P.; Saurabh, K.; Patil-Shinde, V.; Tambe, S.S. Prediction of API Values of Crude Oils by Use of Saturates/Aromatics/Resins/Asphaltenes Analysis: Computational-Intelligence-Based Models. SPE J. 2016, 22, 817–853.
  66. Stratiev, D.; Shishkova, I.; Dinkov, R.; Nenov, S.; Sotirov, S.; Sotirova, E.; Kolev, I.; Ivanov, V.; Ribagin, S.; Atanassov, K.; et al. Prediction of Petroleum Viscosity from Molecular Weight and Density. Fuel 2023, 331, 125679.
  67. Quintana, G.G.; Ramos, R.R.; Martínez, J.J.D.A. Prediction of the Viscosity of Diluent—Crude Oil Blends: A Generalized Mixture Model Based on SARA and PIANO Fractions. Pet. Sci. Technol. 2022, 40, 337–350.
  68. Chamkalani, A. Correlations Between SARA Fractions, Density and RI to Investigate the Stability of Asphaltene. Int. Sch. Res. Not. 2012, 2012, 219276. [Google Scholar] [CrossRef]
  69. Tatar, A.; Shokrollahi, A.; Halali, M.A.; Azari, V.; Safari, H. A Hybrid Intelligent Computational Scheme for Determination of Refractive Index of Crude Oil Using SARA Fraction Analysis. Can. J. Chem. Eng. 2015, 93, 1547–1555. [Google Scholar] [CrossRef]
  70. Gholami, A.; Ansari, H.R.; Hosseini, S. Prediction of Crude Oil Refractive Index Through Optimized Support Vector Regression: A Competition Between Optimization Techniques. J. Pet. Explor. Prod. Technol. 2017, 7, 195–204. [Google Scholar] [CrossRef]
  71. Liu, W.; Ye, H.; Dong, H. Molecular Reconstruction of Crude Oil and Simulation of Distillation Processes: An Approach Based on Predefined Molecules. Chem. Eng. Sci. 2024, 295, 120141. [Google Scholar] [CrossRef]
  72. Melendez, L.; Lache, A.; Ruiz, O.; Pacho, Z.; Ospino, E.M. Prediction of the SARA Analysis of Colombian Crude Oils Using ATR–FTIR Spectroscopy and Chemometric Methods. J. Pet. Sci. Eng. 2012, 90–91, 56–60. [Google Scholar] [CrossRef]
  73. Sanchez-Minero, F.; Ancheyta, J.; Silva-Oliver, G.; Flores-Valle, S. Predicting SARA Composition of Crude Oil by Means of NMR. Fuel 2013, 110, 318–321. [Google Scholar] [CrossRef]
  74. Mohammadi, M.; Khorrami, M.K.; Vatani, A.; Ghasemzadeh, H.; Vatanparast, H.; Bahramian, A.; Fallah, A. Genetic Algorithm Based Support Vector Machine Regression for Prediction of SARA Analysis in Crude Oil Samples Using ATR-FTIR Spectroscopy. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2021, 245, 118945. [Google Scholar] [CrossRef] [PubMed]
  75. Florez, M.A.; Guerrero, J.E.; Cabanzo, R.; Mejía-Ospino, E. SARA Analysis and Conradson Carbon Residue Prediction of Colombian Crude Oils Using PLSR and Raman Spectroscopy. J. Pet. Sci. Eng. 2017, 156, 966–970. [Google Scholar] [CrossRef]
  76. Pantoja, P.A.; López-Gejo, J.; Le Roux, G.A.C.; Quina, F.H.; Nascimento, C.A.O. Prediction of Crude Oil Properties and Chemical Composition by Means of Steady-State and Time-Resolved Fluorescence. Energy Fuels 2011, 25, 3598–3604. [Google Scholar] [CrossRef]
  77. Yarranton, H. Prediction of Crude Oil Saturate Content from a SimDist Assay. Energy Fuels 2022, 36, 8809–8817. [Google Scholar] [CrossRef]
  78. Kulkarni, A.D.; Khurpade, P.D.; Nandi, S. Estimation of SARA Composition of Crudes Purely from Density and Viscosity using Machine Learning Based Models. Petroleum in press. 2024. [Google Scholar] [CrossRef]
  79. ASTMD-4124; Standard Test Method for Separation of Asphalt into Four Fractions. ASTM International: West Conshohocken, PA, USA, 2018.
  80. ASTMD-3279; Standard Test Method for n-Heptane Insolubles. ASTM International: West Conshohocken, PA, USA, 2019.
  81. ASTMD-2007; Standard Test Method for Characteristic Groups in Rubber Extender and Processing Oils and Other Petroleum-Derived Oils by the Clay-Gel Absorption Chromatographic Method. ASTM International: West Conshohocken, PA, USA, 2019.
  82. Jokuty, P.; Whiticar, S.; Fingas, M.; Meyer, E.; Knobel, C. Hydrocarbon Groups and Their Relationships to Oil Properties and Behavior. In Proceedings of the 18th Arctic and Marine Oil Spill Program Technical Seminar (1995), Edmonton, AB, Canada, 14–16 June 1995. [Google Scholar]
  83. Nabzar, L.; Aguilera, M.E. The Colloidal Approach. A Promising Route for Asphaltene Deposition Modelling. Oil Gas Sci. Technol. Rev. IFP 2008, 63, 21–35. [Google Scholar] [CrossRef]
  84. Anwar, A.A.; Al-Jawad, M.S.; Ali, A.A. Asphaltene Stability of Some Iraqi Dead Crude Oils. J. Eng. 2019, 25, 53–67. [Google Scholar] [CrossRef]
  85. Fingas, M. Handbook of Oil Spill Science and Technology; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2015. [Google Scholar]
  86. Mavrov, D. Software for InterCriteria Analysis: Implementation of the Main Algorithm. Notes Intuit. Fuzzy Sets 2015, 21, 77–86. [Google Scholar]
  87. Mavrov, D.; Radeva, I.; Atanassov, K.; Doukovska, L.; Kalaykov, I. InterCriteria Software Design: Graphic Interpretation within the Intuitionistic Fuzzy Triangle. In Proceedings of the Fifth International Symposium Business Modeling Software Design, Milan, Italy, 6–8 July 2015; pp. 279–283. [Google Scholar]
  88. Mavrov, D. Software for Intercriteria Analysis: Working with the Results. Ann. Informatics Sect. Union Sci. Bulg. 2015, 8, 37–44. [Google Scholar]
  89. Ikonomov, N.; Vassilev, P.; Roeva, O. ICrAData—Software for InterCriteria Analysis. Int. J. Bioautomation 2018, 22, 1–10. [Google Scholar] [CrossRef]
  90. Stratiev, D.; Shishkova, I.; Dinkov, R.; Kolev, I.; Argirov, G.; Ivanov, V.; Ribagin, S.; Atanassova, V.; Atanassov, K.; Stratiev, D.D.; et al. Intercriteria Analysis to Diagnose the Reasons for Increased Fouling in a Commercial Ebullated Bed Vacuum Residue Hydrocracker. ACS Omega 2022, 7, 30462–30476. [Google Scholar] [CrossRef]
  91. Stratiev, D.; Nenov, S.; Shishkova, I.; Georgiev, B.; Argirov, G.; Dinkov, R.; Yordanov, D.; Atanassova, V.; Vassilev, P.; Atanassov, K. Commercial investigation of the ebullated bed vacuum residue hydrocracking in the conversion range 55–93%. ACS Omega 2020, 5, 33290. [Google Scholar] [CrossRef]
  92. Buckley, J.S.; Morrow, N.R. Wettability and Imbibition: Microscopic Distribution of Wetting and Its Consequences at the Core and Field Scales; Final Report; New Mexico Petroleum Recovery Research Center: Socorro, NM, USA, 2003. [Google Scholar]
  93. Espinosa-Pena, M.; Yolanda Figueroa-Gomez, Y.; Jimenez-Cruz, F. Simulated Distillation Yield Curves in Heavy Crude Oils: A Comparison of Precision between ASTM D-5307 and ASTM D-2892 Physical Distillation. Energy Fuels 2004, 18, 1832–1840. [Google Scholar] [CrossRef]
  94. Stratiev, D.; Nenov, S.; Shishkova, I.; Sotirov, S.; Sotirova, E.; Dinkov, R.; Yordanov, D.; Pilev, D.; Atanassov, K.; Vasilev, S.; et al. Prediction of Viscosity of Blends of Heavy Oils with Diluents by Empirical Correlations and Artificial Neural Network. Ind. Eng. Chem. Res. 2023, 62, 21449–21463. [Google Scholar] [CrossRef]
  95. Bahonar, E.; Chahardowli, M.; Ghalenoei, Y.; Simjoo, M. New correlations to predict oil viscosity using data mining techniques. J. Pet. Sci. Eng. 2022, 208, 109736. [Google Scholar] [CrossRef]
  96. Hadavimoghaddam, F.; Ostadhassan, M.; Heidaryan, E.; Sadri, M.A.; Chapanova, I.; Popov, E.; Cheremisin, A.; Rafieepour, S. Prediction of Dead Oil Viscosity: Machine Learning vs. Classical Correlations. Energies 2021, 14, 930. [Google Scholar] [CrossRef]
  97. Bahaloo, S.; Mehrizadeh, M.; Najafi-Marghmaleki, A. Review of application of artificial intelligence techniques in petroleum operations. Pet. Res. 2023, 8, 167e182. [Google Scholar] [CrossRef]
  98. Alkinani, H.H.; Al-Hameedi, A.T.T.; Dunn-Norman, S.; Ralph, E.; Flori, R.E. Applications of Artificial Neural Networks in the Petroleum Industry: A Review. In Proceedings of the SPE Middle East Oil and Gas Show and Conference, Manama, Bahrain, 18–21 March 2019. p. SPE-195072-MS. [Google Scholar]
  99. Sinha, U.; Dindoruk, B.; Soliman, M. Machine learning augmented dead oil viscosity model for all oil types. J. Pet. Sci. Eng. 2020, 195, 107603. [Google Scholar] [CrossRef]
  100. Shiskova, I.; Stratiev, D.; Tavlieva, M.; Nedelchev, A.; Dinkov, R.; Kolev, I.; van den Berg, F.; Ribagin, S.; Sotirov, S.; Nikolova, R.; et al. Application of Intercriteria and Regression Analyses and Artificial Neural Network to Investigate the Relation of Crude Oil Assay Data to Oil Compatibility. Processes 2024, 12, 780. [Google Scholar] [CrossRef]
  101. Stratiev, D.; Marinov, I.; Dinkov, R.; Shishkova, I.; Velkov, I.; Sharafutdinov, I.; Nenov, S.; Tsvetkov, T.; Sotirov, S.; Mitkova, M.; et al. Opportunity to improve diesel fuel cetane number prediction from easy available physical properties and application of the least squares method and the artificial neural networks. Energy Fuels 2015, 29, 1520–1533. [Google Scholar] [CrossRef]
  102. Government of Canada: Environment Canada Crude Oil and Petroleum Product Database—ECCC Petroleum Products Database. 2021. Available online: https://open.canada.ca/data/en/dataset/53c38f91-35c8-49a6-a437-b311703db8c5/resource/518929da-e63f-4d72-a47f-c98c8ed23020 (accessed on 5 August 2024).
  103. Stratiev, D.; Shishkova, I.; Palichev, G.N.; Atanassov, K.; Ribagin, S.; Nenov, S.; Nedanovski, D.; Ivanov, V. Study of Bulk Properties Relation to SARA Composition Data of Various Vacuum Residues Employing Intercriteria Analysis. Energies 2022, 15, 9042. [Google Scholar] [CrossRef]
  104. Stratiev, D.; Shishkova, I.; Nikolaychuk, E.; Atanassova, V.; Atanassov, K. Investigation of relations of properties of straight run and H-Oil unconverted vacuum residual oils. Pet. Coal 2019, 61, 763–776. [Google Scholar]
  105. Ramirez-Corredores, M.M. The Science and Technology of Unconventional Oils Finding Refining Opportunities; Academic Press: Cambridge, MA, USA, 2017. [Google Scholar]
  106. Demirbas, A.; Alidrisi, H.; Balubaid, M.A. API Gravity, Sulfur Content, and Desulfurization of Crude Oil. Petrol. Sci. Technol. 2015, 33, 93–101. [Google Scholar] [CrossRef]
Figure 1. Neural network training performance for prediction of saturates content. The green circle indicates the point where the minimum mean squared error is obtained.
Figure 2. ANN-predicted versus experimental saturate fraction content values for training, validation, testing, and overall data set.
Figure 3. Neural network training performance for prediction of aromatics content. The green circle indicates the point where the minimum mean squared error is obtained.
Figure 4. ANN-predicted versus experimental aromatic fraction content values for training, validation, testing, and overall data set.
Figure 5. Neural network training performance for prediction of content of resins. The green circle indicates the point where the minimum mean squared error is obtained.
Figure 6. ANN-predicted versus experimental resin fraction content values for training, validation, testing, and overall data set.
Figure 7. Neural network training performance for prediction of asphaltenes content. The green circle indicates the point where the minimum mean squared error is obtained.
Figure 8. ANN-predicted versus experimental asphaltene fraction content values for training, validation, testing, and overall data set.
Figure 9. Parity graph for fitted and predicted versus measured saturate content (a,e), aromatic content (b,f), resin content (c,g), and asphaltene content (d,h), via the regression Equations (6) and (7) for the linear and non-linear regressions, with the coefficients shown in Table 4.
Figure 10. Relation of density to saturate content in 160 petroleum fluids (data from this work and refs. [10,102]). The red line is obtained by using Equation (8).
Table 1. Range of variation of petroleum density, flash point, sulfur content, SARA composition data, and high-temperature simulated distillation data from [85].
Density, at 15 °C, g/cm3Flash Point, °CSulphur, wt.%Saturate, wt.%Aromatics, wt.%Resins, wt.%Asphaltenes, wt.%Evaporated at (High-Temperature Simulated Distillation), wt.%
180 °C200 °C250 °C300 °C400 °C500 °C
Min0.7655−300.01215000.61.2513.331.144.7
Max0.9872954.519443302147102768899100
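Table 2. μ-values obtained from ICrA evaluation of data from Table S1.
| μ | Density | Flash point | Sulfur content | Saturate | Aromatic | Resin | Asphaltene | 180 °C | 200 °C | 250 °C | 300 °C | 400 °C | 500 °C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Density | 1.00 | 0.61 | 0.73 | 0.17 | 0.68 | 0.81 | 0.75 | 0.20 | 0.21 | 0.18 | 0.18 | 0.19 | 0.20 |
| Flash point | 0.61 | 1.00 | 0.51 | 0.45 | 0.44 | 0.53 | 0.54 | 0.26 | 0.27 | 0.29 | 0.32 | 0.35 | 0.37 |
| Sulfur content | 0.73 | 0.51 | 1.00 | 0.23 | 0.64 | 0.72 | 0.71 | 0.31 | 0.32 | 0.29 | 0.26 | 0.25 | 0.25 |
| Saturate | 0.17 | 0.45 | 0.23 | 1.00 | 0.13 | 0.11 | 0.16 | 0.67 | 0.66 | 0.70 | 0.73 | 0.74 | 0.72 |
| Aromatic | 0.68 | 0.44 | 0.64 | 0.13 | 1.00 | 0.71 | 0.62 | 0.38 | 0.39 | 0.36 | 0.35 | 0.35 | 0.36 |
| Resin | 0.81 | 0.53 | 0.72 | 0.11 | 0.71 | 1.00 | 0.75 | 0.25 | 0.26 | 0.22 | 0.20 | 0.20 | 0.21 |
| Asphaltene | 0.75 | 0.54 | 0.71 | 0.16 | 0.62 | 0.75 | 1.00 | 0.24 | 0.25 | 0.20 | 0.19 | 0.18 | 0.19 |
| 180 °C | 0.20 | 0.26 | 0.31 | 0.67 | 0.38 | 0.25 | 0.24 | 1.00 | 0.94 | 0.92 | 0.87 | 0.83 | 0.79 |
| 200 °C | 0.21 | 0.27 | 0.32 | 0.66 | 0.39 | 0.26 | 0.25 | 0.94 | 1.00 | 0.92 | 0.87 | 0.83 | 0.79 |
| 250 °C | 0.18 | 0.29 | 0.29 | 0.70 | 0.36 | 0.22 | 0.20 | 0.92 | 0.92 | 1.00 | 0.94 | 0.89 | 0.85 |
| 300 °C | 0.18 | 0.32 | 0.26 | 0.73 | 0.35 | 0.20 | 0.19 | 0.87 | 0.87 | 0.94 | 1.00 | 0.94 | 0.89 |
| 400 °C | 0.19 | 0.35 | 0.25 | 0.74 | 0.35 | 0.20 | 0.18 | 0.83 | 0.83 | 0.89 | 0.94 | 1.00 | 0.94 |
| 500 °C | 0.20 | 0.37 | 0.25 | 0.72 | 0.36 | 0.21 | 0.19 | 0.79 | 0.79 | 0.85 | 0.89 | 0.94 | 1.00 |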
Note: Green color means statistically meaningful positive relation; red color implies statistically meaningful negative relation. The intensity of the color designates the strength of the relation; the higher the color intensity, the higher the strength of the relation. Yellow color denotes dissonance.
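Table 3. ν-values obtained from ICrA evaluation of data from Table S1.
| ν | Density | Flash point | Sulfur content | Saturate | Aromatic | Resin | Asphaltene | 180 °C | 200 °C | 250 °C | 300 °C | 400 °C | 500 °C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Density | 0.00 | 0.34 | 0.23 | 0.80 | 0.28 | 0.13 | 0.17 | 0.76 | 0.75 | 0.78 | 0.79 | 0.78 | 0.77 |
| Flash point | 0.34 | 0.00 | 0.44 | 0.50 | 0.50 | 0.39 | 0.36 | 0.68 | 0.67 | 0.65 | 0.64 | 0.61 | 0.59 |
| Sulfur content | 0.23 | 0.44 | 0.00 | 0.73 | 0.31 | 0.22 | 0.21 | 0.64 | 0.63 | 0.67 | 0.70 | 0.72 | 0.71 |
| Saturate | 0.80 | 0.50 | 0.73 | 0.00 | 0.83 | 0.84 | 0.76 | 0.29 | 0.30 | 0.26 | 0.24 | 0.24 | 0.25 |
| Aromatic | 0.28 | 0.50 | 0.31 | 0.83 | 0.00 | 0.23 | 0.29 | 0.57 | 0.56 | 0.59 | 0.61 | 0.62 | 0.60 |
| Resin | 0.13 | 0.39 | 0.22 | 0.84 | 0.23 | 0.00 | 0.15 | 0.68 | 0.67 | 0.71 | 0.74 | 0.75 | 0.74 |
| Asphaltene | 0.17 | 0.36 | 0.21 | 0.76 | 0.29 | 0.15 | 0.00 | 0.67 | 0.66 | 0.71 | 0.73 | 0.74 | 0.73 |
| 180 °C | 0.76 | 0.68 | 0.64 | 0.29 | 0.57 | 0.68 | 0.67 | 0.00 | 0.03 | 0.04 | 0.08 | 0.13 | 0.17 |
| 200 °C | 0.75 | 0.67 | 0.63 | 0.30 | 0.56 | 0.67 | 0.66 | 0.03 | 0.00 | 0.05 | 0.09 | 0.13 | 0.17 |
| 250 °C | 0.78 | 0.65 | 0.67 | 0.26 | 0.59 | 0.71 | 0.71 | 0.04 | 0.05 | 0.00 | 0.03 | 0.08 | 0.12 |
| 300 °C | 0.79 | 0.64 | 0.70 | 0.24 | 0.61 | 0.74 | 0.73 | 0.08 | 0.09 | 0.03 | 0.00 | 0.04 | 0.08 |
| 400 °C | 0.78 | 0.61 | 0.72 | 0.24 | 0.62 | 0.75 | 0.74 | 0.13 | 0.13 | 0.08 | 0.04 | 0.00 | 0.04 |
| 500 °C | 0.77 | 0.59 | 0.71 | 0.25 | 0.60 | 0.74 | 0.73 | 0.17 | 0.17 | 0.12 | 0.08 | 0.04 | 0.00 |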
Note: Red color means statistically meaningful positive relation; green color implies statistically meaningful negative relation. The intensity of the color designates the strength of the relation; the higher the color intensity, the higher the strength of the relation. Yellow color denotes dissonance.
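The μ and ν entries in Tables 2 and 3 can be read as pairwise degrees of agreement and disagreement between two criteria. The sketch below illustrates one common way of counting them; it is not the software used in this work. The function name `icra_pair`, the sample values, and the treatment of ties (counted toward neither μ nor ν) are assumptions made for illustration, and ICrA implementations differ in how they handle ties.

```python
from itertools import combinations

def icra_pair(x, y):
    """Return (mu, nu) for two criteria evaluated on the same objects.

    mu: share of object pairs that both criteria order the same way.
    nu: share of object pairs that the criteria order in opposite ways.
    Ties count toward neither (assumed convention), so mu + nu <= 1.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two objects")
    agree = disagree = total = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        total += 1
        if product > 0:
            agree += 1
        elif product < 0:
            disagree += 1
    return agree / total, disagree / total

# Hypothetical values for four oils (illustrative only, not taken from Table S1).
density = [0.80, 0.85, 0.90, 0.95]
resins = [2.0, 5.0, 4.0, 9.0]
saturates = [80.0, 60.0, 65.0, 40.0]

mu_dr, nu_dr = icra_pair(density, resins)
mu_ds, nu_ds = icra_pair(density, saturates)
print(f"density vs resins:    mu={mu_dr:.2f}, nu={nu_dr:.2f}")
print(f"density vs saturates: mu={mu_ds:.2f}, nu={nu_ds:.2f}")
```

Under this convention μ + ν cannot exceed 1, which matches the tabulated values: for the density and resin pair, μ = 0.81 and ν = 0.13.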
Table 4. Coefficients for linear (Equation (6)) and non-linear (Equation (7)) SARA regression models.
RegressionCoefficientsSaturatesAromaticsResinsAsphaltenes
Lineara1−393.221276.70969.104041.998
a20−2.78441.76642.2281
a300.18901−0.02830
a40.1643−0.180290.02030
a5405.89−228.9−50.5933−34.4371
Non-lineara1−141.2600−95.694434.381416.805478
a20−0.44320.73091.8693
a3−5.209290.282−0.0255−549.913
a40.22889−0.1969−0.02700
a5153.65−167.297−2.660543.843
b15.25734.3668.1170.9681
b202.3481.00751.2189
b30.31080.10.94000.00283
b41110
Table 5. Statistical analysis of studied methods to fit petroleum fluid SARA fraction contents from oil density, sulfur, flash point, and distillation characteristics.
| Model | Statistical parameter | Saturates | Aromatics | Resins | Asphaltenes |
|---|---|---|---|---|---|
| ANN | AAD, % | 3.5 | 3.1 | 1.1 | 0.9 |
| ANN | Max. dev., % | 17.6 | 15.2 | 9.8 | 5.0 |
| ANN | Bias, % | −0.7 | −0.5 | −0.1 | −0.1 |
| ANN | St. Dev., % | 4.9 | 4.5 | 2.0 | 1.6 |
| ANN | R | 0.959 | 0.863 | 0.955 | 0.964 |
| Linear regression | AAD, % | 5.1 | 4.1 | 2.2 | 1.9 |
| Linear regression | Max. dev., % | 20.7 | 15.3 | 13.0 | 8.4 |
| Linear regression | Bias | 0.0 | 0.0 | 0.3 | 0.0 |
| Linear regression | St. Dev. | 6.7 | 5.3 | 3.2 | 2.6 |
| Linear regression | R | 0.921 | 0.787 | 0.892 | 0.858 |
| Non-linear regression | AAD, % | 4.7 | 3.9 | 1.9 | 1.8 |
| Non-linear regression | Max. dev., % | 19.2 | 14.0 | 12.5 | 11.1 |
| Non-linear regression | Bias | −0.3 | −0.1 | 0.0 | −0.3 |
| Non-linear regression | St. Dev. | 6.2 | 5.1 | 2.8 | 2.6 |
| Non-linear regression | R | 0.933 | 0.811 | 0.911 | 0.862 |
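As a rough illustration of how statistics like those in Table 5 can be computed, the sketch below evaluates AAD, maximum deviation, bias, and standard deviation for a set of measured and predicted contents. The exact formulas used in this work are not restated here, so the definitions in the code are assumptions: deviations are taken as predicted minus measured in wt.%, and the standard deviation is the sample standard deviation of those deviations. The function name `error_stats` and the numbers are hypothetical.

```python
import math

def error_stats(measured, predicted):
    """Summary statistics of prediction deviations (assumed definitions).

    deviation = predicted - measured, in the same units as the inputs.
    """
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    dev = [p - m for m, p in zip(measured, predicted)]
    n = len(dev)
    bias = sum(dev) / n                       # mean signed deviation
    aad = sum(abs(d) for d in dev) / n        # average absolute deviation
    max_dev = max(abs(d) for d in dev)        # largest absolute deviation
    st_dev = math.sqrt(sum((d - bias) ** 2 for d in dev) / (n - 1))
    return {"AAD": aad, "max_dev": max_dev, "bias": bias, "st_dev": st_dev}

# Hypothetical saturate contents in wt.% (illustrative only).
measured = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 59.0, 73.0, 76.0]

stats = error_stats(measured, predicted)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

A bias close to zero together with a non-zero AAD, as in this example, indicates scatter without a systematic offset.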
Table 6. Statistical analysis of studied methods to predict petroleum fluid SARA fraction contents from oil density, sulfur, flash point, and distillation characteristics (testing data).
| Model | Statistical parameter | Saturates | Aromatics | Resins | Asphaltenes |
|---|---|---|---|---|---|
| ANN | AAD, % | 5.7 | 6.0 | 2.6 | 1.3 |
| ANN | Max. dev., % | 17.6 | 15.2 | 9.8 | 4.4 |
| ANN | Bias, % | −1.4 | −1.7 | −0.2 | −0.6 |
| ANN | St. Dev., % | 7.1 | 8.0 | 4.0 | 1.9 |
| ANN | R | 0.929 | 0.697 | 0.886 | 0.908 |
| Linear regression | AAD, % | 6.0 | 4.8 | 2.8 | 1.8 |
| Linear regression | Max. dev., % | 17.2 | 13.5 | 13.8 | 6.2 |
| Linear regression | Bias | 1.1 | −1.3 | −1.3 | 0.8 |
| Linear regression | St. Dev. | 8.1 | 6.4 | 4.7 | 2.5 |
| Linear regression | R | 0.911 | 0.684 | 0.828 | 0.832 |
| Non-linear regression | AAD, % | 5.3 | 4.8 | 2.5 | 2.0 |
| Non-linear regression | Max. dev., % | 16.7 | 12.8 | 13.5 | 9.2 |
| Non-linear regression | Bias | 0.5 | −0.2 | −0.8 | 0.4 |
| Non-linear regression | St. Dev. | 6.8 | 6.5 | 4.2 | 3.2 |
| Non-linear regression | R | 0.938 | 0.677 | 0.871 | 0.720 |
Table 7. Statistical analysis of studied methods to predict petroleum fluid SARA fraction contents from oil density, sulfur, flash point, and distillation characteristics for 66 petroleum oils [102].
| Model | Statistical parameter | Saturates | Aromatics | Resins | Asphaltenes |
|---|---|---|---|---|---|
| ANN | AAD, % | 8.2 | 10.3 | 5.8 | 4.4 |
| ANN | Max. dev., % | 27.3 | 30.2 | 22.7 | 18.7 |
| ANN | Bias, % | 4.6 | 8.8 | −1.9 | −0.6 |
| ANN | St. Dev., % | 10.6 | 13.3 | 7.3 | 6.3 |
| ANN | R | 0.538 | ND | ND | 0.647 |
| Linear regression | AAD, % | 14.9 | 14.5 | 3.9 | 2.6 |
| Linear regression | Max. dev., % | 26.0 | 32.2 | 13.7 | 10.0 |
| Linear regression | Bias | −14.5 | 14.3 | 2.2 | 0.4 |
| Linear regression | St. Dev. | 16.9 | 16.8 | 5.2 | 3.5 |
| Linear regression | R | ND | ND | 0.582 | 0.906 |
| Non-linear regression | AAD, % | 22.3 | 14.9 | 6.3 | 2.5 |
| Non-linear regression | Max. dev., % | 51.2 | 40.5 | 21.5 | 10.8 |
| Non-linear regression | Bias | −21.9 | 14.7 | 6.0 | −1.2 |
| Non-linear regression | St. Dev. | 25.8 | 17.7 | 8.2 | 3.6 |
| Non-linear regression | R | 0.608 | ND | ND | 0.899 |
ND: not possible to determine due to the presence of negative value for the squared multiple correlation coefficient.
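The ND entries can be understood with a small numerical sketch. It assumes the squared multiple correlation coefficient is computed as R² = 1 − SS_res/SS_tot, which is a common definition but is not confirmed here as the one used in this work. Under that assumption, R² becomes negative whenever the predictions fit worse than simply using the mean of the measured values, and its square root R is then undefined. The function names and numbers are hypothetical.

```python
import math

def r_squared(measured, predicted):
    """R^2 = 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def multiple_r(measured, predicted):
    """Return R, or None (reported as ND) when R^2 is negative."""
    r2 = r_squared(measured, predicted)
    return math.sqrt(r2) if r2 >= 0 else None

# Hypothetical contents in wt.% (illustrative only).
measured = [10.0, 20.0, 30.0]
close_fit = [12.0, 19.0, 29.0]    # small scatter around the measurements
offset_fit = [30.0, 40.0, 50.0]   # correct trend, but shifted by +20 wt.%

print(r_squared(measured, close_fit), multiple_r(measured, close_fit))
print(r_squared(measured, offset_fit), multiple_r(measured, offset_fit))
```

In the second case the predictions follow the measured trend exactly, yet the constant offset alone drives R² below zero, so R cannot be reported.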
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Shiskova, I.; Stratiev, D.; Sotirov, S.; Sotirova, E.; Dinkov, R.; Kolev, I.; Stratiev, D.D.; Nenov, S.; Ribagin, S.; Atanassov, K.; et al. Predicting Petroleum SARA Composition from Density, Sulfur Content, Flash Point, and Simulated Distillation Data Using Regression and Artificial Neural Network Techniques. Processes 2024, 12, 1755. https://doi.org/10.3390/pr12081755
