*2.2. Fitting*

The exponential expression for the flowback volume can be used to obtain the daily flowback rate *Rl* for a single well as follows:

$$R\_I(t) = \frac{q}{Q\_{total}} = \frac{\alpha}{Q\_{total}} \varepsilon \text{exp}(-\beta t) = \gamma \varepsilon \text{exp}(-\beta t) \tag{18}$$

where *q* denotes the daily flowback volume, m3/d; *Qtotal* denotes the total quantity of injected fracturing fluid, m3; α is a daily flowback volume coefficient, m3/d; *β* is a decreasing flowback coefficient, 1/d, which is a fitting parameter; *γ* = *α*/*Qtotal* is the daily flowback rate coefficient, 1/d, which is also a fitting parameter; and *t* denotes the production time (d).

The curves for the daily flowback rate and daily flowback volume have the same shape but different magnitudes. For a total daily flowback volume that is recorded only once a day, the range of *t* in Equation (18) is {*t* ≥ 1,*t*∈Z}. The cumulative flowback rate *S* ˆ *N* for the previous *N* days can be then calculated as:

$$\hat{S}\_N = \sum\_{t=1}^N \gamma \exp(-\beta t) \frac{\gamma}{e^{\delta} - 1} \left(1 - \frac{1}{e^{N\beta}}\right) \tag{19}$$

The cumulative flowback rates on the 30th, 90th, 180th and 360th day of production for a shale gas well are used as the characteristic flowback rates, and the estimated values of *α* and *β* are obtained by solving Equation (20):

ˆ

⎧⎪⎪⎨⎪⎪⎩

$$\begin{array}{l} S\_{30} = S\_{30} \\ S\_{90} = \hat{S}\_{90} \\ S\_{180} = S\_{180} \\ S\_{360} = S\_{360} \end{array} \tag{20}$$

The values on the left and right sides of Equation (20) correspond to measured data and the calculation results from Equation (19), respectively. As there are fewer unknowns than equations in Equation (20), exact solutions for *γ* and *β* cannot be obtained. Therefore, *γ* and *β* are estimated by minimizing the sum of squares of the differences between the two sides of Equation (20):

$$\left(\hat{\gamma}, \hat{\beta}\right) = \arg\min\_{\left(\gamma, \beta\right)} \sum\_{N} \left(\hat{S}\_{N} - S\_{N}\right)^{2} \tag{21}$$

where the values of *N* are 30, 90, 180, and 360.

#### *2.3. Big Data Analysis*

A large quantity of data is generated during the exploration and development of shale gas, i.e., from drilling to fracturing and from well closing in to flowback. Various factors affect the flowback, which cannot be solved using currently available mathematical equations. Big data can be used to solve this problem. In this study, the Spearman correlation coefficient and distribution estimation were used to perform a big data analysis.

#### 2.3.1. Correlation Analysis

The Spearman correlation coefficient has two advantages. First, sortable variables are assumed, and a normal sample distribution of the samples is not required, which is appropriate for the dataset used in this study. Second, the Spearman correlation corresponds to a monotonous, rather than linear, correlation between two random variables *X* and *Y* and is therefore likely to reveal a nonlinear relationship between the variables. The Spearman correlation coefficient is defined as follows [18]:

$$\rho\_{XY} = 1 - \frac{6\sum\_{i=1}^{N} [rank(\mathbf{x}\_i) - rank(y\_i)]^2}{N(N^2 - 1)}\tag{22}$$

where *N* denotes the number of samples, and rank denotes the rank number of an observed value of *X* or *Y*.

The range of *ρXY* is [ −1, 1]. A positive or negative *ρXY* indicates a positive or negative correlation between *X* and *Y*, respectively. The higher the absolute value of *ρXY* is, the stronger the correlation is. The significance of the Spearman correlation coefficient can be tested by the following hypotheses [20]:

$$H\_0: \, \rho\_{XY} = 0 \tag{23}$$

$$H\_1: \rho\_{XY} \neq 0\tag{24}$$

If H0 is satisfied, then:

$$
\rho\_{XY} \sim \text{Normal}\left(0, \frac{1}{n-1}\right) \tag{25}
$$

Generally, if *p* ≤ 0.05, *H*0 is rejected, and *ρXY* is considered significant, whereas *p* > 0.05 indicates there is insufficient evidence to reject *H*0, and *ρXY* may be false.

#### 2.3.2. Estimation of Distribution of Fitted Parameters

In order to test whether the study sample is statistically significant, it is necessary to analyse the probability distribution of parameter fitting results. Consider a candidate distribution for the set of fitted parameters with a probability density function *f*(*x/p*) and a log-likelihood function *l*(*p/x*). The optimal parameter *p*ˆ of this candidate distribution can be obtained using the following equation:

$$
\hat{p} = \underset{p}{\text{arg}\max} l(p|\mathbf{x}) \tag{26}
$$

Let ˆ *f*(*x*|*p*<sup>ˆ</sup>) denote the optimal estimate of the probability density function *f*(*x*|*p*). The quality of the estimate can be tested using the following equation:

$$SSE = \sum\_{i=1}^{N\_H} \left( \hat{f}(\mathbf{x}\_i | \hat{\mathbf{p}}) - h(\mathbf{x}\_i) \right)^2 \tag{27}$$

where *SSE* denotes the sum of the squares of the errors; *h*(*x*) denotes the probability density obtained from a histogram; and *NH* denotes the number of bins of the histogram.

The optimal estimate minimizes *SSE* and is denoted as Pˆ. The correctness of the estimation can be evaluated by the Kolmogorov–Smirnov test. For a dataset D with a sample size *n* and a reference distribution P, the Kolmogorov–Smirnov test is [20]: *H*0 : D and P have the same distribution and *H*1 : D and P have different distributions.

The tested statistics are as follows:

$$D\_n = \sup |F\_n(\mathbf{x}) - F(\mathbf{x})| \tag{28}$$

where *Fn* (*x*) is an empirical distribution function for the data, and *F*(*x*) is the cumulative distribution function of the reference distribution P. Let the fitted parameters corresponding to the dataset D and Pˆ denote the optimal reference distribution for the reference distribution P. When *p* ≤ 0.05 in Equation (28), *H*0 is rejected, that is, the estimated results may not conform to the distribution of the fitted parameters. For *p* > 0.05, *H*0 cannot be rejected, that is, the estimated results may conform to the distribution of the fitted parameters.

*x*

#### **3. Results and Discussion**

The Weiyuan shale gas block was considered as a case study. Flowback, production, fracturing, drilling, and geological data were collected for 214 horizontal wells as shown in Table 1. The flowback data of each well was fitted to yield the following daily flowback rate:

$$R\_l(t) = 0.0121 \exp(-0.0293t) \tag{29}$$

where 0.0121 and 0.0293 are the mean values of *γ*ˆ and *β*ˆ. The mean *R*<sup>2</sup> for all the fitting results is 0.9212, and the mean value of the mean absolute error is 0.0157, indicating a good fit. The distributions of the fitted parameters, *γ*ˆ and *β*ˆ, are asymmetrical. Therefore, it is inappropriate to use normal distributions for *γ*ˆ and *β*ˆ. The theoretical domain of *γ*ˆ and *β*ˆ is (0,+ ∞). To obtain the distributions of *γ*ˆ and *β*ˆ, seven candidate distributions with support sets of (0,+ ∞) or (0,+ ∞) were selected, and the optimal parameters of each distribution were calculated by considering scaling but not translation, as shown in Table 2.


**Table 2.** Estimated probability densities for *γ*ˆ and *β*ˆ.

Table 2 shows that the F distribution is the optimal distribution for *γ*ˆ and *β*ˆ, and the estimated results and optimal parameters are shown in Figure 4.

The Kolmogorov–Smirnov test was conducted on the empirical distributions of *γ*ˆ and *β*ˆ, and the optimal distributions had *p* values of 0.8209 and 0.4793, respectively, which were both above 0.05. Therefore, the distributions in Figure 3 can be used as empirical distributions of *γ*ˆ and *β*ˆ. As can be seen from Figure 3, the fitting results of 214 wells are statistically significant and can be further used for big data analysis. For the Weiyuan shale gas block, this empirical distribution can be used to predict the most likely distribution range of *γ* and *β* of wells that are about to be put into production and further predict the flowback curve.

**Figure 4.** Best estimated distribution of *γ*ˆ and *β* 

A physical interpretation of the results is that *γe*(−*β*) corresponds to the flowback rate on the first day, which is also the peak daily flowback rate, and *β* corresponds to the speed at which the theoretical daily flowback decreases. Thus, the larger *γ* is, the higher the starting point of the flowback curve is; the larger *β* is, the lower the starting point of the flowback curve is, and the more rapidly the flowback decreases. Figure 5 shows the relationship between *γ*ˆ and *β* ˆ .

ˆ

.

Figure 5 shows a strong and significant correlation between *γ*ˆ and *β* ˆ . Both parameters are related to the shape of the theoretical flowback curve and are therefore very likely to depend on each other. As the curve shape is jointly determined by two factors, the height of the starting point and the speed of decrease, neither factor can fully reflect the flowback volume. Hence, a variable composed of *γ* and *β* was used in this study to directly describe the flowback volume or flowback rate and is used in conjunction with static variables to perform a binary analysis. The fitting results are as follows: the minimum and maximum

values of *β* ˆ are 0.0068 and 0.0658, respectively, and the 0.01th and 99.99th percentiles of the estimated distribution are 0.0052 and 0.0686, respectively.

**Figure 5.** Correlation between *γ*ˆ and *β* ˆ .

Figure 6 shows that the higher the value of *β* is for the theoretical curve, the more rapidly *f*(*<sup>N</sup>*; *β*) = 1 − 1*eNβ* converges to 1. For *β* = 0.0052, *f*(*N;β*) reaches 0.9945 at *N* = 1000, and for *β* = 0.0686, *f*(*N;β*) reaches 0.9990 at *N* = 100, indicating that there is a minimal fracturing fluid volume after 1000 days of flowback for most shale gas wells. In Equation (19), *γ*/(e*<sup>β</sup>* − 1) corresponds to the approximate long-term cumulative flowback rate; thus, a flowback characteristic coefficient is defined, *η* = *γ*/(e*<sup>β</sup>* − 1), that reflects the flowback potential of a horizontal well.

**Figure 6.** Sensitivity analysis of *f*(*N;β*).

The flowback curve can be simplified by flowback characteristic coefficient. With *η* as the analysis object, the Spearman coefficient can be used to analyse the correlation between flowback rates and geological and engineering parameters and find the main affecting factors.

The correlation between *η* and the 14 static parameters in Table 1 was analysed using the Spearman correlation coefficient method. For comparison, the correlation between the first-year average daily production rate and 14 static parameters was also studied. Table 3 shows the correlations between these two variables and the static parameters, where the *p* value is given in parentheses. Static parameters with nonsignificant correlations with the two variables were excluded.


**Table 3.** Correlations between *η*, first-year average daily production rate, and static parameters.

Table 2 shows that the first-year average daily production rate is highly correlated with the thickness × reservoir drilling length of high-quality reservoir, and the correlation coefficient exceeds 0.5, indicating that the larger the thickness × drilling length of the high-quality reservoir, the higher the first-year average daily production rate, which is consistent with the results of Ma et al. [21]. The correlation between the static parameters and the first-year daily production rate can be sorted by the correlation coefficient as follows: thickness × drilling length of high-quality reservoir > fracturing lateral length > number of fracturing stages > brittle mineral content > gas saturation > fracturing stage interval > pressure coefficient × vertical depth. The drilling length of the high-quality reservoir, the fracturing lateral length, the number of fracturing stages, and the fracturing stage interval are engineering parameters. The better the drilling effect, the better fracturing effect and the higher the first-year average daily production rate. High-quality reservoir thickness, brittle mineral content, gas saturation and pressure coefficient × vertical depth are geological parameters. The better physical properties of shale gas reservoir, the higher first-year average daily production rate. This is why there is a good correlation between the first-year average daily production rate and various parameters.

Although the correlation coefficients between *η* and the static parameters are less than 0.5, relatively speaking, *η* and the thickness × drilling length of high-quality reservoir are best correlated, and the correlation symbol is opposite to that of the first-year daily production rate, which means that the higher the first-year average daily production rate, the lower the flowback rate. The correlation between static parameters and *η* is sorted by coefficient, which is basically consistent with that of first-year average production rate, but the positions of individual parameters are interchanged. The order according to the absolute value of correlation is thickness × drilling length of high-quality reservoir > number of fracturing stages > pressure coefficient× vertical depth > fracturing stage interval > fracturing lateral length > brittle mineral content > gas saturation. *η* is positively correlated with the fracturing stage interval, which means that the larger the fracturing stage interval, the worse the fracturing effect. The hydraulic fracturing process does not create complex fracture networks, and the fracturing fluids mainly concentrate near the main fracture and wellbore, resulting in high flowback rate. *η* is positively correlated with the pressure coefficient × vertical depth. The pressure coefficient × vertical depth represents the formation pressure. The higher the formation pressure, the stronger the liquid carrying capacity of the shale gas well and the higher the flowback rate. *η* is negatively correlated with the fracturing lateral length, the number of fracturing stages, and the content of brittle minerals. The higher the brittle minerals, the easier the reservoir is to be fractured; the longer the fracturing lateral length is and the more the number of fracturing stages is, the more complex the fracture network will be formed after hydraulic fracturing and the larger the SRV will be. The fracturing fluids are bound in complex micro fractures and cannot be discharged, resulting in a low flowback rate.

Table 4 shows the correlations between *η* and the whole static parameters in Table 1. Figure 6 shows the hot map of correlations between *η* and static parameters. From Table 1 and Figure 7, it can be seen that *η* is well correlated with the 30 day, 90 day, 180 day, 360 day, and peak gas production flowback rates. The longer the flowback time, the better the correlation between *η* and the flowback rate. The correlation between *η* and the 30 day, 90, 180, and 360 day flowback rate confirmed that it is reasonable and scientific to use *η* to characterize the flowback rate. *η* is positively correlated with vertical depth, TOC content, porosity, pressure coefficient, and average fracturing intervals, which means the higher these parameters are, the higher the flowback rates are. *η* is negatively correlated with high-quality reservoir thickness, gas saturation, brittle mineral content, fracturing fluid intensity, proppant intensity, average hydraulic fracturing fluid displacement, and drilling length in high-quality reservoirs, which means the higher these parameters are, the lower flowback rates are. *η* is negatively correlated with the first-year average daily production rate and EUR, but the correlation coefficients are low. It means that flowback rate is not merely correlated with gas production.


**Table 4.** Correlations between *η* and static parameters.

**Figure 7.** Hot map of correlations between *η* and static parameters.

From the correlation between various static parameters and the first-year average daily production rate and *η*, it is reasonable to simplify and characterize the flowback characteristics of shale gas wells by the flowback characteristic coefficient *η*. In our previous study [17], based on the actual data in the Weiyuan shale gas block, combined with a deep learning algorithm, two different depth feedforward neural networks were designed, which are the single output neural network and the multi-output neural network, to predict the shale gas well flowback rate. The 30 day, 90, 180, 360 day, and peak gas production flowback rates of shale gas wells were predicted. This method can predict the flowback rate of a specific day, but the main factors affecting the flowback rate at any time cannot be obtained. Combining the deep learning algorithm and the backflow characteristic coefficients proposed in this article, the flowback rate at any time in the future can be predicted. The method proposed in this paper also has some limitations. For gas wells with intermittent shut-in or long-term shut-in or for long-term flooded gas wells, there will be large errors in the prediction of the flowback rate. In addition, the research does not consider the impact of gas–liquid two-phase flow on flowback.

#### **4. Summary and Conclusions**

Based on the seepage theory of fracturing fluid in multi-stage fractured horizontal wells, the study uses convolution and Laplace transform methods to abstract the flowback curve into two characteristic parameters, the daily flowback rate coefficient *γ* and the flowback decline coefficient *β*. Taking the Weiyuan shale gas block as a study case, the flowback data of 214 wells were fitted, and the distribution characteristics of the fitted characteristic parameters were studied. The flowback characteristic coefficients to characterize the flowback potential of shale gas wells were established. The Spearman correlation coefficient method was used to study the correlation between the geological and engineering static parameters of 214 wells, the characteristic flowback coefficients, and the first-year average daily production rate. There are several conclusions obtained from this research:

(1) The fitting results of the flowback curve for 214 production wells show that the average daily flowback rate coefficient and the flowback decline coefficient of all wells are 0.0121 and 0.0293, and the average value of R<sup>2</sup> of all fitting results is 0.9212, the mean value of the mean absolute error is 0.0157, and the fitting effect is better. Both the daily flowback rate coefficient *γ* and the flowback decline coefficient *β* are right-skewed distributions.

(2) The comparative study on the correlation between the flowback characteristic coefficient and the first-year average daily production rate and static parameters shows that the thickness × the drilling length of a high-quality reservoir is best correlated with the characteristic flowback coefficient and the first-year average daily production rate, and their correlation symbol is opposite, which means that the larger the thickness × the drilling length of the high-quality reservoir, the higher the average first-year daily production rate and the lower the flowback rate of the shale gas well. The factors affecting the flowback rate mainly include geological factors and engineering factors. The order of correlation coefficients is as follows: thickness × drilling length of high-quality reservoir, number of fracturing stages, pressure coefficient × vertical depth, fracturing stage interval, fracturing lateral length, brittle mineral content, and gas saturation.

(3) Through the method established in this paper, the shale gas flowback curve containing complex flow mechanism can be abstracted into simple characteristic parameters and characteristic coefficients. The method proposed in this paper can provide a novel way for machine learning and other big data analysis methods to study the flowback characteristics of shale gas horizontal wells. In future studies, the time-dependent gas output will be related to the flowback rate. Combining with machine learning, the flowback rate and gas output at any time can be predicted in the future.

**Author Contributions:** Conceptualization, W.G.; methodology, X.Z.; formal analysis, L.K.; investigation, Y.L.; resources, J.G.; data curation, W.G.; writing—original draft preparation, W.G.; writing— review and editing, X.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was financially supported by the R&D Department of Petrochina (No. 2021DJ2005).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.
