**2. Theory**

Full waveform inversion is an example of a nonlinear ill-posed problem. In general, it is solved by minimising the discrepancies between the observed and modelled data. From the mathematical point of view, this nonlinear problem is usually treated as a linearised least-squares problem. However, even the linearised problem is still ill-posed and, consequently, several solutions can provide a satisfactory fit between the observed and modelled data.

One way to circumvent this ambiguity is to introduce prior information by adding a regularisation term to the formalism of the inverse problem. We therefore briefly present the mathematical methodology of the inversion algorithm with the contribution of prior information.

Let *F* be a generic cost function that is given by:

$$F(\mathbf{m}) = \Phi\_d(\mathbf{m}) + \alpha \Psi\_{\mathrm{ll}}(\mathbf{m}).\tag{1}$$

For FWI, the term *Φd*(**m**) is usually constructed through the *L*2 norm of the residual between the modelled and observed data, that is:

$$\Phi\_d(\mathbf{m}) = \sum\_{\rm ns} \frac{1}{2} [(\mathbf{d}\_{\rm obs} - \mathbf{d}\_{\rm cal}(\mathbf{m}))^t (\mathbf{d}\_{\rm obs} - \mathbf{d}\_{\rm cal}(\mathbf{m}))],\tag{2}$$

where **d***obs* and **d***cal*(**m**) represent the observed and calculated data vectors, respectively. In our work, we use a time-domain approach, and the components of these vectors are the samples of the time-domain seismograms recorded at each receiver for one seismic source. The misfit function is the sum over all *ns* sources of the seismic acquisition. The modelled data depend nonlinearly on the model parameters, represented by **m**. The model parameters are determined by an inverse process that aims to reduce the residual between the modelled and observed data.

In our case, the second term of the cost function is responsible for adding the prior information to the inversion process (this prior information can come, for example, from well profiles). Here, the prior information is denoted by **m<sup>r</sup>**. It can be added to the inversion in different ways. In FWI, the model norm is used for this purpose in [8]. We use this form for comparison purposes, so we can write *Ψ* as:

$$\Psi\_{\rm ll}(\mathbf{m}) = \sum\_{i=1}^{N} (m\_i - m\_i^r)^2. \tag{3}$$
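
To make the structure of Equations (1)–(3) concrete, the following is a minimal sketch rather than the implementation used in this work. It evaluates the data misfit, the model norm and the total cost for small lists of numbers; the function names and the toy values are assumptions introduced only for illustration, and the forward modelling that would produce the calculated data is not included.

```python
def data_misfit(d_obs_per_shot, d_cal_per_shot):
    """Equation (2): half the squared L2 norm of the residual, summed over shots."""
    total = 0.0
    for d_obs, d_cal in zip(d_obs_per_shot, d_cal_per_shot):
        total += 0.5 * sum((o - c) ** 2 for o, c in zip(d_obs, d_cal))
    return total


def model_norm(m, m_ref):
    """Equation (3): squared distance between the model and the prior model."""
    return sum((mi - ri) ** 2 for mi, ri in zip(m, m_ref))


def cost(d_obs_per_shot, d_cal_per_shot, m, m_ref, alpha):
    """Equation (1): data misfit plus the weighted prior term."""
    return data_misfit(d_obs_per_shot, d_cal_per_shot) + alpha * model_norm(m, m_ref)


# Hypothetical toy values: two shots with three samples each, two model cells.
d_obs = [[1.0, 2.0, 3.0], [0.5, 0.0, -0.5]]
d_cal = [[1.0, 1.0, 3.0], [0.5, 0.0, 0.5]]
m = [2000.0, 2500.0]
m_ref = [2000.0, 2400.0]

phi = data_misfit(d_obs, d_cal)
psi = model_norm(m, m_ref)
total = cost(d_obs, d_cal, m, m_ref, alpha=1e-4)
print(phi, psi, total)
```

Setting `alpha=0` in this sketch recovers the conventional misfit without prior information.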

The first form that we studied in this experiment was the relative entropy described by [24,25]:

$$\Psi' = \sum\_{i=1}^{N} m\_i \ln(\frac{m\_i}{m\_i^r}),\tag{4}$$

Equation (4) is the Kullback–Leibler distance from **m<sup>r</sup>** to **m**. Usually, this equation is used in association with probability distributions, which makes the relative entropy always positive. However, this is only true because the probabilities of the events fall between zero and one [26]. In order to avoid a reformulation of the problem, our idea is to use it in a deterministic way, that is, without using concepts of probability.

Formally, the Kullback–Leibler distance is a pseudo-distance, as it does not satisfy two properties of the definition of a metric: the triangle inequality and symmetry [27]. This lack of symmetry led us to our second case study, which is given by the equation below:

$$\Psi' = \sum\_{i=1}^{N} m\_i^r \ln(\frac{m\_i^r}{m\_i}).\tag{5}$$

It is necessary to analyze the behavior of these functions. To do so, we simulate a situation in which the prior information is constant (**m<sup>r</sup>** = 10, for example) and plot the functions *y* = *x* ln(*x*/10) and *y* = 10 ln(10/*x*), as can be seen in Figure 1. From Figure 1, we can infer that *Ψ* (in Equations (4) and (5)) can take positive and negative values. The graph of Equation (4) (red curve) shows that *Ψ* is negative whenever the values of the model parameters are less than the parameters of the reference model. The graph of Equation (5) (blue curve) shows that *Ψ* is negative whenever the values of the model parameters are greater than the parameters of the reference model. In addition, the graph of Equation (5) (blue curve) shows that this function does not have a minimum.

**Figure 1.** Graph of the functions *y* = *x* ln(*x*/10) (red curve) and *y* = 10 ln(10/*x*) (blue curve). These functions are not positive definite. The function *y* = 10 ln(10/*x*) does not have a minimum.

This behavior (sometimes positive, sometimes negative) in both equations is inconvenient for the inversion: at one point we would be minimizing the function and, at another, maximizing it.
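
The sign changes described above can be reproduced with a few lines of code. The sketch below is only an illustration under the same assumption as Figure 1 (a constant prior value of 10); the function names `kl_forward` and `kl_reverse` are hypothetical labels for the single-cell terms of Equations (4) and (5).

```python
import math


def kl_forward(x, ref=10.0):
    """Single-cell term of Equation (4): x ln(x / ref)."""
    return x * math.log(x / ref)


def kl_reverse(x, ref=10.0):
    """Single-cell term of Equation (5): ref ln(ref / x)."""
    return ref * math.log(ref / x)


# Evaluate both terms below, at and above the prior value.
for x in (2.0, 5.0, 10.0, 15.0, 20.0):
    print(f"x = {x:5.1f}   Eq. (4): {kl_forward(x):8.3f}   Eq. (5): {kl_reverse(x):8.3f}")
```

Both terms vanish at *x* = 10 and take opposite signs on either side of it, and the term of Equation (5) keeps decreasing as *x* grows, which is why it has no minimum.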

The simplest way to make Equation (4) positive definite is to work with its quadratic form. Thus, we rewrite Equation (4) as follows:

$$\Psi' = \sum\_{i=1}^{N} \left[ m\_i \ln \left( \frac{m\_i}{m\_i^r} \right) \right]^2. \tag{6}$$

Analogously, we can make Equation (5) positive definite by taking its quadratic form. Thus, we rewrite Equation (5) as follows:

$$\Psi' = \sum\_{i=1}^{N} \left[ m\_i^r \ln \left( \frac{m\_i^r}{m\_i} \right) \right]^2. \tag{7}$$

As was done for Equations (4) and (5), we plotted the functions described by Equations (6) and (7), as can be seen in Figure 2. Figure 2 shows that Equation (6) is positive definite. However, in the example illustrated in Figure 2, we observe two minima. We interpret this to mean that the use of Equation (6) in FWI can aggravate the problem of local minima (this will be exemplified in the numerical applications). Figure 2 also suggests that Equation (7) has the characteristics that are favorable to the inversion process, that is, it is a positive definite function with only one minimum.

**Figure 2.** Graph of the functions *y* = [*x* ln(*x*/10)]<sup>2</sup> (red curve), *y* = [10 ln(10/*x*)]<sup>2</sup> (blue curve) and *y* = *x* ln(*x*/10) − (*x* − 10) (green curve). The function *y* = [*x* ln(*x*/10)]<sup>2</sup> is positive definite, but it has more than one minimum. The functions *y* = [10 ln(10/*x*)]<sup>2</sup> and *y* = *x* ln(*x*/10) − (*x* − 10) are positive definite and have only one minimum.

Finally, we can add the prior information to FWI with the axiomatic form that is given by (8) [28,29]:

$$\Psi' = \sum\_{i=1}^{N} \left[ m\_i \ln \left( \frac{m\_i}{m\_i^r} \right) - (m\_i - m\_i^r) \right],\tag{8}$$

Equations (4) and (8) are similar. However, Equation (8) has an additional term, the difference between the models, which gives it the characteristic that we expect (a positive definite function), as shown by the green curve in Figure 2.
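
The behavior of the three positive forms can also be checked numerically. The sketch below is an illustration only: it samples the single-cell terms of Equations (6)–(8) for a constant prior value of 10, as in Figure 2, and lists the grid points that are lower or higher than both neighbours. The grid, the helper `interior_extrema` and the function names are assumptions made for this example.

```python
import math

REF = 10.0  # constant prior value, as in Figures 1 and 2


def sq_forward(x):
    """Single-cell term of Equation (6)."""
    return (x * math.log(x / REF)) ** 2


def sq_reverse(x):
    """Single-cell term of Equation (7)."""
    return (REF * math.log(REF / x)) ** 2


def axiomatic(x):
    """Single-cell term of Equation (8)."""
    return x * math.log(x / REF) - (x - REF)


def interior_extrema(f, xs):
    """Return (minima, maxima): grid points strictly below/above both neighbours."""
    ys = [f(x) for x in xs]
    minima, maxima = [], []
    for i in range(1, len(xs) - 1):
        if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]:
            minima.append(xs[i])
        elif ys[i] > ys[i - 1] and ys[i] > ys[i + 1]:
            maxima.append(xs[i])
    return minima, maxima


xs = [0.01 * k for k in range(1, 3001)]  # samples in 0 < x <= 30

for name, f in (("Eq. (6)", sq_forward), ("Eq. (7)", sq_reverse), ("Eq. (8)", axiomatic)):
    minima, maxima = interior_extrema(f, xs)
    print(name, "interior minima:", minima, "interior maxima:", maxima)
```

For Equation (6), the scan finds an interior maximum near *x* = 10/*e* in addition to the minimum at *x* = 10. To the left of that maximum the function decreases towards zero as *x* → 0, which is the second minimum visible in Figure 2. Equations (7) and (8) show a single minimum at *x* = 10 and no interior maximum.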

If we minimize the objective function in the classical way, we obtain a system of equations that can be expressed as:

$$\mathcal{H}\_{\rm F} \Delta \mathbf{m} = -\mathcal{G}\_{\rm F} \tag{9}$$

where H*F* and G*F* represent the Hessian and the gradient of the cost function, respectively. In this case, the gradient is the sum of two terms. If the *Ψ* function is the model norm (Equation (3)), we have:

$$\mathcal{G}\_F = \mathbf{J}^T (\mathbf{d}\_{obs} - \mathbf{d}(\mathbf{m})) + 2\alpha (m\_i - m\_i^r). \tag{10}$$

When the *Ψ* function is given by Equation (6), the gradient is:

$$\mathcal{G}\_{\rm F} = \mathbf{J}^{T} (\mathbf{d}\_{\rm obs} - \mathbf{d}(\mathbf{m})) + 2\alpha \left[ \left( m\_i \ln \left( \frac{m\_i}{m\_i^r} \right) \right) \left( \ln \left( \frac{m\_i}{m\_i^r} \right) + 1 \right) \right]. \tag{11}$$

When the *Ψ* function that is represented by Equation (7) is used, the gradient expression will be given by:

$$\mathcal{G}\_{\rm F} = \mathbf{J}^{T} (\mathbf{d}\_{\rm obs} - \mathbf{d}(\mathbf{m})) - 2\alpha \left[ \left( m\_{i}^{r} \ln \left( \frac{m\_{i}^{r}}{m\_{i}} \right) \right) \left( \frac{m\_{i}^{r}}{m\_{i}} \right) \right]. \tag{12}$$

In case the *Ψ* function used is the one represented in Equation (8), the gradient will be described, as follows:

$$\mathcal{G}\_{\rm F} = \mathbf{J}^{T}(\mathbf{d}\_{\rm obs} - \mathbf{d}(\mathbf{m})) + \alpha \ln \left( \frac{m\_{i}}{m\_{i}^{r}} \right). \tag{13}$$

The term **J** that appears in the gradient expressions is the sensitivity matrix, which is composed of the derivatives of the modelled data with respect to the model parameters (**J** = *∂***d**(**m**)/*∂***m**). The elements of **J** are not explicitly calculated because of their high computational cost. Instead, the adjoint formulation [30] is used to compute the gradient.
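
The regularisation parts of the gradients can be verified independently of the data term. The sketch below is an illustration and not the code used in this work: it differentiates the single-cell terms of Equations (3), (6), (7) and (8) analytically and compares each derivative with a central finite difference. The function names, the test values (a model value of 2.5 and a prior value of 3.0) and the step size are assumptions chosen for the example; the weight *α* and the data gradient obtained with the adjoint method are left out.

```python
import math


def psi_model_norm(m, r):
    return (m - r) ** 2                      # Equation (3)


def psi_sq_forward(m, r):
    return (m * math.log(m / r)) ** 2        # Equation (6)


def psi_sq_reverse(m, r):
    return (r * math.log(r / m)) ** 2        # Equation (7)


def psi_axiomatic(m, r):
    return m * math.log(m / r) - (m - r)     # Equation (8)


def grad_model_norm(m, r):
    return 2.0 * (m - r)


def grad_sq_forward(m, r):
    return 2.0 * m * math.log(m / r) * (math.log(m / r) + 1.0)


def grad_sq_reverse(m, r):
    return -2.0 * r * math.log(r / m) * (r / m)


def grad_axiomatic(m, r):
    # d/dm [m ln(m/r) - (m - r)] = ln(m/r) + 1 - 1
    return math.log(m / r)


def central_difference(f, m, r, h=1e-6):
    """Second-order finite-difference estimate of df/dm."""
    return (f(m + h, r) - f(m - h, r)) / (2.0 * h)


pairs = [
    ("model norm", psi_model_norm, grad_model_norm),
    ("Eq. (6)", psi_sq_forward, grad_sq_forward),
    ("Eq. (7)", psi_sq_reverse, grad_sq_reverse),
    ("Eq. (8)", psi_axiomatic, grad_axiomatic),
]

m_test, r_test = 2.5, 3.0  # hypothetical values for one model cell
for name, psi, grad in pairs:
    analytic = grad(m_test, r_test)
    numeric = central_difference(psi, m_test, r_test)
    print(f"{name:10s}  analytic = {analytic: .8f}  finite difference = {numeric: .8f}")
```

A check of this kind is a common safeguard before a gradient is passed to a quasi-Newton optimiser, since an inconsistent gradient can make the line search fail.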

Asnaashari et al. [8] showed that the regularisation term should be weighted dynamically. The basic idea of this methodology is to help the inversion process converge to the global minimum of the objective function by increasing the importance of the prior information at the beginning of the process and gradually decreasing the weight of the penalty term until, in the final iterations, the convergence is driven by the data term. In this work, the parameter *α* = *μγ* is a dynamic weighting of the regularisation term, and its role is to decrease the weight of the entropy term over the iterations. We built the dynamic term (*γ*) from the ratio of the gradients, as can be seen in Equation (14):

$$\gamma = \frac{\sum\_{i=1}^{M} (\nabla \Phi\_i)^2}{\sum\_{i=1}^{M} (\nabla \Psi\_i^{\prime})^2},\tag{14}$$

where ∇*Φi* and ∇*Ψ′i* are the elements of the gradient vectors of the misfit and regularisation terms, respectively. In our tests, as will be seen in the numerical applications, the *γ* value on its own proved to be inadequate (its initial value was too large) and needed to be adjusted. We made this adjustment using the *μ* parameter.
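
As an illustration of Equation (14) and of the product *α* = *μγ*, the following sketch computes the dynamic weight from two gradient vectors. It is not the implementation used in this work: the function names and the toy gradients are hypothetical, and the explicit error raised when the prior-term gradient vanishes (for example, when the model equals the prior model) is an assumption about how such a case could be handled.

```python
def dynamic_weight(grad_misfit, grad_prior):
    """Equation (14): ratio of the squared norms of the two gradient vectors."""
    num = sum(g * g for g in grad_misfit)
    den = sum(g * g for g in grad_prior)
    if den == 0.0:
        # Assumption: treat a vanishing prior gradient as an error.
        raise ValueError("prior-term gradient is zero; gamma is undefined")
    return num / den


def regularisation_weight(mu, gamma):
    """alpha = mu * gamma, with mu a fixed scaling chosen by testing."""
    return mu * gamma


# Hypothetical gradients for a model with two cells.
gamma = dynamic_weight([3.0, 4.0], [1.0, 0.0])
alpha = regularisation_weight(0.1, gamma)
print(gamma, alpha)
```

Because *γ* is recomputed from the current gradients, *α* changes from one iteration to the next, while *μ* stays fixed.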

The gradients of the terms responsible for adding the prior information are easy to calculate and are added directly to the data gradient. The Hessian matrix, which is composed of the second derivatives of the misfit function and of the relative entropy, is not explicitly computed in this paper. Instead, we approximated it using a limited-memory quasi-Newton method known in the literature as L-BFGS-B. The routine proposed by [31] considers that the inverse Hessian matrix is non-diagonal and approximates its elements from the gradient vectors and models of previous iterations, performing a line search that satisfies Wolfe's conditions.

#### **3. Numerical Tests**

We worked only with the acoustic case (i.e., a P-wave velocity model) and considered a homogeneous density distribution. We also used a regular grid with 12.5 m spacing in both modelling and inversion. The observed and modelled time-domain data were obtained from the acoustic wave equation through finite-difference modelling, with an eighth-order approximation for the Laplacian operator and a second-order approximation for the time derivative. A CPML absorbing boundary layer was employed to avoid boundary reflections [32,33]. The absorbing layer was applied to all sides of the model, with a width of 40 cells. The FWI was performed in the time domain using the full frequency spectrum.
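
For reference, an eighth-order approximation of a second spatial derivative can be sketched with the standard central-difference weights, as below. This is an illustration and not the modelling code used in this work: it treats a single spatial direction (a two-dimensional Laplacian would sum the same stencil along both axes), and the sine test function with a 500 m wavelength is an assumption chosen only to check the accuracy on the 12.5 m grid.

```python
import math

# Standard central-difference weights for the second derivative (eighth-order accurate).
C = [-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0]


def second_derivative(u, i, h):
    """Eighth-order estimate of d2u/dx2 at grid index i (needs 4 points on each side)."""
    acc = C[0] * u[i]
    for k in range(1, 5):
        acc += C[k] * (u[i + k] + u[i - k])
    return acc / (h * h)


H = 12.5                       # grid spacing in metres (from the text)
K = 2.0 * math.pi / 500.0      # hypothetical wavenumber: 500 m wavelength
u = [math.sin(K * j * H) for j in range(41)]

i = 13
approx = second_derivative(u, i, H)
exact = -K * K * math.sin(K * i * H)
print(approx, exact)
```

In a time-domain scheme, a stencil of this kind would be combined with the second-order update in time mentioned above.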

In this section, we show the contribution of the prior information added to the FWI through relative entropy. To this end, we chose the first and second parts of the BP 2004 benchmark [34]. For the first part, the acquisition geometry consisted of 475 hydrophones distributed along a straight line at a depth of 12.5 m, with 12.5 m spacing between receivers. The shots consisted of 15 sources spaced 395 m apart along a line at a depth of 25 m. For all shots, a Ricker wavelet source with a central frequency of 10 Hz was used, and the recording time was 5.0 s. For the second part, the acquisition geometry is similar to that used in the first part, but the central frequency of the Ricker wavelet was 12 Hz, with a recording time of 6.5 s.
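
The acquisition for the first part of the model can be sketched as follows. The receiver and source counts, spacings, depths, central frequency and recording time come from the text; the horizontal origin of both lines, the time step and the wavelet delay are not stated and are assumptions made only for this illustration.

```python
import math

DX = 12.5               # receiver spacing in metres (from the text)
N_RECEIVERS = 475
N_SOURCES = 15
SOURCE_SPACING = 395.0  # metres
F0 = 10.0               # central frequency in Hz (first part of the model)
T_MAX = 5.0             # recording time in seconds (first part of the model)
DT = 1.0e-3             # ASSUMPTION: time step, not stated in the text
X0 = 0.0                # ASSUMPTION: both lines start at x = 0

# (x, depth) positions in metres.
receivers = [(X0 + i * DX, 12.5) for i in range(N_RECEIVERS)]
sources = [(X0 + i * SOURCE_SPACING, 25.0) for i in range(N_SOURCES)]


def ricker(t, f0, t0):
    """Ricker wavelet with central frequency f0, peaking at time t0."""
    a = (math.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)


T0 = 1.0 / F0           # ASSUMPTION: delay so that the wavelet starts close to zero
n_samples = int(round(T_MAX / DT)) + 1
wavelet = [ricker(k * DT, F0, T0) for k in range(n_samples)]

print(len(receivers), len(sources), n_samples)
```

For the second part of the model, the same sketch would use a central frequency of 12 Hz and a recording time of 6.5 s.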

Given that FWI is usually treated as an iterative process, an initial velocity model is required. For example, this model may be the result of a tomography based on the travel times of first arrivals and reflected events. In this work, we smooth the true model (Figure 3) and use the result as the initial model in the FWI process.

**Figure 3.** Initial models used for FWI. (**a**) Smoothed version of the first part of the BP model. (**b**) Smoothed version of the second part of the BP model.

For this study, we assume that we have information from exploration wells. The velocity profiles are measurements that provide a good estimate of the local velocity with depth. Thus, we use these sonic profiles to build our a priori information model. A linear interpolation was made between the two wells. In the other regions, we use an extrapolation of the wells' velocity profiles. We also apply a slight smoothing to this a priori velocity model. The interpolated model is shown in Figure 4. Although not geologically significant, this model contains some travel-time information, and it is taken as the a priori velocity model and incorporated into the FWI through the relative entropy of the model.
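
One possible way of building such a prior model is sketched below. It is an illustration under stated assumptions, not the procedure used in this work: the function `build_prior` is hypothetical, the extrapolation outside the wells simply repeats the profile of the nearest well, the slight smoothing is omitted, and the two-sample velocity profiles are toy values. The well positions of 2.3 km and 3.5 km are those of the first part of the model.

```python
def build_prior(well_a, well_b, x_a, x_b, xs):
    """Build a prior model from two well profiles.

    well_a, well_b: velocity profiles over depth at positions x_a < x_b.
    xs: horizontal grid positions. Returns model[ix][iz].
    """
    model = []
    for x in xs:
        if x <= x_a:
            column = list(well_a)        # assumption: constant extrapolation
        elif x >= x_b:
            column = list(well_b)        # assumption: constant extrapolation
        else:
            w = (x - x_a) / (x_b - x_a)  # linear interpolation weight
            column = [(1.0 - w) * va + w * vb for va, vb in zip(well_a, well_b)]
        model.append(column)
    return model


# Hypothetical two-sample profiles (m/s) at the wells of the first part of the model.
well_left = [1500.0, 2000.0]
well_right = [1700.0, 2400.0]
xs_km = [0.0, 2.3, 2.9, 3.5, 5.0]

prior = build_prior(well_left, well_right, 2.3, 3.5, xs_km)
for x, column in zip(xs_km, prior):
    print(x, column)
```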

**Figure 4.** The prior models built by linear interpolation between the wells and extrapolation outside them, which will be added to the FWI. (**a**) Prior model of the first part of the BP model. (**b**) Prior model of the second part of the BP model.

First, we performed the inversion without adding prior information on both parts of the BP model that are illustrated in Figure 5a,b. This means that, in Equation (1), *α* = 0. We used the initial models that are shown in Figure 3a,b. The results of the FWI for each of the cases are shown in Figure 6a,b. Clearly, in both cases, the conventional FWI (without any type of regularisation) converges to a local minimum. Asnaashari et al. [8] discussed some differences between the prior and initial models in the inversion procedure. In this case, the smoothed model that is shown in Figure 3 and the a priori model shown in Figure 4 have only part of the real model information. For the FWI result to converge to the desired result, both of the models must be used in a complementary way.

**Figure 5.** True *Vp* velocity models, which are parts of the BP model, and the acquisition scheme. The red-dashed line represents the position of the receivers, while the green line represents the position of the sources. (**a**) First part of the BP model; the white arrows indicate the target zones (overpressure zones) and the black-dashed lines represent the position of the two wells that cross the overpressure zones. (**b**) Second part of the BP model; the white arrows indicate the target zones (channels) and the black-dashed lines represent the position of the two wells that cross the channels.

**Figure 6.** FWI results using the smoothed model and without prior information (*α* = 0): (**a**) in the first part of the BP model and (**b**) in the second part of the BP model.

Given the obtained results, we will add the prior information to the FWI formalism in four different ways, as shown later on.

#### *3.1. Model Norm*

In this section, we add the prior information to the FWI using Equation (3). This method was used by [8] to incorporate the prior information in the FWI. As previously mentioned, we use these results as a reference for comparison with the entropy methods that are the focus of this work. Therefore, we performed the FWI for both models (first and second parts of the BP model), adding the prior information that is shown in Figure 4. The initial models used are those represented in Figure 3. Figures 7a and 8a show the results. When we compare these results with those obtained without adding prior information (Figure 6a,b), we observe an improvement in the quality of the FWI result. In general, the salt body was recovered in both models (although the first part of the BP model presents a small problem on the left side of the well positioned at *x* = 2.3 km). For a more detailed quality control, we can see the profiles at the positions of the wells in Figure 7b,c for the first part of the model, and in Figure 8b,c for the second part of the model. By analysing the profiles, we can confirm that the addition of prior information in FWI through the model norm provides a good result.

A crucial point for the success of adding prior information to the FWI scheme is the choice of the *α* parameter. In this work, as described in the theoretical section, the *α* parameter is the product of two terms. The first is a dynamic term (*γ*), calculated from Equation (14). Using the model norm, the *γ* parameter started the inversion process at 6 × 10<sup>17</sup> and 7.6 × 10<sup>17</sup> for the first and second parts of the BP model, respectively. The initial value of the *γ* parameter proved to be inadequate and needed to be adjusted. Consequently, we use a second term, the *μ* parameter, to adjust the weight of the regularisation term. After several tests, we found that the *μ* parameter should be 3.5 × 10<sup>−10</sup> and 3 × 10<sup>−10</sup> for the first and second parts of the BP model, respectively. Once the values of the *μ* and *γ* parameters are found, determining the *α* parameter is straightforward. The evolution of the *α* parameter for each model can be seen in Figures 7d and 8d. Note that the *α* values are shown on a natural logarithmic scale. Figures 7d and 8d show the expected behavior of the weight (*α*) given to the model norm term: there is a sharp drop at the beginning of the inversion. As the model is updated, this weight decreases and the seismic data drive the inversion.

The data misfit curve is shown in Figure 7e for the first part of the model and in Figure 8e for the second part of the model. For all of the tests that we performed, we used a strict stopping criterion to ensure that the data fit was as good as possible, which resulted in a large number of iterations. For the stopping criterion, a tolerance of 9 × 10<sup>−9</sup> was set on the total value of the model update (total gradient). The data misfit curves show a great difference between the convergence of the conventional FWI (without any type of regularisation or prior information) and the FWI with the addition of prior information through the model norm. Even after many iterations, the conventional FWI cannot reduce the data misfit, while the FWI with prior information shows a very sharp drop at the beginning of the iterations that continues until it reaches a relatively satisfactory result.

**Figure 7.** First part of the BP model: (**a**) FWI result adding prior information through the model norm; (**b**) profile in the well at position *x* = 2.3 km; (**c**) profile in the well at position *x* = 3.5 km; (**d**) dynamic term (*α*) progress; and (**e**) misfit data function progress (logarithmic natural base scale).

**Figure 8.** Second part of the BP model: (**a**) FWI result adding prior information through the model norm; (**b**) profile in the well at position *x* = 1.5 km; (**c**) profile in the well at position *x* = 2.3 km; (**d**) dynamic term (*α*) progress; and (**e**) misfit data function progress (logarithmic natural base scale).

#### *3.2. First Case: Kullback-Leibler's Distance*

As mentioned earlier, our proposal is to add prior information to the FWI formalism through relative entropy. The first attempt would be to use the relative entropy described by Equation (4). To use it, it would be necessary to express the entropy in terms of a probability distribution function, which implies a normalization constraint, that is, 0 ≤ *p* ≤ 1 [17]. To add the a priori information in a simple and direct way, we do this deterministically. Equation (4) is therefore not adequate, as previously discussed, because it is not positive definite. The way around this problem was to work with its quadratic form, represented by Equation (6). Thus, we started the tests on the first part of the BP model using *α* = 2 × 10<sup>7</sup>. Even with the addition of prior information, the result converged to a local minimum that is far from the expected result, as can be seen in Figure 9a. A natural explanation would be that the initial weight given to the entropy term is inadequate. Consequently, we increased the value of this initial weight to *α* = 3 × 10<sup>7</sup> and, after a few iterations, obtained the result illustrated in Figure 9b. In this result, which is still a local minimum, we note that the FWI is driving the solution towards the prior model. Although we performed other tests with intermediate values of *α*, we did not achieve the desired success for this case.

**Figure 9.** FWI results adding prior information through the quadratic form of relative entropy for the first part of the BP model. (**a**) Initial *α* = 2 × 10<sup>7</sup>. (**b**) Initial *α* = 3 × 10<sup>7</sup>.

Despite the failure of the first attempt to add the quadratic form of the relative entropy to the FWI formalism, we performed the test on the second part of the BP model. Figure 10a shows the result. In this case, the addition of prior information through the quadratic form of the relative entropy enabled the FWI to converge to a satisfactory solution. A comparison with the result obtained with the model norm (Figure 8a) shows a visible similarity. A more detailed analysis at the well positions confirms this similarity at position *x* = 1.5 km, where the model norm (red curve) and the relative entropy (blue curve) provide equivalent results, as can be seen in Figure 10b. However, at position *x* = 2.3 km (Figure 10c), the fit provided by the relative entropy is slightly better than that of the model norm, mainly in the deep part of the model. The initial value of alpha for this case was *α* ∼ 8 × 10<sup>7</sup> (in this case, *γ* ∼ 2.7 × 10<sup>17</sup> and *μ* = 3 × 10<sup>−10</sup>), and its evolution can be seen in Figure 10d. We observe a marked decrease in the weight given to the relative entropy term, which avoids giving too much importance to the entropy term while ensuring that an adequate contribution of the prior information is maintained throughout the iterations.

Figure 10e illustrates the data misfit. As in the case of the model norm (green curve), the addition of prior information through relative entropy (light-blue curve) causes the FWI to drastically reduce the data misfit, leading to a low final misfit. Although, around iteration 470, the inversion with relative entropy follows a path of larger misfit than that of the model norm, the final misfit values are close.

The failure in the first part and the success in the second part of the BP model led us to conclude that the initial reasoning based on the graph shown in Figure 2 was correct. In other words, the quadratic form of the relative entropy can aggravate the problem of local minima but, depending on the path that the inversion process takes, the global minimum can still be found.

#### *3.3. Second Case: Kullback-Leibler's Distance—Symmetric Form*

There are few applications in the literature of the symmetric form of relative entropy shown in Equation (5). The probable reason is that this equation does not have a minimum, as mentioned in the theoretical section. Therefore, our second proposal is to add the prior information to the FWI through the quadratic form represented in Equation (7). As in the previous case, we add the prior information to the problem directly, without using a probability distribution formalism. First, we performed the tests on the first and second parts of the BP model. The results can be seen in Figures 11a and 12a. In both cases, the addition of prior information through the quadratic form of the symmetric relative entropy allows the FWI to provide satisfactory results.

**Figure 10.** (**a**) FWI results of adding prior information through the quadratic form of the relative entropy in the second part of BP model; (**b**) velocity profile at position *x* = 1.5 km, (**c**) velocity profile at position *x* = 2.3 km and (**d**) *α* parameter evolution (note this curve is shown in logarithmic natural base scale), and (**e**) misfit data function progress (logarithmic natural base scale).

We compare the result of the FWI (for the first part of the model) using the symmetric form of the relative entropy (Figure 11a) with that obtained with the model norm (Figure 7a). We observe that the addition of prior information through the symmetric relative entropy provides a better result on the left-hand side of the model. We can confirm the quality of the result by looking at the velocity profiles at the well positions (Figure 11b,c). For the well at position *x* = 2.3 km (Figure 11b), the FWI with prior information added through the symmetric form of the relative entropy provides a better approximation of the true value in the region below the salt than the FWI with the model norm. For the well at position *x* = 3.5 km (Figure 11c), the FWI with the symmetric relative entropy and the FWI with the model norm provide equivalent fits. As for the second part of the model, the result of the FWI with the symmetric relative entropy (Figure 12a) is visibly equivalent to the result obtained with the model norm (Figure 8a). However, the velocity profiles at the well positions shown in Figure 12b,c indicate that adding the symmetric form of the relative entropy to the FWI provides a slightly better fit in the deeper part of the model than the model norm.

After several tests, we concluded that the initial value of the *α* parameter for the first part of the BP model should be equal to 1.4 × *e*6 (in this case, *γ* ∼ 2.8 × 10<sup>17</sup> and *μ* = 5 × 10<sup>−10</sup>). Its progression can be seen in Figure 11d. For the second part of the BP model, the initial value of the *α* parameter was equal to 1.3 × *e*6 (in this case, *γ* ∼ 2.6 × 10<sup>17</sup> and *μ* = 5 × 10<sup>−10</sup>), and we show its evolution in Figure 12d. As previously mentioned, we observed that the *α* parameter provides an adequate balance between the data term and the prior information added through the relative entropy. The data misfit curve for the first part of the model is illustrated in Figure 11e. The relative entropy (blue curve) added to the FWI provides a result with a better fit than the model norm (green curve) and the conventional FWI (purple curve). For the second part of the model, the data misfit curve is shown in Figure 12e. We observe that the addition of information through the relative entropy (light-blue curve) to the FWI also provides a strong decay in the data misfit when compared to the classic FWI (purple curve). Compared with the data misfit of the FWI with the model norm (green curve), the final fit is close, although the misfit with relative entropy is slightly better.

**Figure 11.** (**a**) FWI result adding prior information through the quadratic form of the relative entropy in the first part of the BP model; (**b**) velocity log at position *x* = 2.3 km; (**c**) velocity log at position *x* = 3.5 km; (**d**) *α* parameter evolution (note that this curve is shown in logarithmic natural base scale); and (**e**) misfit data function progress (logarithmic natural base scale).

#### *3.4. Third Case: Axiomatic Form*

Finally, our third proposal for adding relative entropy to the FWI is the axiomatic form, described in Equation (8), which we also use to add prior information to the FWI. The advantage of the relative entropy described by Equation (8) is that it is positive definite, which makes its application straightforward, without the need for any adjustments. Thus, we used Equation (8) to perform the FWI on the first part of the BP model, and we show the result in Figure 13a. As in the previous cases, compared with the result of the FWI using the model norm (Figure 7a), the relative entropy of Equation (8) provides a slightly better result on the left-hand side of the model. The velocity profiles at the well positions show that the FWI with the relative entropy (blue curve), compared with the FWI with the model norm (red curve), provides a fit that is closer to the desired one at the well at position *x* = 2.3 km (Figure 13b) and a similar fit at position *x* = 3.5 km (Figure 13c). We also performed the FWI on the second part of the BP model, and the result can be seen in Figure 14a. In this case, the result is similar to that obtained using the model norm (Figure 8a). The velocity profiles at the well positions show an equivalent result at *x* = 1.5 km (Figure 14b) and a slightly better result of the FWI with the relative entropy at *x* = 2.3 km (Figure 14c).

**Figure 12.** (**a**) FWI result adding prior information through the quadratic form of the relative entropy in the second part of the BP model; (**b**) velocity log at position *x* = 1.5 km; (**c**) velocity log at position *x* = 2.3 km; (**d**) *α* parameter evolution (note that this curve is shown in logarithmic natural base scale); and (**e**) misfit data function progress (logarithmic natural base scale).

After several tests, we concluded that, for this case, the weight of the entropy term should start at *α* = 25 (in this case, *γ* ∼ 2.8 × 10<sup>4</sup> and *μ* = 9 × 10<sup>−4</sup>) for the first part of the BP model. The evolution of the *α* parameter can be seen in Figure 13d. For the second part of the BP model, we conclude that the alpha parameter should be *α* = 50 (in this case, *γ* ∼ 5.6 × 10<sup>4</sup> and *μ* = 9 × 10<sup>−4</sup>), and its evolution can be seen in Figure 14d. In Figures 13e and 14e, we see the data misfit curve for the first and second parts of the model, respectively. We observe that, in the case of the first part of the model, the FWI with the relative entropy (light-blue curve) starts the inversion on a path that is very close to that of the conventional FWI (purple curve). However, around iteration 1350, the data fit begins to improve significantly, and the inversion finishes at a lower data misfit value than the FWI with the model norm (green curve). For the second part of the model, we observe that the FWI with the relative entropy (light-blue curve) starts the inversion on a path that is very close to that of the FWI with the model norm (green curve). However, at iteration 200, the FWI with relative entropy takes a path of larger misfit, so that more iterations are needed to obtain a data misfit that is close to that obtained by the FWI with the model norm.

**Figure 13.** (**a**) FWI result of adding prior information through the relative entropy in the first part of the BP model: (**b**) velocity log at position *x* = 2.3 km; (**c**) velocity log at position *x* = 3.5 km; (**d**) *α* parameter evolution (Note this curve is shown in logarithmic natural base scale); and, (**e**) misfit data function progress (logarithmic natural base scale).

**Figure 14.** (**a**) FWI result with adding prior information through the relative entropy in second part of BP model: (**b**) velocity log at position *x* = 1.5 km; (**c**) velocity log at position *x* = 2.3 km; (**d**) *α* parameter evolution (Note this curve is shown in logarithmic natural base scale); and, (**e**) misfit data function progress (logarithmic natural base scale).
