*Article* **Estimating Sound Speed Profile by Combining Satellite Data with In Situ Sea Surface Observations**

**Zhenyi Ou <sup>1</sup>, Ke Qu <sup>1,</sup>\*, Yafen Wang <sup>2</sup> and Jianbo Zhou <sup>3</sup>**


**\*** Correspondence: quke@gdou.edu.cn

**Abstract:** Given that spatiotemporal measurements of the subsurface profile over a wide range are difficult to obtain, surface observations from satellites are often used to estimate the sound speed profile (SSP). This paper proposes a multisource method based on the self-organizing map (SOM) to improve the estimation of the SSP by merging in situ surface observations with satellite data. To this end, surface observations from the Kuroshio Extension Observatory (KEO) were used to supplement satellite observations (anomalies in the measured sea level and sea surface temperature). Different combinations of the surface parameters were assessed, their errors were analyzed, and the differences between the results before and after the multisource parameters were introduced are discussed. The proposed method significantly increased the accuracy of the estimated SSP when the parameters obtained from in situ measurements were used, with a root mean square error of 2.18 m/s, more than a third lower than the error obtained when only satellite observations were used. The proposed method provides a new approach to determining an accurate three-dimensional structure of the sound speed when various surface observations are available.

**Keywords:** Kuroshio Extension Observatory; sound speed profile; self-organizing map

**Citation:** Ou, Z.; Qu, K.; Wang, Y.; Zhou, J. Estimating Sound Speed Profile by Combining Satellite Data with In Situ Sea Surface Observations. *Electronics* **2022**, *11*, 3271. https://doi.org/10.3390/electronics11203271

Academic Editor: Arkaitz Zubiaga

Received: 11 September 2022 Accepted: 10 October 2022 Published: 11 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

The sound speed profile (SSP) plays an important role in civil and military marine applications. As it is a key parameter that determines the characteristics of sound transmission, accurately measuring the SSP is critical for such acoustic applications as localization [1], geoacoustic inversion [2], and tomography [3]. The perturbation-based monitoring of the SSP is the main means of acoustically monitoring the ocean. Dynamic activities in the ocean, ranging from the global climate to internal waves and turbulence, can be observed and analyzed by inverting the SSP [4–6].

Because profile measurements are time-consuming and laborious, it is almost impossible to obtain the real-time three-dimensional (3D) structure of the SSP over a large scale by means of in situ measurements. Researchers have identified a close link between surface parameters and subsurface profiles, and remote sensing platforms with high spatial and temporal resolution have thus become an important means of obtaining large-scale SSP estimates. Attempts to infer the subsurface profile from satellite observations fall into two categories: "physical" and "statistical" methods. Physical methods take advantage of physical equations linking the surface and the subsurface to infer the profile from surface observations. Carnes first discussed the relationship between the sea level (SL) and the amplitudes of the empirical orthogonal function (EOF) of the temperature profile [7]. Profile estimation based on the sea surface temperature (SST) and the SL was subsequently found to offer considerable improvements over that based only on the SL, and temperature profiles of the northwest Pacific and northwest Atlantic Oceans were estimated using a single empirical orthogonal function-based regression (sEOF-r) [8]. The sEOF-r method uses an approximate linear equation to describe the physical relationship between the parameters of the surface and the subsurface, and has been used in operational marine environment prediction schemes of the United States Navy, such as the Modular Ocean Data Assimilation System (MODAS) [9,10]. Chen confirmed that the sEOF-r method can also be used to directly estimate the global SSP [11]. Although the approximate linear physical relationship inevitably incurs errors owing to the highly nonlinear dynamics of the ocean, research has shown that simple physical expressions can yield results with reasonably high accuracy. The SST and SL are also effective sea surface predictors for profile estimation [12].

Based on big data theory, "statistical" methods describe the relationship between the parameters of the surface and the subsurface through machine learning, without preset equations of a specific form. Hjelmervik used the clustering of EOFs and gradient search to estimate the real-time temperature and salinity of the ocean [13,14]. Using the AVISO satellite product as the input predictor, Chapman reconstructed velocities by searching for the best-matching class through self-organizing maps (SOM) [15,16]. Su proposed an ensemble learning algorithm that combines extreme gradient boosting and the gradient boosting decision tree to retrieve the profiles of temperature and salinity in the upper 2000 m of the global ocean [17]. Statistical methods can improve the accuracy of profile estimation to a greater extent than physical methods by eliminating the constraints on equation-based inversion.

<sup>1</sup> College of Electronics and Information Engineering, Guangdong Ocean University, Zhanjiang 524088, China

However, the SST and SL do not include details on the state of the ocean, and this reduces the accuracy of the SSP estimated from them. Based on observations from the central Arabian Sea, Jain found that most errors in reconstructing the SSP occurred at depths of 40–125 m because the SST and SL provide insufficient information about the depth of the mixed layer [18]. Similar errors occurred in SSP estimation in the South China Sea, where the SSP inferred from the SST and SL incurred large errors at a depth close to 500 m due to the exchange of water between the South China Sea and the Pacific Ocean through the Luzon Strait. This exchange could not be described by remote sensing parameters alone, so multisource parameters were added to strengthen the study of this area [19]. Huang conducted a visual analysis of the practicality of the EOF in the South China Sea [20]. Because the information available for estimating the subsurface profile is limited, additional predictors are needed to offer richer information and improve the estimation of the SSP. Bao obtained the sea surface salinity (SSS) from both in situ and satellite observations to improve the results of salinity profile reconstruction [21]. Ou and Chapman estimated the SSP using a machine learning method based on SST and SL data [22]. Chen included inverted data from the echo sounder and the depth of the mixed layer in addition to the SST and SL to estimate the SSP [23]. The results indicated that multisource observations can significantly improve the estimation, and that different predictors contribute to the improvement to varying degrees.

With advances in the technologies used to observe the global ocean surface, such as voluntary observing ships and moored platforms, the spatiotemporal coverage of in situ surface measurements has significantly increased. These data may help avoid time-consuming and laborious profile measurements. In this study, we improve estimations of the SSP by combining satellite data with in situ observations of the sea surface. To this end, we propose a technique for SSP estimation that uses the SOM with multisource observations. We infer the nonlinear relationships between the multisource surface information and the amplitudes of the EOFs of the SSP by taking advantage of the topology of a neural network cell. The proposed method can be used for the fused processing of multiple parameters from different types of sensors and provides a tool to evaluate the effects of different predictors on the inversion model. Different models are examined to determine the contributions of different in situ surface parameters. The in situ surface parameters that yield a gain in the inversion scheme are identified, and an optimized multisource model is provided. The results of the SSP estimation based on data from the Kuroshio Extension Observatory (KEO) show that the proposed method can improve the accuracy of the estimated SSP. The multisource method has significant advantages in dynamic areas of the ocean, such as the Kuroshio Extension region. In real-time and large-scale subsurface profile estimation, the accuracy may be greatly improved by a surface vessel, an automated glider, a mooring station, or other in situ surface observation equipment.

#### **2. Methodology**

The SOM is used to process multisource information for an inversion problem in order to estimate the SSP by combining satellite data with in situ sea surface observations. The processing flow of the SSP estimation is shown in Figure 1.

**Figure 1.** The flow of SOM-based estimation of the SSP.

Previous studies have focused on estimating the SSP from the SST and SL. Despite the use of excellent machine learning algorithms, the problem of insufficient information in the SST and SL remains difficult to solve. With the development of measurement technology, more and more parameters can be observed. However, processing methods have not kept pace: existing approaches cannot incorporate additional parameters, and it is unclear how each parameter affects the results.

The EOF is commonly used in SSP modeling to provide a constraint on the inversion problem. The SSP *c*(*z*, *t*) at a sampling depth *z* and time *t* can be described by [24]

$$c(z,t) = c_0(z) + \sum_{s=1}^{\infty} a_s(t) K_s(z) \tag{1}$$

where *c*<sub>0</sub>(*z*) is the background profile, *K*<sub>*s*</sub>(*z*) is the *s*-th EOF, and *a*<sub>*s*</sub>(*t*) is its projection coefficient. The background profile is the constant part of the SSP that is stable in the long term and can be approximated by the climatological mean profile. The terms of the summation represent perturbations in the SSP. As higher orders of *s* often introduce excessive noise to the samples, the series is commonly truncated without a significant loss of information from the SSP. A threshold of 95% of the cumulative variance is commonly used to determine the number of EOF modes retained. According to an analysis of the experimental data, three orders of EOFs are used in SSP modeling here.

The EOFs can be calculated from a principal component analysis of the space–time samples. The matrix of anomalies in the SSP of the ocean is *X* = [*x*<sub>1</sub>, ···, *x*<sub>*S*</sub>] ∈ ℝ<sup>*Z*×*S*</sup>. It is obtained by sampling over *Z* discrete points in depth and *S* instants in time, and by subtracting the background profile from each of the *S* samples. Based on singular value decomposition, the EOFs can be calculated by

$$XX^T = K\Lambda^2 K^T \tag{2}$$

where the non-zero elements of Λ<sup>2</sup> = diag([*λ*<sub>1</sub><sup>2</sup>, ···, *λ*<sub>*L*</sub><sup>2</sup>]) ∈ ℝ<sup>*L*×*L*</sup> represent the variance along the principal directions defined by the corresponding EOFs. The three EOFs with the highest variances are used to reconstruct the SSP. Both the input and the output SSPs are expressed in the form of projection coefficients on these EOFs.
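The EOF decomposition and truncation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`compute_eofs`, `reconstruct_ssp`) and the array layout (depths along rows, samples along columns) are assumptions made for illustration. The left singular vectors of the anomaly matrix *X* are used as the EOFs, since they are the eigenvectors of *XX*<sup>T</sup> in Equation (2), and the number of modes is chosen from the cumulative variance threshold.

```python
import numpy as np


def compute_eofs(ssp_samples, background, variance_threshold=0.95):
    """Return truncated EOFs, projection coefficients, and cumulative variance.

    ssp_samples : (Z, S) array, one profile per column (assumed layout)
    background  : (Z,) array, e.g. a climatological mean profile
    """
    # Anomaly matrix X: subtract the background profile from every sample.
    anomalies = ssp_samples - background[:, None]
    # The left singular vectors of X are the eigenvectors of X X^T (the EOFs);
    # the squared singular values are the variances along each EOF.
    left_vectors, singular_values, _ = np.linalg.svd(anomalies, full_matrices=False)
    variances = singular_values ** 2
    cumulative = np.cumsum(variances) / np.sum(variances)
    # Smallest number of modes whose cumulative variance reaches the threshold.
    n_modes = min(int(np.searchsorted(cumulative, variance_threshold)) + 1, len(variances))
    eofs = left_vectors[:, :n_modes]
    # Projection coefficients a_s(t) of each sample on the retained EOFs.
    coefficients = eofs.T @ anomalies
    return eofs, coefficients, cumulative


def reconstruct_ssp(background, eofs, coefficients):
    """Rebuild profiles from the background and the truncated EOF expansion."""
    return background[:, None] + eofs @ coefficients
```

With the 95% threshold this sketch picks the number of modes from the data; the three modes used in this study would correspond to whatever the KEO samples yield under the same criterion.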

A SOM-based estimation technique is proposed to process multisource information for the SSP. The SOM is a nonlinear vector projection algorithm between the input and output layers. In the input layer, each set of measurement data forms a prototype vector that includes remote sensing measurements, location data, in situ measurements, and information on samples of the SSP. To evaluate the contributions of different in situ surface parameters, the in situ measurement parameters are treated as optional. In the training process, the prototype vectors follow the probability density of the input layer without changing the topological structure. Reference vectors are assigned through the iterative learning algorithm based on the mean values of the classes into which the training data are clustered. In the output layer, each neuron unit is represented by a class that contains one reference vector. The input information can be regarded as a fragmentary neuron unit, whose missing part represents the EOF coefficients of the SSP to be estimated. After the SOM has been trained, the input information is matched with the classes on it according to the Euclidean distance. Charantonis introduced a formula to calculate the Euclidean distance over only the dimensions with available parameters [16]:

$$D_E^p(X, Y^p) = \sum_{i \in a} \left( 1 + \sum_{j \in b} \left( C_{i,j}^p \right)^2 \right) \times \left( X_i - Y_i^p \right)^2 \tag{3}$$

where *D*<sub>*E*</sub> is the Euclidean distance, *X* is the input vector, *Y* is the reference vector, *p* is the index of each class, *a* is the set of input data (available variables), and *b* is the set of reconstructed outputs (missing variables) to be solved for. *C* is the correlation matrix between the missing and the available variables. The input information can be regarded as a neuron with missing information, where the missing part consists of the projection coefficients that describe the SSP. The best-matching class is the one whose reference vector is closest to the input vector according to Equation (3). The missing projection coefficients of the input vector can then be estimated by extracting the corresponding part of the best-matching class and using it to reconstruct the SSP.
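The matching step of Equation (3) can be sketched as follows, assuming that the reference vectors and the per-class correlation matrices of a trained SOM are already available as arrays. The training of the map itself is not shown. The function names (`best_matching_class`, `fill_missing`) and the array shapes are hypothetical choices for illustration rather than the interface of any particular SOM library.

```python
import numpy as np


def best_matching_class(x, available, reference, corr):
    """Find the class minimizing the distance of Equation (3).

    x         : (D,) input vector; entries outside `available` are ignored
    available : (D,) boolean mask, True for observed variables (set a)
    reference : (P, D) reference vectors Y^p of the trained map (assumed given)
    corr      : (P, D, D) correlation matrices C^p, one per class (assumed given)
    """
    missing = ~available
    # Weight of each available dimension i: 1 + sum over missing j of (C^p_ij)^2.
    weights = 1.0 + np.sum(corr[:, available][:, :, missing] ** 2, axis=2)
    # Squared differences on the available dimensions only.
    diff = x[available] - reference[:, available]
    distances = np.sum(weights * diff ** 2, axis=1)
    return int(np.argmin(distances)), distances


def fill_missing(x, available, reference, corr):
    """Copy the missing entries from the best-matching reference vector."""
    best, _ = best_matching_class(x, available, reference, corr)
    filled = x.copy()
    filled[~available] = reference[best, ~available]
    return filled, best
```

In this setting the available entries would hold the surface predictors and the missing entries the EOF projection coefficients. The weighting gives more influence to available variables that are strongly correlated with the missing ones, so the chosen class can differ from the one selected by a plain Euclidean distance.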

#### **3. Data**

We used surface data collected from in situ measurements and satellite observations as input parameters for the proposed model. Part of the subsurface measurements of the sound speed profile was used to train the SOM and the rest to validate the accuracy of the estimated SSP. The climatological mean profile was used as the background profile.

All in situ surface measurements and SSP samples used were from the KEO. As a surface mooring, the KEO (32.3° N, 144.6° E) has a long record of daily real-time measurements in the ocean. The slack-line mooring provides a rich variety of surface and subsurface data, including longwave radiation (LR), shortwave radiation (SR), wind speed (WS), surface temperature (ST), surface salinity (SS), air temperature (AT), surface density (SD), heat content (HC), relative humidity (RH), temperature profile, and salinity profile. The SSP samples can be calculated from the temperature and salinity profiles using Del Grosso's empirical formula [25]. As EOF processing requires samples at the same depths, an SSP sample is considered complete only when its sampled depths extend from 5 m or shallower to 475 m or deeper. The remaining profiles were interpolated by cubic interpolation to the nominal depths of the sensors on the slack-line (5, 10, 15, 20, 25, 35, 40, 50, 75, 100, 125, 150, 175, 200, 225, 275, 325, 400, 425, and 475 m). For the KEO, such a depth range retains more of the perturbation description without losing too many samples to the exclusion of shallower profiles. A total of 2277 profiles were finally obtained. Of them, 80% (1820 samples from 26 September 2009 to 3 October 2018) formed the training dataset, and the other 20% (457 samples from 4 October 2018 to 5 May 2020) were used as the test set.
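The sample selection and the chronological split described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (dictionaries with `day` and `depths_m` keys) and the function names are hypothetical, and the cubic interpolation to the nominal depths is omitted.

```python
from datetime import date

# Nominal sensor depths on the slack-line mooring, in metres (from the text).
NOMINAL_DEPTHS_M = (5, 10, 15, 20, 25, 35, 40, 50, 75, 100, 125,
                    150, 175, 200, 225, 275, 325, 400, 425, 475)


def is_complete(sample_depths_m, shallow_limit=5.0, deep_limit=475.0):
    """A profile is kept only if it spans from 5 m or shallower to 475 m or deeper."""
    if not sample_depths_m:
        return False
    return min(sample_depths_m) <= shallow_limit and max(sample_depths_m) >= deep_limit


def split_by_date(profiles, first_test_day=date(2018, 10, 4)):
    """Chronological split: earlier profiles train the SOM, later ones test it."""
    train = [p for p in profiles if p["day"] < first_test_day]
    test = [p for p in profiles if p["day"] >= first_test_day]
    return train, test


def select_and_split(profiles):
    """Drop incomplete profiles, then split the rest by date."""
    complete = [p for p in profiles if is_complete(p["depths_m"])]
    return split_by_date(complete)
```

Splitting by date rather than at random keeps the test period strictly after the training period, which matches the date ranges quoted for the two sets.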

The World Ocean Atlas 2018 (WOA18) was used to describe the background profile. It was obtained from the National Oceanic and Atmospheric Administration. It analyzes in situ measurements from a wide variety of sources and provides the global gridded mean climatological profile. The annual average profile from 2005 to 2017 was used at a spatial resolution of 0.25°. Figure 2 shows all the SSPs used from the experimental area. Owing to the intense exchange of energy and matter in the Kuroshio Extension, a large perturbation of about 30 m/s in the sound speed occurred both at the sea surface and at a depth of 475 m. The large amplitude of the perturbation, with its complex origin, posed a challenge to the SSP estimation.

**Figure 2.** Profiles of training, testing, and the background.

The surface parameters from satellite remote sensing used here were collected from the Coriolis projects (https://marine.copernicus.eu (accessed on 29 January 2021)). To distinguish them from the surface parameters of the in situ measurements, the remote sensing parameters are referred to as the sea level anomaly (SLA) and the sea surface temperature anomaly (SSTA). They had a spatial resolution of 0.25° and a temporal resolution of one day.

#### **4. Results**

Eleven models of the surface parameters derived from the satellite observations and in situ measurements were evaluated, and the results are listed in Table 1 and shown in Figure 3. The precision of the SSP reconstruction was difficult to ensure owing to the complex and intense perturbations of the SSP in the Kuroshio Extension. The mean-variance of the SSP samples was 5.90 m/s, with perturbations of large amplitude appearing in both the sea surface and the thermocline measurements. Model 1 was the classic model for estimating the SSP, based only on remote sensing data, and was used as a reference. Models 2–10 evaluated the effects of different sea surface parameters obtained from in situ measurements. If the error of a model was smaller than that of Model 1, its in situ parameters were considered to have provided effective information for estimating the SSP and improving the accuracy of the results. Conversely, if the error of a model was larger than that of Model 1, this indicated that its in situ parameters had introduced redundancy into the neural network topology and thereby reduced the accuracy of the inversion results. As parameters that directly affect the sound speed, temperature, salinity, and density were effective input parameters, and temperature had the most significant effect on the accuracy of SSP estimation. Although air temperature and heat content do not directly reflect the physical properties of seawater, they reduced the error and showed a strong correlation with the SSP. The heat content was a special parameter that led to the greatest improvement in accuracy at a depth of about 50 m. The addition of longwave radiation, shortwave radiation, or wind speed increased the error, and so these parameters were considered inappropriate inputs for SSP estimation. To obtain the best estimation results, all effective in situ predictors and the remote sensing predictors were combined in Model 11, which delivered the best results of all parameter combinations considered.



**Figure 3.** Results of SSP estimation for each model.

The variation in the estimation error with depth shows that the accuracy of SSP reconstruction was related to the amplitude of the perturbations: a large anomaly in the sound speed tended to lead to a larger error. Because the results were directly derived from the sea surface parameters, accuracy was relatively high near the sea surface even when there was a large deviation in the speed of sound. At depths of 50–150 m, large errors occurred owing to seasonal and diurnal variations in the mixed layer. The differences between the models gradually decreased with increasing depth, and they delivered nearly the same performance below 250 m. The results of inversion show that the introduction of in situ measurement data improved the accuracy of SSP estimation, mainly at depths close to the sea surface. In the case of a large perturbation deeper below the surface, the accuracy did not improve when only sea surface parameters were considered. To examine the performance of the multisource method in practice, we assessed Model 11 further.

The estimation errors of all samples are shown in Figure 4. Except for a few samples, the multisource model improved the accuracy and robustness of the results. The standard deviation of the estimated SSP was 5.54 m/s and the maximum value was 9.73 m/s. Model 1 had an overall root mean square error (RMSE) of 3.49 m/s with a maximum error of 11.10 m/s. Although the remote sensing parameters were effective predictors as inputs to the inversion model, insufficient information led to many incorrect results. The multisource model performed better than Model 1 in about 80% of the samples. Its RMSE and maximum error were 2.18 m/s and 6.66 m/s, respectively. In addition, about 82% of the SSPs estimated by the multisource model were within the error limit of 3 m/s, and this value decreased to only 58% without the in situ measurements.
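The summary statistics quoted above can be computed along the following lines. The sketch assumes that the maximum error and the 3 m/s error limit refer to the RMSE of each profile taken over depth, which the text does not state explicitly; the function names are hypothetical.

```python
import numpy as np


def profile_rmse(estimated, measured):
    """RMSE over depth for each profile; inputs are (n_profiles, n_depths) arrays."""
    return np.sqrt(np.mean((estimated - measured) ** 2, axis=1))


def summarize_errors(estimated, measured, limit=3.0):
    """Overall RMSE, worst per-profile RMSE, and share of profiles within `limit`."""
    per_profile = profile_rmse(estimated, measured)
    return {
        "overall_rmse": float(np.sqrt(np.mean((estimated - measured) ** 2))),
        "max_error": float(per_profile.max()),
        "fraction_within_limit": float(np.mean(per_profile <= limit)),
    }


def fraction_improved(candidate, baseline, measured):
    """Share of profiles for which the candidate model beats the baseline model."""
    return float(np.mean(profile_rmse(candidate, measured) < profile_rmse(baseline, measured)))
```

Applied to the test set, `summarize_errors` would be called once per model, and `fraction_improved` would compare the multisource model against Model 1 profile by profile.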

**Figure 4.** Deviations in the data and estimation errors for different samples.

Some parameters may assist the estimation in a different way. A predictor that degrades the estimation results because its value changes drastically can still indicate anomalies in the results of inversion. Almost all failures of estimation corresponded to a large value of such a predictor, which reflected perturbations owing to extreme weather events, such as typhoons and rainstorms. This can be used to build an early warning index for the reliability of the results.

Figure 5 shows the speeds of sound estimated by the two models at all depths. Model 1, based only on remote sensing parameters, had an overall mean absolute error of 2.52 m/s, with a coefficient of determination (R<sup>2</sup>) of 0.76 and a slope of 37.2. The multisource model outperformed Model 1, with a mean absolute error of 1.56 m/s, R<sup>2</sup> of 0.90, and slope of 43.9. The speed of sound estimated by it had a relative error of only about 0.1% compared with the measured speed of sound.
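The scatter statistics can be reproduced in outline with the sketch below. It assumes the usual definition of the coefficient of determination, R<sup>2</sup> = 1 − SS<sub>res</sub>/SS<sub>tot</sub>, and expresses the relative error as the mean absolute error divided by the mean measured sound speed; the exact definitions used for Figure 5 are not given in the text, and the function name is hypothetical.

```python
import numpy as np


def scatter_statistics(estimated, measured):
    """Mean absolute error, R^2, and relative error over all depths and samples."""
    est = np.ravel(np.asarray(estimated, dtype=float))
    mea = np.ravel(np.asarray(measured, dtype=float))
    mae = float(np.mean(np.abs(est - mea)))
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    ss_res = np.sum((mea - est) ** 2)
    ss_tot = np.sum((mea - mea.mean()) ** 2)
    r_squared = float(1.0 - ss_res / ss_tot)
    # Mean absolute error relative to the mean measured sound speed.
    relative_error = mae / float(mea.mean())
    return mae, r_squared, relative_error
```

For a sound speed on the order of 1500 m/s, a typical value for seawater, a mean absolute error of 1.56 m/s corresponds to a relative error of roughly 0.1%.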

**Figure 5.** Scatter in the measured and estimated speeds of sound obtained by different models.

The SSPs estimated by different models (for the first sample in each month of the period considered) are given in Figure 6. Consistent with the previous statistical results, the results of the multisource model were significantly better than those of the model based only on remote sensing parameters. Large errors mainly appeared where the speed of sound was strongly perturbed, mainly at depths of less than 150 m and greater than 350 m. In the case of perturbations at the sea surface, the accuracy of the estimated SSP significantly improved after the introduction of multisource information, as in the results for 4 October 2018 and 1 April 2019. In other regions, the perturbation in the sound speed generally decreases with depth. Most errors occurred in the upper thermocline, possibly due to variations in the mixed layer or to ocean dynamics such as internal waves. The multisource method significantly improved the accuracy of the estimated sound speed in the upper ocean. A special characteristic of the KEO is that, owing to the exchange of energy and matter, perturbations in the deeper thermocline are not smaller than those in the surface layer. The results show that it was difficult to estimate perturbations far from the surface because all the input parameters were based on data obtained at the sea surface. Although the EOF modes can express the SSP at all depths as a whole, the principal components of the perturbation shape showed no significant correlation between the surface and the deeper layers of the ocean. In addition, the different models obtained consistent results on data from 1 November 2019 and 1 April 2020. This indicates that inversion processing and EOF representation cannot be used by themselves to accurately reconstruct the profile of sound speed.

**Figure 6.** Comparison between the measured and the estimated SSPs.

#### **5. Conclusions**

This study proposed a method to estimate the SSP based on the SOM. We used the topological structure of neurons to input multisource observations and estimate the SSP. We used data from the Kuroshio Extension to assess the performance of the proposed model, which uses different surface parameters derived from remote sensing and in situ measurements. The results showed that the addition of in situ measurement parameters can markedly improve the estimation of the SSP. Compared with Model 1, the multisource model reduced the RMSE by about 38% (from 3.49 m/s to 2.18 m/s). However, not all surface parameters play a positive role in the inversion. The best results were obtained when remote sensing data and effective predictors were used in the multisource model. The error in the estimated SSP in the region of the Kuroshio Extension was larger than in other parts of the ocean owing to the intense exchange of energy and matter there. The degradation in accuracy with depth revealed that sea surface parameters can be used to accurately estimate the sound speed in the surface layer of the ocean but not in the deep layers. The large amplitude of SSP perturbations poses a challenge to the accurate estimation of the SSP.

Further work is needed to accurately calculate the SSP, as almost all currently available methods are purely statistical and need to incorporate physical rules to obtain reliable estimates. Moreover, methods that can be applied to regions for which few samples are available should be researched.

**Author Contributions:** Conceptualization, K.Q.; methodology, K.Q.; software, Z.O.; validation, J.Z. and Y.W.; formal analysis, Y.W.; investigation, Z.O.; resources, K.Q.; data curation, K.Q.; writing original draft preparation, Z.O. and K.Q.; writing—review and editing, Z.O.; visualization, Y.W.; supervision, J.Z.; project administration, J.Z.; funding acquisition, K.Q. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Natural Science Foundation of Guangdong Province, grant number 2022A1515011519, and the 2022 Basic Research Funds for Central Universities of Northwestern Polytechnical University.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

