A Prediction Model of Marine Geomagnetic Diurnal Variation Using Machine Learning

Xiong, Pan; Bian, Gang; Liu, Qiang; Jin, Shaohua; Yin, Xiaodong

doi:10.3390/app14114369

Open Access Article


by Pan Xiong, Gang Bian *, Qiang Liu, Shaohua Jin and Xiaodong Yin

Department of Military Oceanography and Hydrography and Cartography, Dalian Naval Academy, Dalian 116018, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(11), 4369; https://doi.org/10.3390/app14114369

Submission received: 7 April 2024 / Revised: 9 May 2024 / Accepted: 20 May 2024 / Published: 22 May 2024

(This article belongs to the Special Issue Machine Learning Approaches for Geophysical Data Analysis)


Abstract:

Geomagnetic diurnal variation significantly influences the precision of marine magnetic measurements. Precise estimation of this variation is crucial for enhancing the accuracy of offshore magnetic surveys. To address the challenges in achieving the desired accuracy with current estimation methods for geomagnetic diurnal variation, this study introduces a high-precision estimation model that integrates support vector machine (SVM) and random forest (RF) techniques. Initially, the data preprocessing phase includes an innovative extreme value adjustment method to rectify the temporal discrepancies across different stations, alongside employing the base period technique for daily baseline correction. Subsequently, we construct models to capture the daily variation trends at various times, facilitating an in-depth analysis of the diurnal variation patterns. The culmination of this process involves employing a fusion model algorithm to compute the diurnal variations across all stations comprehensively. Comparative analyses with conventional methods, such as distance weighting, bifactor weighting, and latitude weighting, reveal that our proposed model achieves a significant reduction in the root mean square error (RMSE) by an average of 31%, decreases the mean absolute error (MAE) by 35%, and enhances the Pearson correlation coefficient by 20% on average. These improvements underscore the superior accuracy of our geomagnetic diurnal variation estimation model.

Keywords:

geomagnetic diurnal variation; SVM-RF fusion model; diurnal variation law; extreme value adjustment method; pelagic geomagnetic diurnal variation

1. Introduction

Solar activity disturbs the Earth’s magnetic field, producing short-term changes on a global scale. Geomagnetic matching navigation, a crucial supplement to inertial navigation, offers excellent concealment and resistance to interference, which makes it the primary alternative to GNSS navigation when satellite signals are denied. Nevertheless, its effectiveness hinges on the availability of high-precision geomagnetic data. In particular, accurate geomagnetic diurnal variation data are needed to improve the precision of marine magnetic measurements.

In marine magnetic measurement, obtaining the geomagnetic diurnal variation is crucial. For regions close to the shore, a single diurnal variation station on land suffices. For areas far from the shore, however, a single station cannot provide the diurnal variation data needed for the measurement. To address this issue, a multi-station daily variation model based on spatial position relationships is commonly used to calculate the daily variation data for distant areas. The difficulty is that geomagnetic daily variation has an intricate morphological structure, so projections that rely solely on geographical location relationships become unreliable, and specialized methods are required to obtain accurate geomagnetic data for marine magnetic measurements in remote offshore locations [1]. The time-varying nature of the geomagnetic daily variation is complex [2], and the data from different stations differ in phase, amplitude, and morphology [3]. The selection of the base value of the geomagnetic daily variation, the adjustment of the phase difference, and the preprocessing of the data also affect the accuracy of the projection [4].

Walker [5] was the first to use the method of geographic interpolation of latitude and longitude to calculate the geomagnetic diurnal variation correction in the magnetic measurement of the British Isles. Roden [6] found that a single station cannot effectively correct the far-distance measurement area when correcting the daily variation in magnetic measurement observation data. He proposed using the separation reconstruction method and the weighted average method, which have equivalent calculation accuracy, but the weighted average method is faster and more convenient. Gao Shan [7] found that the traditional multi-station daily variation correction was based on geographic coordinates. The geomagnetic coordinate fitting method was utilized to enhance the accuracy of multi-station daily variation correction. Researchers found that geomagnetic coordinates exhibit a stronger correlation with the intensity of the geomagnetic field. Nevertheless, this approach imposes strict demands on the network of daily variation stations. Addressing the limitations of the latitude weighting method, the distance weighting method, and the direct average method, Liu Xiaogang [8] proposed a two-factor weighting method based on direction and distance. This method improved the calculation accuracy of the geomagnetic diurnal variation data by considering the spatial position and geometric distance of the geomagnetic station, alongside an analysis of the relationship between geomagnetic field intensity and latitude. Shan Rujian [9] analyzed the geomagnetic diurnal variation data obtained from three-day synchronous measurements at the Shahe, Zhengding, Shangqiu, and Dongying stations. Three fitting methods for modeling geomagnetic diurnal variation in local areas were proposed: two-dimensional polynomial least squares, space–time fitting, and linear interpolation. 
Encouragingly, these methods have shown positive results, effectively addressing cases with a 6° latitude difference and a 5° longitude difference. In the context of an aeromagnetic survey in the central and western Qinghai-Tibet Plateau, Guo Jianhua [10] applied the function fitting method and the weighted average method to correct the daily variation. The latitude distance weighted average method stands out as an effective approach to enhancing the accuracy of daily variation correction across multiple stations. It has been found that this method improves the effectiveness of the weighted average method when calculating multi-station daily change correction values. However, it is important to note that implementing this method requires a well-structured daily change station network due to the higher demands it places on the graphical organization [11]. Xuhang [12] investigated the impact of distance change on the accuracy of offshore magnetic survey correction when correcting the diurnal variation in the southeastern sea area of Xisha. In the study of the spatial and temporal characteristics of the deep-sea geomagnetic field, the global static day model was employed to calculate the geomagnetic diurnal variation. However, it was found that the calculation accuracy of this model was low, posing challenges for its utilization in high-precision geomagnetic survey applications.

Related Work

Existing methods place high demands on the spatial layout of the geomagnetic diurnal variation stations, perform poorly and become complicated when calculating the diurnal variation under magnetic disturbance, and depend strictly on a priori information. To address these problems, this paper proposes a geomagnetic diurnal variation projection model that fuses SVM and RF. Data from the geomagnetic stations are first processed and analyzed: the extreme value adjustment method is used to correct the time difference between stations, and the base period method is used to determine the base value of the diurnal variation. The preprocessed daily variation data then serve as the training data for the SVM-RF model, which derives the geomagnetic daily variation values of the main station. The method is validated using data from European regional geomagnetic stations. The results show that the proposed method derives the geomagnetic daily variation with higher accuracy, imposes fewer conditions on the training stations, adapts to different combinations of station structures, and is more robust to geomagnetic disturbances.

2. Methodology

2.1. Principle of the SVM Algorithm

SVM is a class of supervised machine learning algorithms that excels at constructing optimal hyperplanes for regression and prediction in high-dimensional spaces [13]. SVM has attracted considerable interest and performed well in many fields owing to its strong generalization ability, robustness, and the wide range of kernel functions it supports. It is particularly effective on high-dimensional data and is commonly employed for small-sample problems, where it predicts faster and more stably than neural networks. Moreover, SVM places lower demands on computing resources and does not need extensive training samples to attain satisfactory results.

The way SVM constructs the optimal hyperplane is illustrated as follows:

As shown in Figure 1, the SVM model generates a hyperplane to separate data points with the objective of balancing the preservation of a wide margin and the minimization of errors. The width of this margin is controlled by the hyperparameter C. When C is larger, as depicted in the left graph, the error rate decreases while the margin narrows, ultimately diminishing the model’s ability to generalize to new data. On the other hand, a smaller value of C, as seen in the right graph, yields a wider margin, enhancing the model’s capability to generalize to unseen data [14].

The choice of the kernel function of the SVM algorithm depends on the characteristics and nonlinear relationship of the problem. The commonly used kernel functions are as follows:

K(x, y) = x \cdot y

(1)

K(x, y) = (x \cdot y + c)^{d}

(2)

K(x, y) = \exp\left(-\frac{\| x - y \|^{2}}{2 σ^{2}}\right)

(3)

K(x, y) = \tanh(α \, x \cdot y + β)

(4)

Equation (1) represents a linear kernel function, Equation (2) a polynomial kernel function, Equation (3) a Gaussian radial basis kernel function, and Equation (4) a sigmoid kernel function. In these equations, x and y denote input vectors; α and β are adjustable parameters that control the shape of the hyperplane and affect the performance of SVM; c is the constant term, which determines the offset of the kernel function; d is the degree of the polynomial, which determines the complexity of the high-dimensional space after mapping by the polynomial kernel; and σ is a constant that determines the shape of the Gaussian kernel function in high-dimensional space and affects the smoothness of the kernel function and the selection of the decision boundary [14].

The Gaussian kernel function is chosen for modeling in this study for its effectiveness with both large and small sample sizes, as well as its ability to efficiently reduce noise in daily data.
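The four kernels in Equations (1)–(4) can be sketched in a few lines of Python. This is only an illustration, not the authors' MATLAB implementation: the function names and the default parameter values are assumptions, and the product of x and y is read as the inner product of the two input vectors.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """Equation (1): plain inner product."""
    return dot(x, y)


def polynomial_kernel(x, y, c=1.0, d=3):
    """Equation (2): constant term c and polynomial degree d."""
    return (dot(x, y) + c) ** d


def gaussian_kernel(x, y, sigma=1.0):
    """Equation (3): Gaussian radial basis kernel with width sigma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(x, y, alpha=0.5, beta=0.0):
    """Equation (4): tanh of a scaled and shifted inner product."""
    return math.tanh(alpha * dot(x, y) + beta)
```

The Gaussian kernel equals 1 for identical inputs and decays toward 0 as the inputs move apart; a smaller σ makes that decay faster, which is how σ governs the smoothness of the fitted function.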

2.2. Principle of the RF Algorithm

The random forest (RF) algorithm, a supervised machine learning method, is primarily used for classification and regression problems [15]. Besides sharing the advantages of SVM, RF maintains high accuracy even when the training set contains missing data. RF is an ensemble algorithm made up of multiple decision trees, and it builds a more effective classification or regression model by combining randomly generated trees. Because each tree is trained on a randomly drawn sample, RF exhibits superior generalization capabilities. The algorithm proceeds as follows:

Generate k independent sampling sets by randomly selecting from the original sample set;
Train the k sampling sets to create k weak learners;
Combine the k weak learners through aggregation to create a strong learner output, which functions as the final model.

The flowchart illustrating the specific implementation steps is presented below:

Figure 2 shows the flowchart of the random forest (RF) algorithm, which typically uses the simple voting method for classification tasks and the simple average method for aggregation in regression problems.
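The three steps above (bootstrap sampling, training k weak learners, and aggregating by simple averaging) can be sketched for a regression task as follows. This is a deliberately reduced illustration and not the model used in the paper: each weak learner is a one-split regression tree on a single input, the random feature selection used by a full random forest is omitted, and all function names are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree (weak learner) on 1-D inputs."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a valid split needs samples on both sides
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # All inputs identical: fall back to predicting the mean.
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, k=25, seed=0):
    """Steps 1 and 2: draw k bootstrap samples and train one learner on each."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Step 3: aggregate the weak learners by simple averaging (regression)."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)
```

For a classification task the last function would take a majority vote over the learners instead of their average.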

2.3. Fusion Model

Ensemble learning is widely popular for its capability to merge multiple weak learners, using sample and model weighting to achieve better generalization than single learning models [16]. In this study, daily variation data from multiple geomagnetic stations are learned by employing both SVM and RF algorithm models. To enhance the overall performance of the final learning model, the combination of multiple models requires a certain level of accuracy for each individual model while also leveraging their differences. Rather than a simple superposition, the models are integrated by establishing an error threshold through the comparison of errors on the validation set. When the error difference among different models falls below the threshold, the models are merged by assigning weights. Subsequently, the final calculated value is adjusted according to these assigned weights.

MAE^{SVM} = | y^{SVM} - y_{true} |

(5)

MAE^{RF} = | y^{RF} - y_{true} |

(6)

W^{SVM} = \frac{MAE^{RF}}{MAE^{SVM} + MAE^{RF}}

(7)

W^{RF} = \frac{MAE^{SVM}}{MAE^{SVM} + MAE^{RF}}

(8)

In Equations (5)–(8), y^{SVM} is the calculated value of the SVM model, y^{RF} is the calculated value of the RF model, y_{true} is the real value on the validation set, MAE^{SVM} and MAE^{RF} represent the calculation errors of the two models, and W^{SVM} and W^{RF} represent the combined weights. Each model is weighted by the error of the other, so the model with the smaller validation error receives the larger weight.

Enhancing the algorithm’s generalization ability is crucial. In cases where the training data yield suboptimal results for SVM and RF, a threshold θ is established to determine whether model combination is necessary. If the relative difference between the errors of the two trained models exceeds this threshold, the combination process is omitted. The decision criterion is as follows:

\frac{| MAE^{SVM} - MAE^{RF} |}{\max(MAE^{SVM}, MAE^{RF})} > θ

(9)

The algorithm flowchart is as follows:

The SVM-RF algorithm in Figure 3 partitions the dataset into a training set, a validation set, and a test set as an initial step. The SVM model and the RF model are then trained individually on the training set, leading to the development of two separate models. Following this, the discrepancy between the predicted values and the actual values from both models on the validation set is calculated and compared to a predetermined threshold. If the disparity is lower than the specified threshold, the models are combined by assigning weights determined using Equations (7) and (8).
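The weighting and threshold logic of Equations (5)–(9) can be sketched as below. Two points are assumptions rather than statements from the paper: the validation error is averaged over all validation samples, and when Equation (9) holds (fusion is skipped) the sketch falls back to the model with the smaller validation error. The function names and the value of θ used by a caller are hypothetical.

```python
def mean_abs_error(pred, true):
    """Mean absolute error over a validation set."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def fusion_weights(mae_svm, mae_rf, theta):
    """Return (w_svm, w_rf) following Equations (7)-(9)."""
    worst = max(mae_svm, mae_rf)
    if worst == 0.0:
        return 0.5, 0.5  # both models are exact; any weighting works
    if abs(mae_svm - mae_rf) / worst > theta:
        # Equation (9) holds, so the models are not combined.
        # Keeping only the better model is an assumption of this sketch.
        return (1.0, 0.0) if mae_svm < mae_rf else (0.0, 1.0)
    total = mae_svm + mae_rf
    return mae_rf / total, mae_svm / total  # Equations (7) and (8)


def fuse(pred_svm, pred_rf, w_svm, w_rf):
    """Weighted combination of the two models' predictions."""
    return [w_svm * a + w_rf * b for a, b in zip(pred_svm, pred_rf)]
```

With validation errors of 1.0 for SVM and 3.0 for RF, the relative difference is 2/3. A threshold above that value gives weights of 0.75 and 0.25, while a lower threshold leaves the SVM prediction unchanged.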

2.4. Comparison Algorithm Model

To validate the efficacy of the SVM-RF fusion model, we conducted a comparative analysis of the distance weighting method, the two-factor weighting method, and the latitude difference weighting method. The mathematical model for the distance weighting method is presented as follows [17]:

d_{i} = \sqrt{(x_{o} - x_{i})^{2} + (y_{o} - y_{i})^{2}}

(10)

T (x_{o}, y_{o}) = \sum_{i = 1}^{m} \frac{f (d_{i}) T_{i}}{d_{i}}

(11)

f (d_{i}) = \frac{1}{{(d_{i}^{2} + ε)}^{4}}

(12)

In Equations (10)–(12), d_{i} represents the distance between training station i and the master station, T(x_{o}, y_{o}) denotes the daily variation at point (x_{o}, y_{o}), and ε is a minute value introduced to avoid a zero denominator.

The mathematical model for the two-factor weighting method is outlined as follows:

M = \frac{\sum_{i = 1}^{n} [{(\frac{1}{| B_{i} |} + \frac{1}{| L_{i} |})}^{k} \times T_{i}]}{\sum_{i = 1}^{n} {(\frac{1}{| B_{i} |} + \frac{1}{| L_{i} |})}^{k}}, k \geq 0

(13)

The mathematical model of the latitude difference weighted average method is as follows:

M = \frac{\sum_{i = 1}^{n} [{(\frac{1}{| B_{i} |})}^{k} \times T_{i}]}{\sum_{i = 1}^{n} {(\frac{1}{| B_{i} |})}^{k}}, k \geq 0

(14)

In Equations (13) and (14), T_{i} represents the observation data of known geomagnetic stations, n represents the total number of stations, M denotes the calculated value, B_{i} represents the difference in latitude, and L_{i} represents the difference in longitude.
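The two-factor weighting of Equation (13) can be sketched as a normalized weighted average. The latitude-only function below uses the same normalized form with the longitude term dropped, which is an assumption about the intended form of Equation (14). The function names are hypothetical, and both functions require non-zero latitude and longitude differences.

```python
def two_factor_weighting(values, dlat, dlon, k=1.0):
    """Equation (13): weights built from latitude and longitude differences.

    values -- daily variation T_i observed at each known station
    dlat   -- latitude differences B_i to the target point (non-zero)
    dlon   -- longitude differences L_i to the target point (non-zero)
    """
    weights = [(1.0 / abs(b) + 1.0 / abs(l)) ** k for b, l in zip(dlat, dlon)]
    return sum(w * t for w, t in zip(weights, values)) / sum(weights)


def latitude_weighting(values, dlat, k=1.0):
    """Latitude-only weighted average (assumed normalized form)."""
    weights = [(1.0 / abs(b)) ** k for b in dlat]
    return sum(w * t for w, t in zip(weights, values)) / sum(weights)
```

Setting k to 0 makes every weight equal to 1, so both functions reduce to the direct average of the station values. Larger k gives more influence to stations that are closer in latitude (and longitude).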

3. Overview of the Study Area and Data Processing

3.1. Research Area Introduction

For the experimental analysis in this study, data from five geomagnetic stations in the European region—BEL, ESK, FUR, HAD, and HLP—between 10 November and 27 December 2018 were sourced from the INTERMAGNET website. The daily variation data from these selected stations were utilized. The geographical distribution of these stations is illustrated in Figure 4.

The impact of the station network structure on the prediction of the SVM-RF algorithm was further analyzed based on Figure 4, where it is evident that the station distribution is relatively dispersed. This dispersion enables the construction of diverse station network structures according to the geographical locations of multiple stations. Additionally, Table 1 presents the basic information of each station along with the inter-station distances for reference.

The data in Table 1 show that the altitude of each station is relatively low, while the inter-station distances range from 335 km to 3090 km. Additionally, the maximum longitude difference among stations is 25.273°, and the maximum latitude difference is 7.149°.

3.2. Data Preprocessing

To ensure the integrity of the experiment and mitigate the impact of outliers and missing data in the geomagnetic station dataset, the Grubbs outlier detection method is first used to detect and remove outliers. Jump points and missing values are then filled by shape-preserving piecewise cubic interpolation. When the SVM-RF model is applied to estimate daily variations, one geomagnetic station is selected as the main station, and the remaining stations serve as training stations for model training.

In this study, the daily variation base value is determined using the base period method. Specifically, the mean of the geomagnetic data collected during the 7 h night-time period made up of 21:00 to 24:00 and 00:00 to 04:00 is taken as the daily variation base value. This base value represents the stable magnetic field at a station when it is unaffected by external magnetic fields. Subtracting it from the observed geomagnetic station value yields the daily variation data [18].
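A minimal sketch of the base period method for a single day of data follows. It assumes the samples carry an hour-of-day value in the range [0, 24), and it treats the 04:00 sample as lying outside the base period. That boundary choice and the function names are assumptions made for illustration.

```python
def daily_baseline(hours, values):
    """Mean field value over the 21:00-24:00 and 00:00-04:00 base period.

    hours  -- hour of day for each sample, in [0, 24)
    values -- observed total field at the station, in nT
    """
    quiet = [v for h, v in zip(hours, values) if h >= 21.0 or h < 4.0]
    return sum(quiet) / len(quiet)


def diurnal_variation(hours, values):
    """Observed values minus the daily base value."""
    base = daily_baseline(hours, values)
    return [v - base for v in values]
```

Because the base period straddles midnight, a production implementation would have to decide which calendar day each half of the period belongs to; the sketch simply uses whatever samples are passed in.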

To enhance the correlation of the training data and mitigate the impact of time differences between the main station and the training stations, the training station data must be adjusted to align with the location of the main station. Given the intricate nature of geomagnetic diurnal variation, this study proposes an extreme value adjustment method to address the time difference. The method aligns the daily variation data of the training stations with the main station by averaging the differences between the times at which local extreme values occur at the main station and at each training station. Each time window should capture a representative extreme value, particularly where the concavity of the curve changes, as shown in the local areas corresponding to ΔT_{1} and ΔT_{2} in Figure 5. The number of time windows depends on the actual circumstances and typically increases with the time span of the data.

ΔT = \frac{ΔT_{1} + ΔT_{2} + \dots + ΔT_{n}}{n}

(15)

Equation (15) gives the correction time in the extreme value adjustment method, where ΔT_{1} is the time difference in the first local area, n is the number of selected local areas, and ΔT is the mean of the time differences over all time windows, which is used as the final correction value. The schematic diagram illustrating the extreme value adjustment method is presented in Figure 5:

The daily variation observation data from the ESK and FUR stations in Europe for the period of 15 to 17 November 2018 are depicted in Figure 5. Equation (15) is applied to calculate the time difference correction value using three selected time windows. Figure 5a illustrates the sample data without time difference correction, while Figure 5b displays the data after extreme value adjustment. The main station is designated as station A, and station B is selected for time difference correction. The time differences between the local extreme values of the two stations in the three windows are ΔT_{1}, ΔT_{2}, and ΔT_{3}. Their mean is then used as the time difference correction value for station B. After correction (Figure 5b), the Pearson correlation coefficient is 0.93.
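The averaging in Equation (15) can be sketched as follows. The sketch assumes that both stations share the same time axis, that each window is given together with the kind of extreme value it contains, and that the time difference is taken as main station time minus training station time. The paper does not state the sign convention, so that choice and the function names are assumptions.

```python
def extreme_time(times, values, window, kind="max"):
    """Time at which the series reaches its extreme value inside a window."""
    start, end = window
    pairs = [(v, t) for t, v in zip(times, values) if start <= t <= end]
    pick = max if kind == "max" else min
    return pick(pairs)[1]


def extreme_value_correction(times, main, other, windows):
    """Equation (15): mean time difference of local extremes over all windows.

    windows -- list of ((start, end), kind) with kind "max" or "min"
    Returns main-station time minus training-station time, in the unit of times.
    """
    diffs = []
    for window, kind in windows:
        t_main = extreme_time(times, main, window, kind)
        t_other = extreme_time(times, other, window, kind)
        diffs.append(t_main - t_other)
    return sum(diffs) / len(diffs)
```

Under this sign convention, adding the returned value to the training station's timestamps shifts its curve onto that of the main station. Each window must be wide enough to contain the same extreme value at both stations.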

Δ T = \frac{L - L_{i}}{15} \times 60

(16)

Equation (16) corrects the time difference using the longitude difference. In this formula, L represents the longitude of the main station, while L_{i} denotes the longitude of training station i. Since the Earth rotates through 15° of longitude per hour, the factor of 60 expresses the result in minutes.
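For comparison, the longitude-based correction of Equation (16) is a one-line conversion. The function name below is hypothetical, and longitudes are assumed to be given in degrees on a common convention (for example, east positive).

```python
def longitude_time_difference(lon_main, lon_train):
    """Equation (16): local-time offset in minutes from a longitude difference."""
    return (lon_main - lon_train) / 15.0 * 60.0
```

As an example, the maximum longitude difference of 25.273° quoted in Section 3.1 corresponds to an offset of about 101 minutes.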

The experimental analysis indicates that the extreme value adjustment method performs better than the longitude difference time difference correction method in terms of accuracy, fitting effect, and correlation strength. In Figure 6, the illustration shows the time difference correction of station B using the longitude difference correction method, revealing an offset at the extreme point that results in an overall poor fitting effect. The calculation of the Pearson correlation coefficient yields a value of 0.89.

The daily variation data for the stations, processed according to the steps outlined in Figure 7, from 10 November to 27 December 2018 are presented in Figure 8:

4. Analysis of Daily Variation Law

Solar activity, which generates a substantial amount of high-energy particles interacting with the Earth’s outer space, is closely linked to the diurnal variation of the geomagnetic field. Consequently, these interactions lead to the movement of charged particles within the Earth’s magnetosphere, giving rise to current systems in outer space. As a result of these processes, the disturbed magnetic field appears as short-term fluctuations that are observed on the ground [19].

As depicted in Figure 9, the Earth is impacted by a significant influx of high-energy particles generated by solar activity. Owing to the presence of the Earth’s magnetic field, the trajectories of these particles are bent and diverted away from the Earth [20]. The particles interacting with the Earth’s magnetic field cause short-term effects. Solar activity’s complexity and irregularity lead to the intricate and variable diurnal variations in the geomagnetic field.

Figure 10 provides a visual representation of the geomagnetic diurnal variation model across a localized area in Europe on 10 November 2018, at nine specific time points. The white line in the figure corresponds to the prevailing trend line at that particular moment, while the dotted line illustrates the vertical gradient shift within the area under consideration. The complex and variable nature of the alteration trends throughout the day is evident in Figure 10. These fluctuations typically manifest along either the latitude or the longitude axis, although no consistent pattern emerges. Moreover, there is a noticeable inclination between the latitude and longitude directions at various time points. For example, in Figure 10a, the gradient shift is observed toward the northwest. By 8 o’clock (Figure 10c), the gradient predominantly shifts to the northeast, with a minor segment on the right side indicating a northwestward trend, suggesting a gradual evolution in the gradient’s orientation. At noon (Figure 10d), there is a shift in the gradient direction toward the northwest, followed by a transition to the northeast between 12 o’clock and 14 o’clock. Subsequently, from 16:00 to 20:00, the gradient direction reverts back to the northwest before changing again to the northeast by midnight. This continuous shifting of the geomagnetic diurnal variation pattern over the course of a day underscores a nonlinear trend that deviates from a strictly latitude- or longitude-based orientation.

Introducing machine learning to model the geomagnetic diurnal variation at each site is a valuable approach due to the dynamic and changing relationship between various geomagnetic observation sites over time. A single fixed function model may not accurately capture the relative relationships of geomagnetic diurnal variations at different sites because of the complexity and variability of the Earth’s magnetic field behavior. By leveraging machine learning techniques, it becomes possible to create more adaptable and precise models that can account for the evolving nature of geomagnetic data. This approach provides better insights into the diurnal variations at each specific site.

5. Experimental Result Analysis

5.1. Example Analysis

To calculate diurnal variation using the SVM-RF model, a main station is selected while the remaining stations are designated as training stations. Daily variation data captured between 10 November and 19 December 2018 from all stations are used as training data to feed into the SVM-RF model. By doing so, the model is able to grasp the daily variation pattern of the region through the training data. Subsequently, data from all training stations recorded between 19 and 27 December are input into the model to compute the daily variation of the main station. This methodology allows for the extraction of meaningful insights into the diurnal variation pattern of the main station, which is inferred based on the knowledge acquired from the training data. The experimental environment of this paper is configured as follows: the CPU is AMD Ryzen 7 4800H, the GPU is NVIDIA GeForce RTX 2060, and the development language is MATLAB.

To assess the algorithm’s applicability, BEL, HAD, ESK, FUR, and HLP were each chosen in turn as the main station, while the remaining stations were used as training stations for the SVM-RF model. The calculated values were compared with the true values, and the accuracy of the algorithm was assessed using the root mean square error and the mean absolute error (see Figure 11, Figure 12, Figure 13, Figure 14 and Figure 15). To verify the algorithm’s reliability, comparisons were made with the distance weighting method, the two-factor weighting method, and the latitude difference weighting method. This analysis shows how the SVM-RF model performs against established methods and helps to validate its efficacy in modeling geomagnetic diurnal variation.

The metrics presented in Table 2, Table 3, Table 4, Table 5 and Table 6 are used to evaluate the accuracy and performance of the algorithms in modeling and predicting geomagnetic diurnal variations at the selected stations. RMSE denotes the root mean square error, MAE denotes the mean absolute error, and Correlation denotes the Pearson correlation coefficient.
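For reference, the three metrics can be computed as in the sketch below, using their standard definitions. The function names are hypothetical, and the Pearson function assumes that neither series is constant.

```python
import math


def rmse(pred, true):
    """Root mean square error between predicted and true values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mae(pred, true):
    """Mean absolute error between predicted and true values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def pearson(pred, true):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(true)
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)
```

RMSE and MAE carry the unit of the data (nT here), whereas the Pearson coefficient is dimensionless and measures only how well the shapes of the two curves agree.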

The results highlight the superior performance of the SVM-RF fusion model compared to the other three algorithms, as evidenced by reductions in the root mean square error (RMSE) and mean absolute error (MAE). The RMSE decreased by 39%, 27%, 9%, 35%, and 45% under the different station selection schemes, while the MAE decreased by 48%, 26%, 13%, 41%, and 46%, respectively. The Pearson correlation coefficients increased by 16%, 23%, 3%, 24%, and 34%, respectively. All SVM-RF models achieved Pearson correlation coefficients above 0.80 regardless of the station selection scheme, and on average the RMSE was reduced by 31% and the MAE by 35%. At a confidence level of 95%, 94.59% of the values predicted by the model lie within the confidence interval.

Using BEL or HLP as the main station yields Pearson correlation coefficients above 0.90 between the calculated and actual values. The calculation accuracy is highest when HLP is designated as the main station, with a root mean square error of only 1.053 nT and a mean absolute error of merely 0.77 nT. In comparison with the other three algorithms, there is a 21% average increase in the Pearson correlation coefficient, which improves the accuracy of geomagnetic diurnal variation calculations. This improvement is of practical value for marine magnetic measurements because it provides dependable geomagnetic diurnal variation data.

As depicted in Figure 16, the radar map in (b) illustrates the error distributions of various models, whereas (a) shows the distribution of absolute errors for each daily variation prediction model. The SVM-RF daily variation prediction model introduced in this study demonstrates smaller absolute errors, primarily concentrated toward the lower end of the scale. Specifically, the error distribution of the SVM-RF model is centered within the inner regions of all models, suggesting superior calculation performance and accuracy.

5.2. Model Applicability Analysis

In the case study, the SVM-RF algorithm shows superiority over the other three algorithms in daily variation calculations across all station selection schemes. However, the degree of accuracy improvement differs across different stations. For example, designating BEL as the primary station results in a 46% decrease in MAE, whereas for the FUR station, the MAE reduction achieved by the SVM-RF model is just 13%. Hence, an analysis of the applicability of SVM-RF is conducted to investigate its performance discrepancies across various stations.

When HLP and BEL are designated as the main stations, Table 2, Table 3, Table 4, Table 5 and Table 6 show higher calculation accuracy and Pearson correlation coefficients. Conversely, with FUR as the main station, the improvement over the other calculation methods is less pronounced. FUR lies centrally among the stations, as depicted in Figure 4, so calculating its daily variation is an inward estimation, a case in which the other algorithms already perform comparatively well and the SVM-RF model therefore shows a less apparent advantage. At edge sites such as BEL and HLP, the other algorithms estimate less accurately, and the SVM-RF algorithm delivers a larger accuracy improvement. This observation implies that the SVM-RF model has broader applicability and imposes lower requirements on the network structure of the selected stations. It maintains high accuracy in both inward and outward calculations.

6. Discussion

Traditional calculation methods build the geomagnetic diurnal variation model from spatial position relationships alone. The distance weighting method models and extrapolates from the mutual distances between geomagnetic diurnal variation stations, the latitude difference weighting method models the latitude differences between stations, and the two-factor weighting method combines latitude difference and distance. Because these algorithms extrapolate from the diurnal variation data of nearby stations using only latitude difference, longitude difference, and mutual distance, they struggle to reflect the actual characteristics of the geomagnetic diurnal variation, and they place stricter demands on the positional relationships of the selected stations. Modeling the geomagnetic diurnal variation trend within a day in Europe shows, however, that diurnal variation follows different patterns across regions and times and is complex and time-varying. Given this diversity, this paper advocates incorporating short-term prior diurnal variation information specific to the survey area, together with data from distant stations, to analyze and reveal the underlying relationships. A calculation model consistent with the regional geomagnetic diurnal variation law is then built from the actual data. This approach imposes fewer restrictions on the positional relationships between stations and aims to capture the true diurnal variation patterns of the survey area more accurately.
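To make the purely geometric nature of the conventional methods concrete, the sketch below estimates the diurnal variation at a main station by inverse-distance weighting. It assumes weights proportional to 1/d, which may differ from the exact distance weighting formula used for comparison in this paper. The distances are those from FUR in Table 1; the station readings are hypothetical.

```python
def inverse_distance_weights(distances_km, power=1.0):
    """Normalized weights proportional to 1 / distance**power (assumed form)."""
    inverse = {station: 1.0 / d ** power for station, d in distances_km.items()}
    total = sum(inverse.values())
    return {station: value / total for station, value in inverse.items()}


def weighted_estimate(readings_nt, weights):
    """Weighted sum of the reference-station readings at one epoch."""
    return sum(weights[station] * readings_nt[station] for station in weights)


# Distances (km) from FUR to the other stations, copied from Table 1.
dist_to_fur = {"BEL": 792, "ESK": 1271, "HAD": 2572, "HLP": 885}
# Hypothetical diurnal variation values (nT) at a single epoch; not real data.
readings = {"BEL": 4.0, "ESK": 2.5, "HAD": 1.0, "HLP": 3.5}

weights = inverse_distance_weights(dist_to_fur)
estimate = weighted_estimate(readings, weights)
print(weights, estimate)
```

The weights depend only on geometry, so they stay fixed for every epoch. This is the limitation discussed above: a fixed geometric weighting cannot follow a diurnal variation pattern that changes with time.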

Establishing a long-term offshore diurnal variation station to correct diurnal variations in offshore magnetic measurements is challenging, primarily because of maintenance difficulties and the risk of losing data for an entire measurement section, which often makes the diurnal variation data unreliable. One way to address this issue is to use high-quality short-term diurnal variation data collected within the survey area as prior information. A short-term offshore diurnal variation station makes it feasible to obtain this short-term information for the survey area and to combine it with data from distant shore geomagnetic stations. This study proposes a method for building a geomagnetic diurnal variation calculation model that captures the true characteristics of both the survey area and the shore geomagnetic stations. Far-sea magnetic measurement scenarios are then simulated using the distribution of terrestrial geomagnetic stations: one station is designated as the main station and assumed to be located at sea, and the other stations serve as onshore geomagnetic stations for model verification. The experimental results indicate that the model can calculate the diurnal variation of the measurement area over an extended period with high precision. They demonstrate the effectiveness of combining short-term data from the survey area with data from distant shore geomagnetic stations to improve diurnal variation correction in offshore magnetic measurements. Furthermore, the approach places lower requirements on the locations of diurnal variation stations, adapts to different station combinations, and is more robust when extrapolating geomagnetic disturbance variations.
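The fusion step can be illustrated with the combination weights that Table 2 reports for the HLP station. The sketch below is not the authors' implementation. The helper `inverse_error_weights` shows one common way such weights could be derived (proportional to the inverse of each model's validation error), which is an assumption, and the per-epoch predictions are hypothetical.

```python
def inverse_error_weights(err_svm, err_rf):
    """Assumed rule: weight each model by the inverse of its validation error."""
    inv_svm, inv_rf = 1.0 / err_svm, 1.0 / err_rf
    total = inv_svm + inv_rf
    return inv_svm / total, inv_rf / total


def fuse(svm_pred, rf_pred, w_svm, w_rf):
    """Weighted combination of the two base-model predictions, epoch by epoch."""
    return [w_svm * s + w_rf * r for s, r in zip(svm_pred, rf_pred)]


# Combination weights reported for the HLP station in Table 2.
w_svm, w_rf = 0.5022, 0.4978

# Hypothetical per-epoch predictions (nT) from the SVM and RF base models.
svm_pred = [1.0, 2.0, 3.0]
rf_pred = [1.2, 1.8, 3.4]

fused = fuse(svm_pred, rf_pred, w_svm, w_rf)
print(fused)
```

Because the reported weights are close to 0.5 at every station, the fused output in Tables 2–6 is close to a plain average of the two base models.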

7. Conclusions

This paper models the trend of geomagnetic diurnal variation within a local area of Europe, analyzes methods of time difference correction between stations, and compares the SVM-RF model with conventional methods for calculating geomagnetic diurnal variation. The following conclusions are drawn:

The extreme value adjustment method suggested in this study surpasses the longitude difference correction method in reducing the effects of diurnal time differences. This approach results in increased Pearson correlation coefficients and enhanced accuracy in fitting diurnal curves.
A mathematical model that relies solely on latitude, longitude, and distance cannot adequately represent the complex, time-varying trend of geomagnetic diurnal variation within a local area. Incorporating short-term prior data from the survey area is therefore essential for capturing the real diurnal variation trend and for constructing a region-specific geomagnetic diurnal variation model.
Compared to distance weighting, two-factor weighting, and latitude weighting methods, the SVM-RF model shows significant improvement in accurately estimating geomagnetic diurnal variation. The SVM-RF model, on average, reduces root mean square error (RMSE) by 31%, mean absolute error (MAE) by 35%, and increases the Pearson correlation coefficient by 20%. This model’s notable advantage is eliminating the need for data smoothing, thereby allowing for a more precise representation of magnetic disturbance variations.
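The two time difference corrections compared in the first conclusion can be contrasted with a small sketch. It assumes that the longitude difference correction shifts a series by the local-time offset of 4 minutes per degree of longitude, and that the extreme value adjustment shifts it so that the daily extremes of the two stations coincide. The paper's actual procedure is defined in its data preprocessing section and may differ in detail. All function names and sample series are hypothetical.

```python
def longitude_lag_minutes(lon_ref_deg, lon_other_deg):
    """Local-time offset implied by longitude alone (4 minutes per degree).

    Positive when the other station lies east of the reference station.
    """
    diff = (lon_other_deg - lon_ref_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return 4.0 * diff


def extreme_index(series):
    """Index of the sample that deviates most from the series mean."""
    mean = sum(series) / len(series)
    return max(range(len(series)), key=lambda i: abs(series[i] - mean))


def align_by_extreme(reference, other):
    """Circularly shift `other` so that its extreme lines up with `reference`.

    Assumes both series cover the same full day at the same sampling rate.
    """
    lag = extreme_index(other) - extreme_index(reference)
    n = len(other)
    return [other[(i + lag) % n] for i in range(n)], lag


# Hypothetical one-day series; `other` is `reference` delayed by two samples.
reference = [0, 1, 3, 8, 3, 1, 0, 0]
other = [0, 0, 0, 1, 3, 8, 3, 1]

aligned, lag = align_by_extreme(reference, other)
print(lag, aligned)
print(longitude_lag_minutes(18.811, 20.789))  # HLP to BEL longitudes from Table 1
```

The longitude-based lag is fixed by geometry, whereas the extreme-based lag is read from the data themselves, which is one way to account for time differences that longitude alone does not explain.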

Future research should aim to reduce the algorithm’s reliance on a priori information and enhance its capability to extrapolate in regions with limited a priori information availability.

Author Contributions

P.X. and Q.L.: formal analysis, data curation, and writing—original draft; G.B. and S.J.: supervision, writing—review and editing, and funding acquisition; X.Y. and P.X.: data collection, analysis, and visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data of this study were obtained from the INTERMAGNET Programme (http://www.intermagnet.org, accessed on 23 November 2023).

Acknowledgments

The results presented in this paper rely on data collected at magnetic observatories. We thank the national institutes that support them and INTERMAGNET for promoting high standards of magnetic observatory practice (http://www.intermagnet.org, accessed on 23 November 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. SVM generates the optimal hyperplane.

Figure 2. Flowchart illustrating the random forest algorithm process.

Figure 3. Flowchart depicting the integrated SVM-RF hybrid algorithm process.

Figure 4. Map depicting the distribution of geomagnetic station locations.

Figure 5. Schematic diagram illustrating the extreme value adjustment method: (a) raw diurnal variation data from two geomagnetic stations; (b) the geomagnetic diurnal variation data after extreme value adjustment.

Figure 6. Correction of longitude and time differences.

Figure 7. Flowchart of data processing.

Figure 8. Preprocessed daily variation data for each station.

Figure 9. Diagram illustrating the Sun–Earth electromagnetic effects.

Figure 10. Trend of diurnal variation in the geomagnetic field within a local area of Europe; (a–i) show the geomagnetic diurnal variation in the region at different times of the day.

Figure 11. Calculation results of geomagnetic diurnal variation at the HLP station: (a) calculated values of the model proposed in this paper versus the true values; (b) box plot of the error distributions of the different algorithms.

Figure 12. Calculation results of geomagnetic diurnal variation at the HAD station: (a) calculated values of the model proposed in this paper versus the true values; (b) box plot of the error distributions of the different algorithms.

Figure 13. Calculation results of geomagnetic diurnal variation at the FUR station: (a) calculated values of the model proposed in this paper versus the true values; (b) box plot of the error distributions of the different algorithms.

Figure 14. Calculation results of geomagnetic diurnal variation at the ESK station: (a) calculated values of the model proposed in this paper versus the true values; (b) box plot of the error distributions of the different algorithms.

Figure 15. Calculation results of geomagnetic diurnal variation at the BEL station: (a) calculated values of the model proposed in this paper versus the true values; (b) box plot of the error distributions of the different algorithms.

Figure 16. Distribution of calculation errors for each method: (a) bar chart of the error distributions of the different algorithms; (b) radar chart of the error distributions of the different algorithms.

Table 1. Station information and inter-station distances.

IAGA Code	Latitude (°N)	Longitude (°E)	Altitude (m)	Station	Distance to BEL (km)	Distance to ESK (km)	Distance to FUR (km)	Distance to HAD (km)	Distance to HLP (km)
BEL	51.836	20.789	180	Belsk	0	1622	792	3090	335
ESK	55.314	356.794	245	Eskdalemuir	1622	0	1271	1491	1402
FUR	48.165	11.277	572	Fürstenfeldbruck	792	1271	0	2572	885
HAD	50.995	355.516	95	Hartland	3090	1491	2572	0	2891
HLP	54.603	18.811	1	Hel Observatory	335	1402	885	2891	0
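As a rough cross-check of the distance columns, the sketch below computes a great-circle distance from the listed coordinates with the haversine formula. The paper does not state how its distances were calculated, so the formula and the mean Earth radius of 6371 km are assumptions, and other station pairs may not agree with the table as closely.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed value


def haversine_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2_deg - lon1_deg)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Coordinates (latitude °N, longitude °E) copied from Table 1.
bel = (51.836, 20.789)
hlp = (54.603, 18.811)

d_bel_hlp = haversine_km(*bel, *hlp)
print(f"BEL to HLP: {d_bel_hlp:.0f} km")
```

For the BEL and HLP pair this gives a value close to the 335 km listed in the table.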

Table 2. Comparison and analysis table of diurnal geomagnetic variation calculation results at the HLP station.

HLP	SVM-RF	Distance Weighting	Latitude Difference Weighting	Two-Factor Weighting	Average Improvement
Combination weight (SVM)	0.5022
Combination weight (RF)	0.4978
RMSE (nT)	1.0530	1.2160	2.1599	1.7967	39%
MAE (nT)	0.7701	0.9070	1.7133	1.4119	48%
Pearson correlation	0.91	0.82	0.73	0.81	16%

Table 3. Comparison and analysis table of diurnal geomagnetic variation calculation results at the HAD station.

HAD	SVM-RF	Distance Weighting	Latitude Difference Weighting	Two-Factor Weighting	Average Improvement
Combination weight (SVM)	0.5026
Combination weight (RF)	0.4974
RMSE (nT)	2.0712	2.8012	2.8823	2.8706	27%
MAE (nT)	1.5400	2.0340	2.0860	2.0806	26%
Pearson correlation	0.80	0.68	0.63	0.64	23%

Table 4. Comparison and analysis table of diurnal geomagnetic variation calculation results at the FUR station.

FUR	SVM-RF	Distance Weighting	Latitude Difference Weighting	Two-Factor Weighting	Average Improvement
Combination weight (SVM)	0.5175
Combination weight (RF)	0.4825
RMSE (nT)	1.9345	2.2683	2.0504	2.0554	9%
MAE (nT)	1.5188	1.91	1.641	1.6303	13%
Pearson correlation	0.80	0.77	0.78	0.78	3%

Table 5. Comparison and analysis table of diurnal geomagnetic variation calculation results at the ESK station.

ESK	SVM-RF	Distance Weighting	Latitude Difference Weighting	Two-Factor Weighting	Average Improvement
Combination weight (SVM)	0.5231
Combination weight (RF)	0.4769
RMSE (nT)	1.5712	2.2937	2.2496	2.4844	35%
MAE (nT)	1.1593	1.8594	1.9872	1.9798	41%
Pearson correlation	0.82	0.70	0.63	0.64	24%

Table 6. Comparison and analysis table of diurnal geomagnetic variation calculation results at the BEL station.

BEL	SVM-RF	Distance Weighting	Latitude Difference Weighting	Two-Factor Weighting	Average Improvement
Combination weight (SVM)	0.5012
Combination weight (RF)	0.4988
RMSE (nT)	1.2126	2.2361	2.1592	2.2181	45%
MAE (nT)	0.8852	1.6365	1.6458	1.6699	46%
Pearson correlation	0.90	0.67	0.68	0.68	34%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xiong, P.; Bian, G.; Liu, Q.; Jin, S.; Yin, X. A Prediction Model of Marine Geomagnetic Diurnal Variation Using Machine Learning. Appl. Sci. 2024, 14, 4369. https://doi.org/10.3390/app14114369

