Article

Remote Sensing Thematic Product Generation for Sustainable Development of the Geological Environment

by Jiabao Li 1,2, Wei Ding 2, Wei Han 2, Xiaohui Huang 1,2,*, Ao Long 2 and Yuewei Wang 2
1 International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China
2 School of Computer Science, China University of Geosciences, Wuhan 430074, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(14), 2529; https://doi.org/10.3390/rs16142529
Submission received: 22 May 2024 / Revised: 25 June 2024 / Accepted: 8 July 2024 / Published: 10 July 2024

Abstract

Remote sensing thematic data products are critical for assessing and analyzing geological environments, and their efficient generation is highly significant for achieving the corresponding sustainable development goals (SDGs). Currently, remote sensing thematic product generation suffers from low levels of automation and efficiency. Addressing these challenges is imperative for advancing sustainable development within the geological environment. This paper addresses issues related to the generation of geological environment remote sensing thematic products: it sorts through the overall process of remote sensing thematic product generation, explores algorithm encapsulation, combination, and execution based on container and workflow technologies, and relies on the Spark distributed processing architecture to achieve efficient thematic product generation supported by multiple geological environment data processing models. Finally, taking SDG6, SDG11, and SDG15 as examples, we generated a variety of thematic products, such as the interpretation of water body distribution, the extraction of urban informal settlements, and the distribution of water and soil erosion. We also compared the efficiency of thematic product generation on different processing architectures, and the experimental results further verified the feasibility and effectiveness of our proposed solution. This research provides a programme for the automated and intelligent generation of geological environment remote sensing thematic products and effectively supports sustainable development in the geological environment.

1. Introduction

Sustainable development goals (SDGs) call for concerted global action to eliminate poverty, protect the planet, and improve the lives and futures of all [1]. SDG data products, which are digital resources that can benefit society, play a significant role in advancing SDGs [2]. SDG data products can be divided into general data products, such as remote sensing data, and thematic data products targeting specific thematic applications or fields [3]. Examples of thematic data products include spatial distribution products for forest cover, global water resource data products, and multiple other types designed for specific thematic purposes. Currently, most SDG thematic data products can be derived from processed remote sensing data and other multi-source data. Given the importance of diversified thematic data products for monitoring and assessing the Earth and human habitats, research focused on remote sensing thematic data product generation for SDGs has become crucial. Remote sensing data products can be categorized into standard products, common products, and thematic products based on the level of processing [4]. Among these, thematic data products, which are typically derived from standard data products or common products, have the most prominent impact in supporting the achievement of SDGs [5]. They are obtained using quantitative remote sensing feature parameter inversion methods or multi-source data overlay analysis methods to generate thematic data that can support applications in various fields and departments [6,7].
As a provider of natural resources and a fundamental component of human survival and productivity, the geological environment necessitates meticulous monitoring and assessment [8]. Thematic data products focused on the geological environment play a crucial role in analyzing changes in key geological elements and assessing geological conditions, thereby facilitating the sustainable development of the geological environment [9]. There are many frequently used thematic data products in the geological environment, such as geological element distribution, land use, and landform classification. Meanwhile, thematic data like bearing capacity analysis and pollution assessment of the geological environment are gradually playing a significant role. Due to the widespread utilization of thematic data products, numerous researchers and departments within the geological and environmental sciences exhibit a significant demand for these resources. Research focusing on remote sensing thematic data product generation for sustainable development in the geological environment holds significant importance [10]. These products can assist in various aspects of geological environment research, particularly in supporting knowledge-sharing and building sustainable development initiatives of geological environments [11].
The development and promotion of remote sensing technology have led to an increased reliance on remote sensing thematic data products in many fields. Nevertheless, the current modes and methods of thematic product generation are inadequate to meet the demand for large spatial scales, high efficiency, and automated generation of thematic data products tailored to geological environment scenarios [12]. Traditional thematic product generation typically involves multiple manual steps, including data preprocessing, information extraction, and post-processing [13]. This results in a low level of automation and insufficient efficiency in overall product generation. Furthermore, remote sensing data are inherently large in volume, which poses enormous challenges to efficient data processing during product generation [14]. Researchers are also increasingly inclined to investigate a specific topic across a broader geographical area or even on a global scale, a practice that has the potential to yield unexpected insights and perspectives. For remote sensing thematic product generation tasks over large spatial extents, the single-machine processing approach lacks sufficient processing capacity to meet efficiency demands. Simultaneously, the rapid development of artificial intelligence technology and its widespread application in the geological field are propelling thematic product generation in an intelligent direction [15,16,17].
In response to the demand for remote sensing thematic data products in the field of geological environment, this paper focuses on the research and development of automated and efficient generation methods for such products. As shown in Figure 1, to accomplish the task of thematic product generation, centralized management of diverse data sources related to geological environments is an indispensable step. Additionally, it involves encapsulating models or algorithms for processing and analyzing multi-source geological environment data from various aspects. Simultaneously, it entails implementing operator combinations and calculation execution tailored to the generation process of thematic data products. Ultimately, this effort aims to provide researchers in the geological environment field with a rich array of remote sensing thematic data products. To accomplish diverse remote sensing thematic product generation for geological environments, the following key issues need to be addressed. First, it is imperative to manage and facilitate access to multi-source geological environmental data, including remote sensing data and various other geological environment data. Second, enabling the flexible execution of diverse geological environmental data processing models within a unified platform or architecture is crucial. Third, efficient data processing and product generation are indispensable to meet the demands of generating thematic data products on large spatial scales.
This paper addresses the key challenges associated with remote sensing thematic product generation in geological environment scenarios. The paper primarily focuses on several key aspects: centralized management of multi-source geological environment data, clarifying the thematic product generation process, encapsulating multiple processing algorithms, employing workflow-based logic for operator combination and calculation execution, and implementing efficient product generation based on the Apache Spark architecture. The main contributions of this paper are as follows:
  • We delineate the overall process of remote sensing thematic product generation and centrally manage the geological environment data using a multi-level data integration method.
  • We implement multiple-algorithm encapsulation based on container technology and flexible calculation execution under the workflow mechanism, solving the problem of unified processing of heterogeneous models.
  • In addressing the challenge of remote sensing thematic product generation with expansive spatial scales or substantial data volumes, we have accomplished efficient thematic product generation utilizing the Spark processing architecture.
The remainder of this paper is structured as follows. We present an overview of the background and related work in Section 2. Section 3 introduces the proposed programme for thematic product generation. Section 4 details the experiments and case studies. Section 5 presents discussions on experimental results and points out our future works. Finally, in Section 6, we provide a summary and conclude the paper.

2. Related Work

In this section, we discuss the current research status of thematic product generation, covering various aspects such as societal demand, technological applications, and existing challenges. Furthermore, we delve into the research status of several primary contents closely related to thematic product generation, including multi-source data management, diverse algorithms integration and processing, and efficient processing of large-scale spatial data.

2.1. Thematic Product Generation

Extracting knowledge or patterns from existing multi-source data is of great significance to the sustainable development of the geological environment [18]. The significance of thematic data products can be reflected in various aspects, including subsequent analysis, assessment, planning, and decision support. At this stage, the generation of thematic products focusing on geological environment remote sensing data covers applications in different scenarios, including land cover classification, interpretation of geological elements such as rock–soil–water, and time series analysis and prediction of various elements [19]. For instance, Lu et al. used an adaptive feature fusion network to interpret and extract soil elements while completing the thematic mapping of soil elements on a large spatial scale [20]. Yan et al. processed time series remote sensing images to monitor changes in large-area land cover and complete mapping of land cover conditions [21]. In addition, many other researchers have conducted fine-grained, all-weather, and low-cost identification and mapping of geological elements based on geological environment remote sensing data, thereby effectively supporting the evaluation and analysis of corresponding geological elements [22,23]. As a typical representative of artificial intelligence technology, deep learning has been widely used in fields such as speech recognition and image interpretation, with unprecedented impact [24]. Meanwhile, deep learning has been deeply applied in the fields of geology and remote sensing, especially in the intelligent interpretation of various geological environmental elements. Han used deep learning to extract features to efficiently interpret various elements such as lithology, soil, surface water, and glaciers in the geological environment [25]. Wang et al. completed geological remote sensing lithology mapping based on the adversarial semi-supervised segmentation network [26].
The Group on Earth Observations (GEO) has made significant efforts to organize and translate data about the Earth into products, and its objective is to design and produce information products openly (https://wmo.int/activities/group-earth-observations-geo, accessed on 3 May 2024). Meanwhile, some national development strategies and SDG cases also indicate that there is an immediate requirement to investigate the theories and technologies underlying remote sensing thematic products generation [21,22,27], providing an abundant array of data products to support research across multiple fields related to geology and geography [28]. The generation of multi-element and multi-thematic data products related to the geological environment plays a significant role in the sustainable development of the geological environment [29]. In addition, further research is required in remote sensing thematic product generation in the geological environment, with a particular focus on efficient and automated generation based on many intelligent models. In conclusion, the current research intensity and widespread applications also further indicate that the remote sensing thematic product generation for the sustainable development of the geological environment will continue to influence the development process of multiple research fields.
The application of remote sensing data is now pervasive across a range of domains, giving rise to numerous system platforms that are capable of generating remote sensing data products. These platforms are tailored to different remote sensing data application scenarios and requirements, employing distinct technological architectures or solutions for remote sensing thematic data product generation [30]. Yan proposed a product generation program using multisource remote sensing data across distributed data centers in a cloud environment, and the cloud-based remote sensing production system could deal with massive remote sensing data and different product generation [31]. Zhao et al. designed the ground Global LAnd Surface Satellite (GLASS) product generation system, which uses task management, parallelization, and multiple I/O channels and can be used to generate long time series of global land surface data products based on various remotely sensed data [32]. Stephanie et al. conducted in-depth research on earthquake damage assessment methodologies and rapid mapping for casualty estimation. They also provided rapid damage maps using a team-based visual interpretation approach and conducted case studies on the 2010 Haiti earthquake and the 2011 Van (Turkey) earthquake [33]. Moreover, a multitude of research institutions and companies have devised and constructed remote sensing data generation systems, with a primary focus on the generation of standardized and common-use products. These data generation platforms facilitate data exchange and sharing among researchers, thereby promoting the application and advancement of remote sensing [34,35,36]. Nevertheless, there are still shortcomings in remote sensing thematic product generation in the geological environment, especially in meeting the practical needs of high automation and processing efficiency.
The advancement of computer technology has significantly influenced the continuous evolution of remote sensing data product generation. The expanding application of remote sensing data products and the growing demand for them necessitate further adoption of cutting-edge computing technologies and methodologies from the computer science domain to achieve efficient and comprehensive generation. It is crucial to address the issues of low generation efficiency, the limited product types available, and the single service offered [37]. In terms of the system architecture of remote sensing data processing and product generation, there has been a shift from the traditional stand-alone processing system to the current system based on high-performance computer cluster architecture [38,39]. Because single-machine equipment is limited in data storage and computing capabilities, data product generation under a cluster architecture has shown higher efficiency and more powerful processing capabilities when processing large-scale, high-precision, and long time series of remote sensing data [40]. The majority of remote sensing thematic data products are generated based on machine learning or deep learning methods, including the intelligent extraction of specific elements of the geological environment and mapping. Furthermore, the efficient computation of deep learning models in a computer cluster environment represents a cutting-edge research direction [41]. Artificial intelligence algorithms for extensive geological environment data processing demonstrate promising applications in accelerating distributed drive performance.

2.2. Multi-Source Data Management

In the generation process of thematic data products, multiple sources of geological environment-related data are utilized, including raster data such as remote sensing and unmanned aerial vehicle data, vector-based foundational geographic data, and various other types of geological environment data. With the wide-ranging sources and diverse types of geological environment data, there is a need to manage them centrally and provide data discovery services. However, the distinct characteristics of different data types present challenges in achieving centralized data management. The evolution of spatial data management technology continues to progress, and various domains or researchers are committed to implementing and applying data management solutions.
Focusing on the multi-source data management of the geological environment, this study examines research progress in data management from two perspectives: semantic description and heterogeneous data organization. First, geological environmental data acquired by different institutions often vary significantly in their semantic description, leading to inconsistencies in the fields used to characterize data features. Addressing this challenge requires representing various data types at the semantic level, typically achieved through the design of a unified metadata model [42]. Unified metadata facilitates the retrieval and sharing of multi-source geological environmental data [43]. Metadata are essential for managing various types of data, particularly within contemporary data management approaches such as those employed in data lake architectures, where metadata management constitutes a critical component [44]. Peter et al. investigated data management and sharing in the field of life sciences based on crowd-sourced metadata standards [45]. Second, to manage different types of geospatial data, corresponding data models and databases can be employed based on the data formats to organize and store the data effectively. For instance, Gloria established a spatial data infrastructure, enabling the integrated management of diverse geospatial data sources and temporal data [46]. Nevertheless, the expansion of data acquisition techniques has led to a gradual increase in the quantity and diversity of available data. The effective and comprehensive management of multi-source data continues to present significant challenges, and further research is required to address these issues.

2.3. Diverse Algorithm Integration and Processing

In the geological environment domain, there is a demand for diverse thematic data products, often derived through the computation of various geology-related data processing algorithms. However, inconsistencies in the programming languages and execution environments used by different models hinder their integration and processing on a unified platform. Container technology, as a form of virtualization technology, is lightweight, provides excellent isolation, and is highly portable. It serves as a solution for encapsulating and flexibly deploying heterogeneous processing models [47]. Xu conducted research on algorithm integration based on container technology, which supports various programming languages and diverse runtime environments. Additionally, it facilitates container orchestration for complex processing workflows [48]. In the field of remote sensing information processing, Knoth utilized containers to package open-source software programs and standardize workflows, facilitating the portability of GEOBIA processing workflows. Additionally, leveraging Docker containers as a foundational infrastructure for remote sensing cloud services enhances resource utilization efficiency [49]. Shah constructed a flexible deployment platform for deep learning models, integrating Docker container technology with Kubernetes. This platform facilitates the containerization, automated deployment, and management of trained models [50]. Research into the generation of thematic data products in geological environments, leveraging container technology to encapsulate and process multiple models, holds promising prospects. Based on existing studies, it is evident that further practical implementation and promotion of container technology-driven model management in spatial data processing is required [51].
Under the specific objectives of this study, container technology is employed to encapsulate and process a range of models, thereby facilitating the generation of thematic data products with a high degree of applicability.

2.4. Efficient Processing of Spatial Data

Currently, concerning the thematic data product generation related to geological environments, remote sensing data have become an indispensable source of information. The resolution of remote sensing data is gradually increasing, providing richer information about land surface. Simultaneously, the demand for large-scale mapping of geological environmental key elements is becoming increasingly widespread to facilitate better analysis and assessment of geological spatial conditions. However, the inherent characteristic of large volumes in remote sensing data and the mapping requirements at large spatial scales pose challenges to the efficiency of thematic product generation. This also motivates the need to achieve data product generation tailored to large spatial scales and massive data volumes to meet the demand from geological-related departments for multi-thematic and large-scale products.
With the evolution of computer technologies, spearheaded by big data, the theories and methods for processing spatial big data are continuously evolving. The prevalent big data processing technologies include data processing frameworks such as MapReduce and Spark, which are used to process large volumes of data efficiently. Many researchers have accomplished spatial big data computation and analysis under these two processing architectures [52]. Sukanta achieved the large-scale mapping of the Jahazpur mineralized belt using a MapReduce model integrated with an extreme learning machine method [53]. Mazin proposed a platform based on the Apache Hadoop ecosystem that supports analysis of large amounts of multispectral raster data using MapReduce. Their research results indicate that the platform can process massive amounts of data faster due to the application of distributed computing [54]. While MapReduce has addressed the needs of high-performance computing for many large-volume datasets to some extent, the Spark architecture offers a more flexible and efficient approach based on in-memory data processing [55]. He et al. proposed a memory-based distributed computing framework named GeoBeam, which can abstract all the operations of spatial data into spatial pipelines, collections, and transforms and support efficient range query and processing of large-scale spatial data on the Spark cluster [56]. In the research context of thematic product generation in geological environments, utilizing the Spark processing architecture to handle huge data volumes or large-scale spatial data offers substantial practical value and room for application.

3. Thematic Product Generation

In the research work of this paper, we are committed to achieving the efficient generation of remote sensing thematic products essential for sustainable development in the geological domain. The diverse applications of remote sensing data products have significantly advanced various research domains, notably within the geological environment [57]. However, generating thematic products from remote sensing data necessitates integrating multi-source geological environment data. Simultaneously, diverse interpretation and processing models are integral to the generation of distinct thematic products [58]. The initial challenge is the effective management of data and models, which is essential for providing requisite support in data and algorithmic aspects during subsequent product generation and processing. Moreover, addressing the flexible computing execution of multiple algorithms is a critical consideration in complex thematic product generation tasks.
In response to the widespread application requirements and inherent challenges in thematic product generation, we proposed a programme designed for efficient remote sensing thematic product generation focused on the geological environment. As shown in Figure 2, the main research content of this solution includes three aspects: centralized management of data and algorithms, flexible algorithm execution and process control, and efficient data processing under a distributed architecture. Correspondingly, it is necessary to complete these three tasks sequentially to finally achieve remote sensing thematic product generation. By systematically investigating the aforementioned content and its corresponding methodologies, we have successfully achieved the generation of various thematic products within geological environment application scenarios. According to the overall scheme above, the paper focuses on the generation of thematic products from remote sensing data for the geological environment. The main research contents include geological environment data management, thematic product generation overall process description, encapsulation and execution of related processing algorithms, and efficient product generation under the Apache Spark distributed architecture.

3.1. Multi-Source Data Management

Multi-source data management and service sharing are necessary prerequisites for the research of remote sensing thematic product generation, which provides support for subsequent data product generation and services. The data involved in remote sensing thematic product generation surrounding geological environment scenarios have the characteristics of extensive sources, diverse types, inconsistent formats, and volumes [59]. Specifically, the data concerning product generation include various types of satellite remote sensing data, as well as basic geography, geological surveys, economic statistics, and other types of data [60]. Furthermore, algorithm management across various application processing contexts is imperative for facilitating the generation of a varied range of thematic products. This study necessitates centralized data management, encompassing diverse models and multi-source geological environment data. It establishes a foundation for subsequent thematic product generation.
We achieved centralized management of multi-source data and processing algorithms through a multi-level data integration method in the context of the geological environment. Specifically, the multi-level data integration method covers three levels: conceptual integration driven by the unification of metadata models, logical integration based on the consistent processing of spatio-temporal benchmarks, and physical integration where multiple data models coexist. Figure 3a depicts the multi-level integration-based data management method, with different tasks at each level of integration. First, conceptual integration primarily entails designing metadata models grounded in geospatial data metadata standards, ensuring a uniform conceptual-level description of multi-source geological environment data. The metadata model also supports varied data queries that bolster the processes of data discovery and acquisition. Second, logical integration involves coordinate conversion and spatial index generation based on spatio-temporal partitioning. This process makes spatial data consistent in terms of spatial basis and time reference. Finally, physical integration involves the use of adaptive data models to characterize and organize various geological environment data, facilitating storage within the corresponding database system for persistent archiving.
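The conceptual and logical integration levels described above can be illustrated with a minimal sketch. It assumes a simplified metadata record and a regular latitude/longitude grid combined with the acquisition month as the spatio-temporal partition; the field names, the 1-degree cell size, and the key layout are hypothetical choices for illustration and are not taken from the paper's actual metadata model or index.

```python
from dataclasses import dataclass
from datetime import date
import math


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical unified metadata entry (conceptual integration)."""
    data_id: str
    data_type: str      # e.g. "raster", "vector", "table"
    lon: float          # representative longitude in degrees (assumed WGS84)
    lat: float          # representative latitude in degrees (assumed WGS84)
    acquired: date      # acquisition date


def partition_key(record: MetadataRecord, cell_deg: float = 1.0) -> str:
    """Build a spatio-temporal partition key (logical integration).

    The key combines the acquisition month with the row and column of a
    regular grid whose origin is at (-180, -90).
    """
    col = math.floor((record.lon + 180.0) / cell_deg)
    row = math.floor((record.lat + 90.0) / cell_deg)
    return f"{record.acquired:%Y%m}_r{row}_c{col}"


record = MetadataRecord("scene-001", "raster", 114.3, 30.5, date(2023, 6, 15))
key = partition_key(record)
print(key)
```

Records that share a key fall in the same grid cell and month, so a query over a time range and spatial extent only needs to inspect the matching partitions.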
To centrally manage multi-source geological environmental data and the corresponding processing algorithms, our specific technical approach encompasses the following aspects. Figure 3b provides an overview of the underlying multi-source geological environmental data storage scheme, together with the data flow relationships between the storage and the higher-level thematic product generation applications. First, specific metadata models were designed to describe diverse geological environmental data, and the metadata are stored in Elasticsearch to provide unified data retrieval services. In addition to designing metadata models for multi-source data, the programme integrated the metadata for multiple data processing algorithms. Second, we selected database systems that match the characteristics of the data to store different types of geological environmental data. In particular, high-volume data such as remote sensing images were stored in HDFS, while primarily vector-based geospatial data and various other data of varying sizes were stored in Ceph object storage. Furthermore, we integrated the distributed memory system Alluxio into our thematic product generation platform to achieve faster data read and write speeds, meeting the data access requirements of higher-level processing applications. Within Alluxio, a queue-based data retrieval mechanism was employed that sequentially loads multiple data sets into the cache based on the data requirements of a specific thematic product generation task. Temporarily idle data were dynamically removed from the cache based on access time. The data storage scheme utilized for this research is detailed in Table 1, encompassing HDFS 3.2.3, Ceph 12.2.12, Alluxio 2.9, and other data storage solutions.
Ultimately, the organization and storage management of multi-source geological environmental data were realized through the coexistence of multiple data models and the collaborative support of heterogeneous database systems.
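The queue-based loading and access-time eviction described for the cache layer can be sketched in plain Python. This is an illustration of the general idea only: it does not use the Alluxio API, and the class name `QueueCache`, its methods, and the fixed capacity are assumptions made for the example.

```python
from collections import OrderedDict, deque


class QueueCache:
    """Toy in-memory cache: data sets are loaded in queue order, and the
    least recently accessed entry is evicted once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()  # key -> data, oldest access first

    def load_queue(self, keys, fetch):
        """Load the requested data sets one after another, in order."""
        pending = deque(keys)
        while pending:
            key = pending.popleft()
            self.put(key, fetch(key))

    def put(self, key, value):
        """Insert or refresh an entry, then evict idle entries if needed."""
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop least recently accessed

    def get(self, key):
        """Return cached data (refreshing its access time) or None."""
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def keys(self):
        return list(self._entries)


cache = QueueCache(capacity=2)
cache.load_queue(["a", "b"], fetch=str.upper)  # stand-in for reading from storage
cache.get("a")                                 # "a" becomes the most recent entry
cache.put("c", "C")                            # evicts "b", the idle entry
print(cache.keys())
```

Ordering entries by last access keeps eviction cheap, which matters when data sets for several product generation tasks compete for limited memory.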

3.2. Product Generation Process

In research on thematic product generation from remote sensing data for the geological environment, the generation process generally begins with preparatory steps such as task creation, data selection, and model selection, and then relies on the underlying data management and model processing to complete the product. In this section, we outline the overall process of thematic product generation to provide a reference for relevant researchers.
For the computing task of remote sensing thematic product generation, we delineated the overall logic and processing flow; Figure 4 depicts the pivotal steps. First, the user initiates a thematic product generation request, which the system then parses. The request includes details such as the thematic product type, temporal constraints, and spatial extent, from which the system determines the specific production requirements of the task. Second, after the request is resolved, the system matches relevant data to the task, ensuring that the data fall within the specified temporal and spatial ranges. Product generation proceeds only when the data are complete and ready; otherwise, the user is informed of the missing data or asked to rematch the data. Third, because each thematic product is produced by several processing modules working together, the system matches the task to the corresponding processing workflow so that task execution and process control follow a well-defined workflow. Finally, the task enters the execution phase, in which the thematic product is generated from the remote sensing data step by step.
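The control flow of Figure 4 can be sketched as follows. This is a simplified illustration under stated assumptions rather than the system's code: `ProductRequest`, `TaskResult`, and `run_task` are hypothetical names, data matching is passed in as a callable, and a workflow is reduced to an ordered list of processing functions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Tuple

Step = Callable[[List[str]], List[str]]


@dataclass(frozen=True)
class ProductRequest:
    product_type: str                          # e.g. "water_body" (assumed label)
    start: date                                # temporal constraint
    end: date
    bbox: Tuple[float, float, float, float]    # (min_lon, min_lat, max_lon, max_lat)


@dataclass
class TaskResult:
    status: str          # "done", "missing_data", or "no_workflow"
    detail: List[str]


def run_task(
    request: ProductRequest,
    match_data: Callable[[ProductRequest], List[str]],
    workflows: Dict[str, List[Step]],
) -> TaskResult:
    # Data matching: find inputs inside the requested time and space ranges.
    datasets = match_data(request)
    if not datasets:
        # Generation proceeds only when data are ready; otherwise notify the user.
        return TaskResult("missing_data", [request.product_type])

    # Workflow matching: pick the processing chain for this product type.
    steps = workflows.get(request.product_type)
    if steps is None:
        return TaskResult("no_workflow", [request.product_type])

    # Execution: run the processing modules in their defined order.
    artifacts = datasets
    for step in steps:
        artifacts = step(artifacts)
    return TaskResult("done", artifacts)
```

Returning a status instead of raising an exception mirrors the text's requirement that users be told when data are missing or need rematching.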
Within the entire process, data retrieval and product generation are the most pivotal steps. On the one hand, data matching involves retrieving data on the platform according to the user’s thematic product generation request. Comprehensive management of multi-source geological environment data makes it efficient to query and match data by time range, spatial range, and data type. On the other hand, in the execution stage, a workflow is essential for overseeing the process, as it handles the logical integration and scheduling of multiple data processing operators. Furthermore, the encapsulation and scheduled execution of operators rely on container orchestration tools. The specific implementation is detailed in the following sections.
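Because the metadata are held in ElasticSearch, matching by data type, time range, and spatial range could be expressed as a single filtered query. The sketch below only builds the request body using the standard `bool`, `term`, `range`, and `geo_shape` query clauses; the field names `data_type`, `acquired_at`, and `footprint` are assumptions, since the metadata schema is not given here.

```python
from typing import Any, Dict, Tuple


def build_search_body(
    data_type: str,
    start_iso: str,
    end_iso: str,
    bbox: Tuple[float, float, float, float],
) -> Dict[str, Any]:
    """Build a query body that filters by type, acquisition time, and footprint."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return {
        "query": {
            "bool": {
                "filter": [
                    # Exact match on the (assumed) keyword field for the sensor.
                    {"term": {"data_type": data_type}},
                    # Acquisition time must fall inside the requested window.
                    {"range": {"acquired_at": {"gte": start_iso, "lte": end_iso}}},
                    # Footprint must intersect the requested bounding box.
                    {
                        "geo_shape": {
                            "footprint": {
                                "shape": {
                                    "type": "envelope",
                                    # Envelope order: upper-left, then lower-right.
                                    "coordinates": [
                                        [min_lon, max_lat],
                                        [max_lon, min_lat],
                                    ],
                                },
                                "relation": "intersects",
                            }
                        }
                    },
                ]
            }
        }
    }
```

Placing all three conditions in `filter` rather than `must` skips relevance scoring, which is unnecessary for a yes-or-no data match.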

3.3. Algorithm Encapsulation and Workflow

The generation of different thematic products requires specific algorithms to process data and ultimately obtain mapping results. For example, thematic mapping of land cover, water body distribution, and soil erosion each requires corresponding calculation and processing models. Additionally, fundamental processing algorithms, including data conversion and result mapping, are indispensable. To ensure compatibility with diverse models on a unified platform, we employ an algorithm encapsulation method based on container technology to isolate algorithms and their respective dependent environments. Integrating workflow and container orchestration enables the logical combination and execution of algorithms, thereby supporting the generation of various thematic products from remote sensing data. The thematic product generation method, depicted in Figure 5, relies on algorithm encapsulation and is workflow driven. After the diverse processing algorithms are encapsulated, a model library is constructed. The associated configuration file is then read to validate the processing flow, and the algorithms are executed systematically through container orchestration tools.
Focusing on the various processing algorithms involved in generating geological environment thematic data products, we use container encapsulation to isolate both the model and its operating environment. Container technology serves as a lightweight virtualization solution, offering a resource-independent operating environment for software applications, algorithms, and their associated components. In this research, Docker container technology is used to encapsulate the diverse algorithms and their runtime environments, thus solving the problem of environmental heterogeneity. In contrast to alternative container technologies, Docker exhibits high system resource utilization, rapid startup, robust portability, and ease of maintenance [61]. Specifically, we encapsulate the runtime dependencies, including specified versions of packages and third-party libraries, along with the model files needed by the various geological environment remote sensing data processing algorithms. Encapsulation produces a portable image file, which makes the thematic product generation platform easy to migrate. Moreover, an algorithm encapsulated within a container can be flexibly invoked, and the container processes the specified input data to produce the desired thematic product. In the actual development of product generation, we expose the processing provided by each container as a service at the code level, which can then be invoked during workflow-driven execution.
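One way to invoke an encapsulated algorithm on specified input data is to mount host directories into the container and pass their in-container paths to the algorithm. The sketch below only assembles the `docker run` argument list. The image name, the mount points `/data/in` and `/data/out`, and the `--input`/`--output` entrypoint arguments are hypothetical, because the interface of the encapsulated models is not specified here.

```python
from typing import List, Sequence


def docker_run_command(
    image: str,
    input_dir: str,
    output_dir: str,
    model_args: Sequence[str] = (),
) -> List[str]:
    """Assemble a `docker run` call for one encapsulated processing algorithm."""
    return [
        "docker", "run",
        "--rm",                                # remove the container when it exits
        "-v", f"{input_dir}:/data/in:ro",      # mount input data read-only
        "-v", f"{output_dir}:/data/out",       # mount a writable output directory
        image,
        # Everything after the image name goes to the container's entrypoint;
        # these two options are assumed, not taken from the paper.
        "--input", "/data/in",
        "--output", "/data/out",
        *model_args,
    ]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; keeping it as a list rather than a single string avoids shell quoting problems with paths that contain spaces.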
Thematic product generation entails multiple processing stages that culminate in the mapping of the data product. For instance, interpreting water bodies entails data preprocessing, information extraction using deep learning models, data conversion, and mapping of the results. We analyzed the generation process of each thematic product and delineated its processing stages, along with their respective inputs and outputs, to define a complete workflow for that product. We record these details in an XML document, which documents the process explicitly. Specifically, the Flowable workflow engine was used to control the thematic product generation process because of its high configurability, strong performance, and support for dynamic process expansion. In general, the workflow-based method can flexibly connect to the model computing services provided by each container, making the overall data processing and product generation more efficient and flexible. For container encapsulation and workflow control, we used Docker 24.0.2 and Flowable 6.8.0, respectively. Note that version incompatibilities may arise in a specific implementation and affect the overall solution, so the latest stable versions should be used whenever possible.
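To illustrate how an XML process description fixes the order of the processing stages, the following sketch parses a small BPMN 2.0 style document, the format that Flowable consumes, and derives the execution order by following its sequence flows. The process below is a hypothetical reconstruction of the water body example; the task ids and names are assumptions, and only a linear chain without gateways is handled.

```python
import xml.etree.ElementTree as ET
from typing import List

BPMN_NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}

# Hypothetical process definition; ids and names are illustrative only.
WATER_BODY_PROCESS = """
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
             targetNamespace="http://example.org/thematic">
  <process id="waterBody" isExecutable="true">
    <startEvent id="start"/>
    <serviceTask id="preprocess" name="Data preprocessing"/>
    <serviceTask id="interpret" name="Deep learning interpretation"/>
    <serviceTask id="convert" name="Data conversion"/>
    <serviceTask id="map" name="Result mapping"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="preprocess"/>
    <sequenceFlow id="f2" sourceRef="preprocess" targetRef="interpret"/>
    <sequenceFlow id="f3" sourceRef="interpret" targetRef="convert"/>
    <sequenceFlow id="f4" sourceRef="convert" targetRef="map"/>
    <sequenceFlow id="f5" sourceRef="map" targetRef="end"/>
  </process>
</definitions>
"""


def execution_order(bpmn_xml: str) -> List[str]:
    """Return service task ids in the order given by the sequence flows."""
    root = ET.fromstring(bpmn_xml)
    process = root.find("bpmn:process", BPMN_NS)
    if process is None:
        raise ValueError("no <process> element found")
    start = process.find("bpmn:startEvent", BPMN_NS)
    if start is None:
        raise ValueError("no <startEvent> element found")

    # Map each node to its successor (valid for a linear chain only).
    next_of = {
        flow.get("sourceRef"): flow.get("targetRef")
        for flow in process.findall("bpmn:sequenceFlow", BPMN_NS)
    }
    task_ids = {t.get("id") for t in process.findall("bpmn:serviceTask", BPMN_NS)}

    order, seen = [], set()
    current = next_of.get(start.get("id"))
    while current is not None and current not in seen:  # 'seen' guards against cycles
        seen.add(current)
        if current in task_ids:
            order.append(current)
        current = next_of.get(current)
    return order
```

In a real deployment the workflow engine performs this traversal itself; the sketch only makes explicit what the XML document encodes.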

3.4. Efficient Product Generation

Thematic product generation centered on geological environmental data typically relies on extensive remote sensing datasets. However, product generation at large spatial scales is frequently constrained by the computational capabilities of individual machines, leading to unsatisfactory overall processing efficiency. To optimize computational efficiency and accelerate task execution, we employ a distributed processing approach based on the Apache Spark architecture. The goal is to enable the efficient execution of multiple deep learning models within a distributed computing framework. Parallel processing is applied in several key steps, including reading large-volume remote sensing images, interpretation processing, and writing files. This distributed processing method increases the efficiency of producing thematic products related to the geological environment.
Generally, remote sensing thematic product generation for geological environments relies primarily on interpretation driven by deep learning models. The interpretation of large-scale remote sensing data proceeds through the following steps. First, the remote sensing image information is read, including the image data, affine matrix, and map projection. Second, the images are cropped into slices of the predefined dimensions required by the thematic interpretation model. Third, the specific deep learning model is applied to classify the image slices. Fourth, the interpretation results are recorded through operations such as color mapping and writing PNG images. Finally, post-processing steps such as format conversion and result integration produce the thematic data product characterizing the geological environment. We analyzed the processing models for the various thematic products and implemented file read/write, image slicing, interpretation processing, and result writing on the Apache Spark architecture. This requires restructuring the code of each thematic product generation model, following its processing flow, on top of the Spark parallel processing library. The parallelization improves the overall processing efficiency of the artificial intelligence model. Figure 6 shows the parallel processing of the intelligent interpretation model using the Spark architecture. The parallel implementation runs in a distributed environment for remote sensing images and diverse data sources, which enables the efficient generation of remote sensing thematic products for the geological environment.
However, this method is very challenging for people who are not familiar with Spark programming.
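The slice, interpret, and merge pattern described above can be shown without Spark. In the sketch below, a thread pool stands in for the Spark executors and a threshold function stands in for the deep learning model, so it illustrates only the data flow and not the authors' Spark implementation. All names are hypothetical, the image is assumed to be a non-empty list of pixel rows, and edge tiles are left smaller than the nominal size, whereas a real model would typically need them padded.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Tuple

Image = List[List[int]]               # rows of pixel values
Window = Tuple[int, int, int, int]    # (row, col, height, width)


def tile_windows(height: int, width: int, tile: int) -> Iterator[Window]:
    """Yield windows that cover the image; edge tiles may be smaller."""
    for row in range(0, height, tile):
        for col in range(0, width, tile):
            yield row, col, min(tile, height - row), min(tile, width - col)


def crop(image: Image, window: Window) -> Image:
    row, col, h, w = window
    return [line[col:col + w] for line in image[row:row + h]]


def threshold_model(tile_pixels: Image) -> Image:
    """Stand-in for a deep learning model: pixels above 127 become class 1."""
    return [[1 if p > 127 else 0 for p in line] for line in tile_pixels]


def interpret_image(
    image: Image,
    tile: int,
    model: Callable[[Image], Image],
    workers: int = 4,
) -> Image:
    """Slice the image, classify the slices in parallel, and merge the results."""
    height, width = len(image), len(image[0])
    windows = list(tile_windows(height, width, tile))

    # Each slice is independent, so the model can run on slices concurrently.
    # pool.map returns results in the same order as the input windows.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        predictions = list(pool.map(lambda w: model(crop(image, w)), windows))

    # Result integration: write each predicted slice back at its window offset.
    mosaic = [[0] * width for _ in range(height)]
    for (row, col, h, w), pred in zip(windows, predictions):
        for i in range(h):
            mosaic[row + i][col:col + w] = pred[i]
    return mosaic
```

Because every slice carries its own window offset, the merge step does not depend on the order in which slices finish, which is the property that makes this step straightforward to distribute.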

4. Experiment

In this section, we conduct a series of experiments to validate the proposed methodology for remote sensing thematic data product generation. These experiments encompass various cases of generating remote sensing thematic products, as well as the efficient generation of thematic products from large-scale remote sensing data. Section 4.1 demonstrates the overall effectiveness of generating multiple remote sensing thematic data products that are significant in geological environment research. Section 4.2 applies the efficient thematic product generation method to data processing and product generation at a large spatial scale, followed by an overall efficiency comparison.

4.1. Thematic Product Generation Cases

To verify the effectiveness of the thematic product generation solution proposed in this paper for the sustainable development of the geological environment, we used product generation driven by multiple deep learning models as a case study. First, we employed a hierarchical data integration approach encompassing conceptual, logical, and physical integration for the comprehensive management of the geological environment data used in thematic product generation. Second, after analyzing and segmenting the deep learning models employed in thematic product generation, we encapsulated and scheduled the resulting operators using Docker container technology and the Flowable workflow solution. Third, to generate thematic products from remote sensing data with expansive spatial coverage or substantial data volumes, we employed the Spark distributed processing approach to achieve efficient parallel generation. With this methodology, we successfully generated various thematic products from geological environment remote sensing data. The experiments below validate the efficacy of the proposed scheme.
Figure 7 shows several of the generated thematic data products. Figure 7a is a thematic data product for the classification of urban functional zones in the middle reaches of the Yangtze River Basin; Figure 7b is a thematic data product for the interpretation of wetlands, lakes, and other water bodies in the middle reaches of the Yangtze River Basin; Figure 7c is a thematic data product for soil erosion disaster prediction in Wuhan City, Hubei Province; Figure 7d is a thematic data product for the interpretation of informal residential areas in Wuhan City, Hubei Province. In terms of data input, remote sensing data of appropriate spatial and temporal scales are retrieved from the platform according to the needs of each thematic product and combined with other types of multi-source data to produce the final data products. Taking the generation of water body interpretation products in the middle reaches of the Yangtze River as an example, the main inputs are two types of Chinese domestic satellite remote sensing images, ZY-3 and GF-2. First, the data related to water body interpretation were centrally managed using the multi-level data integration methodology. Then, we encapsulated the artificial intelligence model for water body interpretation in a container. As a result, the platform provides flexible container operations that generate water body thematic products for specific spatial and temporal ranges based on user-defined requirements.

4.2. Product Generation Efficiency Comparison

In this section, we provide the experimental results of the thematic product generation case on the Spark cluster and compare the running time and efficiency of the sample thematic product generation models in two modes: stand-alone and Spark cluster. Parallel processing in Spark cluster mode adopts the intelligent interpretation model parallel processing method described in Section 3.4. The experimental hardware is a Spark distributed cluster comprising three physical nodes. Each node has an 8-core CPU, over 16 GB of memory, and 5 TB of hard disk storage. Detailed hardware specifications for the individual nodes are given in Table 2.
The presented thematic product generation framework for geological environment remote sensing data operates on both single-machine and cluster environments. Experimental comparisons demonstrate a notable enhancement in efficiency within the Spark cluster environment. Specifically, we conducted thematic product generation experiments in stand-alone mode on Node 3, which is equipped with 64 GB of memory space. The experimental setup in cluster mode incorporates three nodes, namely Node 1, Node 2, and Node 3, to establish a distributed Spark cluster environment. Additionally, the thematic product generation processing in Spark is executed using Yarn mode. The cluster configuration involves three executors with a parallelism setting of 120, and each executor is assigned eight processing cores and a 12 GB memory allocation.
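The cluster settings above map naturally onto standard `spark-submit` options. The sketch below assembles such a command line as an illustration only: the application file name is hypothetical, and it is an assumption that the reported parallelism of 120 corresponds to `spark.default.parallelism`.

```python
from typing import List


def spark_submit_args(
    app: str,
    num_executors: int = 3,
    executor_cores: int = 8,
    executor_memory: str = "12g",
    parallelism: int = 120,
) -> List[str]:
    """Assemble a spark-submit command for a YARN cluster."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
        # Assumed mapping of the reported "parallelism setting".
        "--conf", f"spark.default.parallelism={parallelism}",
        app,
    ]


def tasks_per_core(parallelism: int, num_executors: int, executor_cores: int) -> float:
    """Number of partitions each executor core handles on average."""
    return parallelism / (num_executors * executor_cores)
```

With three executors of eight cores each, the cluster offers 24 cores, so a parallelism of 120 gives each core five partitions on average, which helps even out slices that take different amounts of time to process.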
We used the generation of two thematic data products, wetlands and informal settlements, as experimental cases. The tests varied the hardware environment and the data volume to assess performance. Five tests were conducted for wetland thematic product generation, with image quantities ranging from 100 to 2000, and four tests were conducted for informal settlement product generation, with image quantities ranging from 1000 to 4000. Table 3 presents a comprehensive comparison of the experimental parameters and results. Processing within the Spark architecture is significantly faster than thematic product generation in stand-alone mode. Nevertheless, this advantage diminishes for smaller datasets, indicating that cluster processing does not confer a notable advantage in such scenarios. In the wetland experiment, for example, the performance gain grew gradually as the amount of data processed increased. Notably, when processing 2000 images, performance improved by 42.61% compared with the stand-alone mode.
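A percentage improvement such as the one reported above can be defined in more than one way, and the definition used is not stated here. The sketch below shows two common conventions, relative running time reduction and speedup, together with the conversion between them. The function names are hypothetical, and any running times passed in are the reader's own measurements, not values from Table 3.

```python
def time_reduction_pct(t_single: float, t_cluster: float) -> float:
    """Share of the stand-alone running time saved by the cluster, in percent."""
    if t_single <= 0:
        raise ValueError("t_single must be positive")
    return (t_single - t_cluster) / t_single * 100.0


def speedup(t_single: float, t_cluster: float) -> float:
    """How many times faster the cluster is than the stand-alone run."""
    if t_cluster <= 0:
        raise ValueError("t_cluster must be positive")
    return t_single / t_cluster


def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a running time reduction in percent into a speedup factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)
```

If the reported 42.61% is read as a running time reduction, `speedup_from_reduction(42.61)` gives a speedup of roughly 1.74; under other definitions the corresponding factor would differ.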
Figure 8 presents the running time charts for the two thematic product generation scenarios. The figure clearly illustrates the performance gain achieved when product generation runs within the Spark architecture. The experiments validated the proposed solution for generating thematic data products from geological environment remote sensing data and affirmed its feasibility. The results underscore the efficacy of parallel processing within the Spark distributed architecture, which noticeably improves the efficiency of thematic product generation.

5. Discussion

In this study, we investigated the theory and methodology of generating remote sensing thematic data products customized for geological environmental applications and sustainable development needs. Experimental results indicate that the proposed scheme can effectively meet the diverse requirements for thematic product generation in geological environmental applications. Furthermore, the designed generation methods enable efficient processing of large-volume remote sensing data, resulting in a significant improvement in overall product generation efficiency. However, some aspects of this work can be further improved in future work.
  • We investigated the technologies and methods for remote sensing thematic data product generation and proposed a feasible solution. However, the data and models integrated so far are still inadequate, and both the total amount of data and the model library need to be expanded. Next, we will continue in-depth research on the technical methods and develop a corresponding thematic data generation and service platform to provide users with multiple thematic data products. Such a flexible and efficient thematic product processing and sharing service will further contribute to sustainable development in the geological environment.
  • The programme proposed in this paper is an overall processing flow, and the technical methods within it still have room for improvement. In addition, the proposed method cannot yet guarantee the accuracy of the thematic data products it generates. The subsequent objective is to enhance the precision of the thematic product generation models and to optimize data processing efficiency within the Apache Spark framework. We will therefore focus on optimizing the internal details of the models to maintain generation efficiency while guaranteeing the accuracy of the mapping results.

6. Conclusions

This paper proposed a feasible solution for issues related to the generation of thematic products for geological environment remote sensing data. Initially, we introduced a centralized management approach that hinges on the multi-level integration of geological environment-related data and models. Next, we described the entire process of remote sensing thematic product generation. We executed multiple-algorithm encapsulation and calculation execution using container and workflow technology, ensuring alignment with generation requirements for diverse thematic data products. This meticulous approach facilitated the successful generation of multiple types of thematic data products.
Finally, to address the insufficient generation efficiency of thematic products for large spatial scale remote sensing data, we proposed a distributed processing method driven by the Apache Spark architecture, which effectively improves the efficiency of data processing and product generation. The work conducted in this study holds significant implications for the generation and dissemination of remote sensing thematic products. The proposed programme can provide valuable technical references for researchers engaged in related endeavors. In addition, the derived thematic data products can be employed in various application contexts, encompassing analysis, evaluation, and decision-making within the geological environment, which contributes to advancing sustainable development practices in the geological environment.

Author Contributions

Conceptualization, X.H.; methodology, J.L.; validation, W.D. and A.L.; writing—original draft preparation, J.L.; writing—review and editing, J.L. and X.H.; visualization, J.L.; supervision, J.L., W.H., X.H. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the International Research Center of Big Data for Sustainable Development Goals (CBAS2023ORP03) and the National Natural Science Foundation of China under Grant (42201415).

Data Availability Statement

The datasets adopted in this paper are available from the corresponding author on reasonable request after the related research projects are completed.

Acknowledgments

The authors would like to express their gratitude to all of the reviewers who have provided their comments on this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Guo, H.; Huang, L.; Liang, D. Further promotion of sustainable development goals using science, technology, and innovation. Innovation 2022, 3.
  2. Liang, D.; Guo, H.; Nativi, S.; Kulmala, M.; Shirazi, Z.; Chen, F.; Kalonji, G.; Yan, D.; Li, J.; Duerler, R.; et al. A future for digital public goods for monitoring SDG indicators. Sci. Data 2023, 10, 875.
  3. Labbate, R.; Silva, R.F.; Rampasso, I.S.; Anholon, R.; Quelhas, O.L.G.; Leal Filho, W. Business models towards SDGs: The barriers for operationalizing Product-Service System (PSS) in Brazil. Int. J. Sustain. Dev. World Ecol. 2021, 28, 350–359.
  4. Wu, X.; Xiao, Q.; Wen, J.; You, D.; Hueni, A. Advances in quantitative remote sensing product validation: Overview and current status. Earth Sci. Rev. 2019, 196, 102875.
  5. Li, Z.L.; Wu, H.; Duan, S.B.; Zhao, W.; Ren, H.; Liu, X.; Leng, P.; Tang, R.; Ye, X.; Zhu, J.; et al. Satellite remote sensing of global land surface temperature: Definition, methods, products, and applications. Rev. Geophys. 2023, 61, e2022RG000777.
  6. FG Assis, L.F.; Ferreira, K.R.; Vinhas, L.; Maurano, L.; Almeida, C.; Carvalho, A.; Rodrigues, J.; Maciel, A.; Camargo, C. TerraBrasilis: A spatial data analytics infrastructure for large-scale thematic mapping. ISPRS Int. J. Geo-Inf. 2019, 8, 513.
  7. Ryu, J.H.; Choi, J.K.; Lee, Y.K. Potential of remote sensing in management of tidal flats: A case study of thematic mapping in the Korean tidal flats. Ocean. Coast. Manag. 2014, 102, 458–470.
  8. Peyghambari, S.; Zhang, Y. Hyperspectral remote sensing in lithological mapping, mineral exploration, and environmental geology: An updated review. J. Appl. Remote Sens. 2021, 15, 031501.
  9. Li, R.; Yin, Z.; Wang, Y.; Li, X.; Liu, Q.; Gao, M. Geological resources and environmental carrying capacity evaluation review, theory, and practice in China. China Geol. 2018, 1, 556–565.
  10. Wu, C.; Zhang, Y.; Zhang, J.; Chen, Y.; Duan, C.; Qi, J.; Cheng, Z.; Pan, Z. Comprehensive Evaluation of the Eco-Geological Environment in the Concentrated Mining Area of Mineral Resources. Sustainability 2022, 14, 6808.
  11. Van der Meer, F.D.; Van der Werff, H.M.; Van Ruitenbeek, F.J.; Hecker, C.A.; Bakker, W.H.; Noomen, M.F.; Van Der Meijde, M.; Carranza, E.J.M.; De Smeth, J.B.; Woldai, T. Multi-and hyperspectral geologic remote sensing: A review. Int. J. Appl. Earth Obs. Geoinf. 2012, 14, 112–128.
  12. Nguyen, H.; Katzfuss, M.; Cressie, N.; Braverman, A. Spatio-temporal data fusion for very large remote sensing datasets. Technometrics 2014, 56, 174–185.
  13. Foody, G.M. Thematic map comparison. Photogramm. Eng. Remote Sens. 2004, 70, 627–633.
  14. Van Genderen, J.; Lock, B.; Vass, P. Remote sensing: Statistical testing of thematic map accuracy. Remote Sens. Environ. 1978, 7, 3–14.
  15. Xu, Y.; Liu, X.; Cao, X.; Huang, C.; Liu, E.; Qian, S.; Liu, X.; Wu, Y.; Dong, F.; Qiu, C.W.; et al. Artificial intelligence: A powerful paradigm for scientific research. Innovation 2021, 2.
  16. Chen, L.; Wang, L.; Miao, J.; Gao, H.; Zhang, Y.; Yao, Y.; Bai, M.; Mei, L.; He, J. Review of the application of big data and artificial intelligence in geology. J. Phys. Conf. Ser. 2020, 1684, 012007.
  17. Li, S.; Chen, J.; Liu, C. Overview on the development of intelligent methods for mineral resource prediction under the background of geological big data. Minerals 2022, 12, 616.
  18. Balaram, V. Rare earth elements: A review of applications, occurrence, exploration, analysis, recycling, and environmental impact. Geosci. Front. 2019, 10, 1285–1303.
  19. Demattê, J.A.M.; Fongaro, C.T.; Rizzo, R.; Safanelli, J.L. Geospatial Soil Sensing System (GEOS3): A powerful data mining procedure to retrieve soil spectral reflectance from satellite images. Remote Sens. Environ. 2018, 212, 161–175.
  20. Lu, Y.; He, K.; Xu, H.; Dong, Y.; Han, W.; Wang, L.; Liang, D. Remote Sensing Interpretation for Soil Elements using Adaptive Feature Fusion Network. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4505515.
  21. Yan, J.; Wang, L.; He, H.; Liang, D.; Song, W.; Han, W. Large-area land-cover changes monitoring with time-series remote sensing images using transferable deep models. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4409917.
  22. Han, W.; Zhang, X.; Wang, Y.; Wang, L.; Huang, X.; Li, J.; Wang, S.; Chen, W.; Li, X.; Feng, R.; et al. A survey of machine learning and deep learning in remote sensing of geological environment: Challenges, advances, and opportunities. ISPRS J. Photogramm. Remote Sens. 2023, 202, 87–113.
  23. Wu, J.; Han, W.; Chen, J.; Wang, S. Improving Geological Remote Sensing Interpretation via Optimal Transport-Based Point–Surface Data Fusion. Remote Sens. 2023, 16, 53.
  24. Dong, S.; Wang, P.; Abbas, K. A survey on deep learning and its applications. Comput. Sci. Rev. 2021, 40, 100379.
  25. Han, W.; Li, J.; Wang, S.; Zhang, X.; Dong, Y.; Fan, R.; Zhang, X.; Wang, L. Geological remote sensing interpretation using deep learning feature and an adaptive multisource data fusion network. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4510314.
  26. Wang, S.; Huang, X.; Han, W.; Li, J.; Zhang, X.; Wang, L. Lithological mapping of geological remote sensing via adversarial semi-supervised segmentation network. Int. J. Appl. Earth Obs. Geoinf. 2023, 125, 103536.
  27. He, H.; Yan, J.; Liang, D.; Sun, Z.; Li, J.; Wang, L. Time-series land cover change detection using deep learning-based temporal semantic segmentation. Remote Sens. Environ. 2024, 305, 114101.
  28. Zuo, R.; Xiong, Y.; Wang, J.; Carranza, E.J.M. Deep learning and its application in geochemical mapping. Earth Sci. Rev. 2019, 192, 1–14.
  29. Hanžl, P.; Verner, K. Basic Principles of Geological and Thematic Mapping; Czech Geological Survey: Stare Brno, Czech Republic, 2018.
  30. Kovalskyy, V.; Roy, D.P. The global availability of Landsat 5 TM and Landsat 7 ETM+ land surface observations and implications for global 30 m Landsat data product generation. Remote Sens. Environ. 2013, 130, 280–293.
  31. Yan, J.; Ma, Y.; Wang, L.; Choo, K.K.R.; Jie, W. A cloud-based remote sensing data production system. Future Gener. Comput. Syst. 2018, 86, 1154–1166.
  32. Zhao, X.; Liang, S.; Liu, S.; Yuan, W.; Xiao, Z.; Liu, Q.; Cheng, J.; Zhang, X.; Tang, H.; Zhang, X.; et al. The Global Land Surface Satellite (GLASS) remote sensing data processing system and products. Remote Sens. 2013, 5, 2436–2450.
  33. Wegscheider, S.; Schneiderhan, T.; Mager, A.; Zwenzner, H.; Post, J.; Strunz, G. Rapid mapping in support of emergency response after earthquake events. Nat. Hazards 2013, 68, 181–195.
  34. Pieschke, R.L. US Geological Survey Distribution of European Space Agency’s Sentinel-2 Data; Technical report; US Geological Survey: Reston, VA, USA, 2017.
  35. Baumann, P.; Mazzetti, P.; Ungar, J.; Barbera, R.; Barboni, D.; Beccati, A.; Bigagli, L.; Boldrini, E.; Bruno, R.; Calanducci, A.; et al. Big data analytics for earth sciences: The EarthServer approach. Int. J. Digit. Earth 2016, 9, 3–29.
  36. Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Quackenbush, L.; Adeli, S.; Brisco, B. Google Earth Engine for geo-big data applications: A meta-analysis and systematic review. ISPRS J. Photogramm. Remote Sens. 2020, 164, 152–170.
  37. El Fels, A.E.A.; El Ghorfi, M. Using remote sensing data for geological mapping in semi-arid environment: A machine learning approach. Earth Sci. Inform. 2022, 15, 485–496.
  38. Shirmard, H.; Farahbakhsh, E.; Müller, R.D.; Chandra, R. A review of machine learning in processing remote sensing data for mineral exploration. Remote Sens. Environ. 2022, 268, 112750.
  39. Xu, C.; Du, X.; Yan, Z.; Fan, X. ScienceEarth: A big data platform for remote sensing data processing. Remote Sens. 2020, 12, 607.
  40. Zhong, Y.; Wang, X.; Xu, Y.; Wang, S.; Jia, T.; Hu, X.; Zhao, J.; Wei, L.; Zhang, L. Mini-UAV-borne hyperspectral remote sensing: From observation and processing to applications. IEEE Geosci. Remote Sens. Mag. 2018, 6, 46–62.
  41. Zhang, X.; Zhou, Y.; Luo, J. Deep learning for processing and analysis of remote sensing big data: A technical review. Big Earth Data 2022, 6, 527–560.
  42. Nogueras-Iso, J.; Zarazaga-Soria, F.J.; Muro-Medrano, P.R. Geographic Information Metadata for Spatial Data Infrastructures; Springer: Berlin/Heidelberg, Germany, 2005.
  43. Sen, A. Metadata management: Past, present and future. Decis. Support Syst. 2004, 37, 151–173.
  44. Sawadogo, P.; Darmont, J. On data lake architectures and metadata management. J. Intell. Inf. Syst. 2021, 56, 97–120.
  45. McQuilton, P.; Gonzalez-Beltran, A.; Rocca-Serra, P.; Thurston, M.; Lister, A.; Maguire, E.; Sansone, S.A. BioSharing: Curated and crowd-sourced metadata standards, databases and data policies in the life sciences. Database 2016, 2016, baw075.
  46. Bordogna, G.; Kliment, T.; Frigerio, L.; Brivio, P.A.; Crema, A.; Stroppiana, D.; Boschetti, M.; Sterlacchini, S. A spatial data infrastructure integrating multisource heterogeneous geospatial data and time series: A study case in agriculture. ISPRS Int. J. Geo-Inf. 2016, 5, 73.
  47. Acharya, J.N.; Suthar, A.C. Docker container orchestration management: A review. In Proceedings of the International Conference on Intelligent Vision and Computing, Sur, Oman, 3–4 October 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 140–153.
  48. Xu, C.; Du, X.; Jian, H.; Dong, Y.; Qin, W.; Mu, H.; Yan, Z.; Zhu, J.; Fan, X. Analyzing large-scale Data Cubes with user-defined algorithms: A cloud-native approach. Int. J. Appl. Earth Obs. Geoinf. 2022, 109, 102784.
  49. Knoth, C.; Nüst, D. Reproducibility and practical adoption of geobia with open-source software in docker containers. Remote Sens. 2017, 9, 290.
  50. Shah, J.; Dubaria, D. Building modern clouds: Using docker, kubernetes & Google cloud platform. In Proceedings of the 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 7–9 January 2019; pp. 0184–0189.
  51. Paraiso, F.; Challita, S.; Al-Dhuraibi, Y.; Merle, P. Model-driven management of docker containers. In Proceedings of the 2016 IEEE 9th International Conference on cloud Computing (CLOUD), San Francisco, CA, USA, 27 June–2 July 2016; pp. 718–725.
  52. Bhathal, G.S.; Singh, A. Big Data: Hadoop framework vulnerabilities, security issues and attacks. Array 2019, 1, 100002.
  53. Roy, S.; Bhattacharya, S.; Omkar, S.N. Automated large-scale mapping of the jahazpur mineralised belt by a MapReduce model with an integrated elm method. J. Photogramm. Remote Sens. Geoinf. Sci. 2022, 90, 191–209.
  54. Alkathiri, M.; Jhummarwala, A.; Potdar, M.B. Multi-dimensional geospatial data mining in a distributed environment using MapReduce. J. Big Data 2019, 6, 82.
  55. Zaharia, M.; Xin, R.S.; Wendell, P.; Das, T.; Armbrust, M.; Dave, A.; Meng, X.; Rosen, J.; Venkataraman, S.; Franklin, M.J.; et al. Apache spark: A unified engine for big data processing. Commun. ACM 2016, 59, 56–65.
  56. He, Z.; Liu, G.; Ma, X.; Chen, Q. GeoBeam: A distributed computing framework for spatial data. Comput. Geosci. 2019, 131, 15–22. [Google Scholar] [CrossRef]
  57. Zhao, F.f.; He, M.c.; Wang, Y.t.; Tao, Z.g.; Li, C. Eco-geological environment quality assessment based on multi-source data of the mining city in red soil hilly region, China. J. Mt. Sci. 2022, 19, 253–275. [Google Scholar] [CrossRef]
  58. Shirazy, A.; Shirazi, A.; Nazerian, H. Application of remote sensing in earth sciences—A review. Int. J. Sci. Eng. Appl. 2021, 10, 45–51. [Google Scholar] [CrossRef]
  59. Chi, H.; Sun, J.; Zhang, C.; Miao, C. Remote sensing data processing and analysis for the identification of geological entities. Acta Geophys. 2023, 71, 1565–1577. [Google Scholar] [CrossRef]
  60. Wang, L.; Zuo, B.; Le, Y.; Chen, Y.; Li, J. Penetrating remote sensing: Next-generation remote sensing for transparent earth. Innovation 2023, 4, 100519. [Google Scholar] [CrossRef]
  61. Reis, D.; Piedade, B.; Correia, F.F.; Dias, J.P.; Aguiar, A. Developing docker and docker-compose specifications: A developers’ survey. IEEE Access 2021, 10, 2318–2329. [Google Scholar] [CrossRef]
Figure 1. Remote sensing data thematic products support the sustainable development of the geological environment.
Figure 2. Geological environment remote sensing data thematic product generation programme. It mainly includes centralized data management, flexible calculation execution of algorithms, and efficient data processing, ultimately generating a series of remote sensing thematic products.
Figure 3. Centralized management method of geological environment data based on multi-level data integration. (a) illustrates a method of data management based on multi-level integration, where different tasks are performed at each level of integration. (b) provides an overview of the underlying multi-source geological environmental data storage scheme, together with the data flow relationships between the storage and the higher-level thematic product generation applications.
Figure 4. The general workflow for generating remote sensing thematic products.
Figure 5. Thematic product generation under algorithm encapsulation and workflow technologies.
Figure 6. Efficient thematic data product generation based on model parallelism under the Spark architecture.
Figure 7. Case study on the generation of thematic products for geological environment remote sensing data. (a) is a thematic data product for the classification of urban functional zones in the middle reaches of the Yangtze River Basin; (b) is a thematic data product for the interpretation of wetlands, lakes, and other water bodies in the middle reaches of the Yangtze River Basin; (c) is a thematic data product for soil erosion disaster prediction in Wuhan City, Hubei Province; (d) is a thematic data product for interpretation of informal residential areas in Wuhan City, Hubei Province.
Figure 8. Comparison of running time of product generation under different architectures.
Table 1. Multi-source geological environment data storage scheme. This shows the data storage technology solution adopted in this study, including the specific database system, characteristics of stored data, and types of stored data.

| Database System | Data Characteristics | Data Category |
|---|---|---|
| HDFS 3.2.3 | Suitable for distributed storage of large amounts of data, with fault-tolerant mechanisms; emphasizes high throughput for data access rather than low latency. | Original files of satellite and drone remote sensing data. |
| Ceph 12.2.12 | Supports object storage, block storage, and file storage; decentralized, strongly consistent, with balanced data distribution. | Statistical data, textual data, vector data, and various types of data of varying sizes. |
| Alluxio 2.9 | Memory-level I/O access speed; simplifies the use of object storage; provides a single point of access to multiple data sources; simplifies application deployment. | - |
| Elasticsearch 8.5.3 | Built on the Lucene inverted index; supports a distributed search engine platform; high stability, good performance, easy to use. | Metadata; other structured data. |
Table 2. Hardware parameters of experimental cluster nodes.

| Node | Operating System | Processor | RAM | Hard Disk |
|---|---|---|---|---|
| Node 1 | CentOS 7 | Intel(R) Xeon(R) CPU E5-2609 v4 @ 1.70 GHz * 8 | 16 GB | 5 TB |
| Node 2 | CentOS 7 | Intel(R) Xeon(R) CPU E5-2609 v4 @ 1.70 GHz * 8 | 16 GB | 5 TB |
| Node 3 | CentOS 7 | Intel(R) Xeon(R) CPU E5-2609 v4 @ 1.70 GHz * 8 | 64 GB | 5 TB |
Table 3. Comparison of running time and efficiency in stand-alone and cluster modes. Thematic product generation using wetland interpretation and informal residential area extraction was used as an experimental case.

| Data Product Production Type | Data Volume (Tiles) | Standalone Mode (Seconds) | Cluster Mode (Seconds) | Performance Improvement (%) |
|---|---|---|---|---|
| Wetland | 100 | 95.79 | 161.15 | - |
| Wetland | 500 | 524.61 | 391.06 | 25.45 |
| Wetland | 1000 | 963.31 | 639.99 | 33.56 |
| Wetland | 1500 | 1552.93 | 933.17 | 39.91 |
| Wetland | 2000 | 2094.94 | 1202.22 | 42.61 |
| Informal Residential Areas | 1000 | 175.34 | 155.74 | 11.17 |
| Informal Residential Areas | 2000 | 373.09 | 261.21 | 29.51 |
| Informal Residential Areas | 3000 | 574.59 | 363.34 | 36.76 |
| Informal Residential Areas | 4000 | 730.22 | 462.20 | 36.70 |
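
The improvement column of Table 3 can be recomputed from the two running-time columns. The sketch below assumes that improvement is the relative reduction in running time, (standalone - cluster) / standalone × 100. This formula is inferred from the reported numbers and is not stated in this section, and the names `TABLE3`, `improvement_percent`, and `recompute` are hypothetical.

```python
# Running times (seconds) copied from Table 3: (tiles, standalone, cluster, reported %).
# A reported value of None marks the row where the table shows "-".
TABLE3 = {
    "Wetland": [
        (100, 95.79, 161.15, None),
        (500, 524.61, 391.06, 25.45),
        (1000, 963.31, 639.99, 33.56),
        (1500, 1552.93, 933.17, 39.91),
        (2000, 2094.94, 1202.22, 42.61),
    ],
    "Informal Residential Areas": [
        (1000, 175.34, 155.74, 11.17),
        (2000, 373.09, 261.21, 29.51),
        (3000, 574.59, 363.34, 36.76),
        (4000, 730.22, 462.20, 36.70),
    ],
}


def improvement_percent(standalone_s: float, cluster_s: float) -> float:
    """Assumed formula: relative reduction in running time, in percent."""
    return (standalone_s - cluster_s) / standalone_s * 100.0


def recompute(table=TABLE3):
    """Return rows of (product, tiles, computed %, reported % or None)."""
    rows = []
    for product, entries in table.items():
        for tiles, standalone_s, cluster_s, reported in entries:
            computed = improvement_percent(standalone_s, cluster_s)
            rows.append((product, tiles, computed, reported))
    return rows


if __name__ == "__main__":
    for product, tiles, computed, reported in recompute():
        shown = "-" if reported is None else f"{reported:.2f}"
        print(f"{product:<27} {tiles:>5}  computed={computed:7.2f}  reported={shown}")
```

Under this assumed formula, most rows agree with the reported values to within about 0.01 percentage points. The 100-tile wetland row comes out negative because the cluster run is slower than the stand-alone run, which is consistent with the "-" entry. The 2000-tile informal residential row gives roughly 29.99 rather than the reported 29.51, so the authors' exact computation may differ from this sketch.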