Article

Quality Assessment of Global Ocean Island Datasets

1
School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China
2
Department of Military Oceanography and Hydrography & Cartography, Dalian Naval Academy, Dalian 116018, China
3
International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China
*
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2023, 12(4), 168; https://doi.org/10.3390/ijgi12040168
Submission received: 13 February 2023 / Revised: 4 April 2023 / Accepted: 7 April 2023 / Published: 13 April 2023

Abstract

Ocean island data are essential to the conservation and management of islands and coastal ecosystems, which the United Nations has adopted as a sustainable development goal (SDG 14). Currently, two categories of island datasets, i.e., global shoreline vector (GSV) and OpenStreetMap (OSM), are freely available on a global scale. However, few studies have focused on assessing and comparing the data quality of these two datasets, which is the main purpose of our study. Specifically, the two datasets were assessed using four 100 × 100 km² study areas, in terms of three aspects: accuracy (including overall accuracy (OA), precision, recall and F1), completeness (including area completeness and count completeness) and shape complexity. The results showed that: (1) Both datasets perform well in terms of OA (98% or above) and F1 (about 0.9 or above); the OSM dataset performs better in terms of precision, but the GSV dataset performs better in terms of recall. (2) The area completeness is almost 100%, but the count completeness is much higher than 100%, indicating that the total areas of the two datasets are almost the same, but there are many more islands in the OSM dataset. (3) In most cases, the fractal dimension of the OSM dataset is slightly larger than that of the GSV dataset, indicating that the OSM dataset has more detail in terms of the island boundary or coastline. We concluded that both datasets (GSV and OSM) are effective for island mapping, but the OSM dataset can identify more small islands and has more detail.

1. Introduction

Ocean islands, which are defined as lands that are entirely surrounded by ocean waters, are not only homes to many unique plants and animals around the world, but also living places for human beings. It has been estimated that approximately 550 million people, 9–10% of the world’s population, live on islands [1]. Currently, the conservation and management of islands and coastal ecosystems are receiving significant attention [2] because islands are now threatened by rising sea levels (caused by climate change [3,4,5]), natural disasters (such as storms, tsunamis, and volcanic eruptions [6,7]), and human activity (such as overfishing and island degradation [8,9,10]). In order to deal with these challenges, the management and protection of marine and coastal ecosystems has been adopted by the United Nations as one of the 17 sustainable development goals (SDGs), specifically SDG 14: Conserve and sustainably use the oceans, seas, and marine resources for sustainable development [11]. Currently, available and large-scale geospatial data related to islands are especially needed for the evaluation and monitoring of various indicators related to SDG 14.

1.1. Related Works

Remote sensing has been viewed as a potential technology for detecting islands and relevant characteristics, such as temperature and land-use change. Dong et al. [12] developed a simple method for mapping the inundation frequency of coral reefs in the Spratly Islands in the South China Sea using time series Landsat-8 OLI images. Immordino et al. [13] used Sentinel-2 multispectral data to map different types of habitats, including corals, seagrasses, and mangroves, in the Palau Republic in the Pacific Ocean. Lyons et al. [14] presented a framework capable of mapping coral reef habitats from individual reefs to entire barrier reef systems and across vast ocean extents, using high-resolution remote sensing data available on a global scale. Zhuang et al. [15] proposed a technical framework for automatic coral reef extraction based on an image filtering strategy and spatio-temporal similarity measurements of pixel-level Sentinel-2 image time series. Mikelsons et al. [16] developed a methodology to derive a global medium resolution (250 m) land mask or water mask from several existing data sources. In terms of island characteristics, Král and Pavliš [17] produced the first detailed land-cover map of Socotra Island using Landsat 7 ETM+ data. Révillion et al. [18] developed a land-use/land-cover product based on remote sensing processing of high spatial resolution satellite images acquired by the SPOT 5 satellite between December 2012 and July 2014. Chen et al. [19] used Landsat data for eight periods from 1984 to 2020 to explore the spatial and temporal characteristics of the land-use landscape pattern of Zhoushan Island, China. Holdaway et al. [20] analyzed changes in the land area on 221 atolls (ring-shaped coral islands or reefs) in the Indian and Pacific Oceans. Leihy et al. [21] applied a spatial-temporal gap-filling method to high-resolution (~1 km) land surface temperature observations for 20 Southern Ocean islands.
Although extensive studies have been conducted to detect islands and the relevant characteristics of islands, most have focused on proposing approaches, methods or technical frameworks, rather than producing available island data for public use. To address this gap, Sayre et al. [22] recently developed a 30 m spatial resolution global shoreline vector (GSV) from annual composites of 2014 Landsat 7 satellite images. The GSV dataset has three classes of islands: continental mainlands, islands greater than 1 km2 and islands smaller than 1 km2. More importantly, this dataset was not only made available globally but also open to the public. As another alternative, the OpenStreetMap (OSM) data, edited by global volunteers, can also be used for acquiring geospatial data related to islands. There are several benefits of using the OSM data [23]. First, it is being edited by global volunteers and thus has a global coverage. Second, the OSM data can also be freely acquired for public use. Third, the data contains many map features (e.g., roads, buildings and land-uses); more importantly, islands (https://wiki.openstreetmap.org/wiki/Tag:place%3Disland, accessed on 27 April 2021) and islets (https://wiki.openstreetmap.org/wiki/Tag:place%3Dislet, accessed on 27 April 2021) data can also be acquired directly from OSM.
Despite these available island datasets (GSV and OSM), to the best of our knowledge, few studies have paid attention to the data quality of these datasets. The GSV dataset has only been validated using visual inspection rather than quantitative assessment [22]. Many concerns have also been raised about the data quality of OSM because the data was edited by global volunteers from different countries [24], and of different ages and educational backgrounds [25]. Although extensive studies have been conducted to assess OSM data quality in terms of roads [26,27,28], buildings [29,30,31], land-cover, and land-uses [32,33,34], there is still a lack of research assessing OSM data quality in terms of islands and/or islets.

1.2. Aim and Contributions

Therefore, the purpose of our study is to assess and compare the data quality of two existing island datasets (GSV and OSM). Moreover, this study has two main contributions.
(1)
Different measures (including accuracy, completeness and shape complexity) were designed for assessing the data quality of island datasets.
(2)
Both the GSV and OSM datasets were not only assessed but also compared, in order to investigate which performed the best.

1.3. Organization

The paper is structured as follows: Section 2 describes the study area and data. Section 3 presents the designed measures that were used to assess and compare GSV and OSM island datasets. Section 4 reports the results and analyses. Section 5 and Section 6 comprise the discussion and conclusion, respectively.

2. Study Area and Data

2.1. Study Area

Four 100 × 100 km2 regions were chosen as the study areas (as shown in Figure 1). These regions were selected for several reasons: First, they are located in different geographical regions of the world, namely the Atlantic Ocean, the Arctic Ocean, the Indian Ocean and the Pacific Ocean (see Appendix A). Second, the size and pattern of the islands vary between the different regions, as indicated in Table 1. For instance, the islands in study area II are relatively large, while those in study area III are much smaller. In contrast, the islands in study areas I and IV show a combination of different sizes. Third, and most important, four different study areas were chosen to minimize any potential bias in the analysis.

2.2. Data

Two categories of open island datasets (global shoreline vector and OpenStreetMap) were used for the analysis (Table 2).
  • Global shoreline vector (GSV): The dataset is a 30-m spatial resolution global shoreline vector, which was produced based on annual composites of 2014 Landsat satellite images [22]. This dataset includes 340,691 islands in total, divided into three classes, i.e., 5 continental mainlands, 21,818 big islands greater than 1 km2 and 318,868 islands smaller than 1 km2 (Table 2). The dataset was acquired on 27 April 2021 from the website https://rmgsc.cr.usgs.gov/gie.
  • OpenStreetMap (OSM): The dataset was freely acquired in 2021 from Planet OSM: https://planet.openstreetmap.org/ (accessed on 13 February 2023). The platform provides all OSM data on a global scale. Each object in OSM has at least one tag (consisting of a key and a value) that describes an attribute of the object. As an example, if an OSM object is tagged with “place (key) = islet (value)”, it means that this object is a small island in the sea. In our study, three different tags relating to islands (i.e., natural = coastline, place = island and place = islet) were extracted from the OSM data (Table 2). In addition, the extracted data, originally saved in the pbf format, were converted into the shapefile format for the analysis, because the latter format can be processed by most geographic information system (GIS) software (e.g., ArcGIS and QGIS).
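The tag-based selection described above can be illustrated with a minimal sketch. It assumes the OSM objects have already been parsed into plain Python dictionaries; the function name `is_island_related` and the sample objects are hypothetical and are not taken from the study.

```python
# The three island-related tags named in the text.
ISLAND_TAGS = {("natural", "coastline"), ("place", "island"), ("place", "islet")}


def is_island_related(tags):
    """Return True if any key/value pair of an object's tags is an island tag."""
    return any(item in ISLAND_TAGS for item in tags.items())


# Hypothetical objects, invented for illustration only.
objects = [
    {"id": 1, "tags": {"place": "islet", "name": "Example Islet"}},
    {"id": 2, "tags": {"highway": "residential"}},
    {"id": 3, "tags": {"natural": "coastline"}},
    {"id": 4, "tags": {"place": "village"}},
]

# Keep only the objects that carry at least one island-related tag.
selected_ids = [obj["id"] for obj in objects if is_island_related(obj["tags"])]
```

A real workflow would read the objects from the downloaded OSM file with a dedicated parser before applying a filter of this kind.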

3. Methodology

The two island datasets (GSV and OSM) were evaluated based on three aspects: (1) accuracy, (2) completeness, and (3) shape complexity. Accuracy and completeness were chosen because they are quality measures defined by ISO (International Organization for Standardization 2013 [35]) that are widely used to assess the quality of various types of geospatial data, such as roads, buildings, and land-cover/land use. Shape complexity was chosen because geometric irregularity or complexity is often analyzed when investigating coastlines [36]. The workflow is shown in Figure 2, and the corresponding evaluation measures are introduced below.

3.1. Accuracy

Accuracy is used to measure whether the islands in each open dataset are represented correctly. As a reference island dataset is not freely available, the basis of our evaluation approach is to compare each open dataset with a set of sampling points that were visually interpreted from Google Earth. Specifically,
  • First, a set of sampling points with an interval of 2 km was acquired from each study area, resulting in a total of 2500 sampling points for each study area.
  • Next, the reference classification of each sampling point (either ‘island’ or ‘non-island’) was visually interpreted from the corresponding satellite image in Google Earth, which was taken around the year 2021.
  • Subsequently, all sampling points for each study area were overlaid on each open dataset (e.g., GSV or OSM) to determine the predicted classification (either ‘island’ or ‘non-island’) of each sampling point. Specifically, if a sampling point was located within the polygon of an island, it was classified as ‘island’; otherwise, it was classified as ‘non-island’.
  • Finally, the predicted classification of each sampling point was compared with the corresponding reference classification, using four different measures: overall accuracy (OA), precision, recall, and F1. These measures were chosen because they have been widely used to evaluate the performance of classification problems [37,38].
$$OA = \frac{TP + TN}{TP + FP + TN + FN} \times 100\%$$
$$Precision = \frac{TP}{TP + FP} \times 100\%$$
$$Recall = \frac{TP}{TP + FN} \times 100\%$$
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \times 100\%$$
where $TP$ denotes the number of sampling points that were identified as ‘island’ in both the open dataset and Google Earth; $TN$ denotes the number of sampling points that were identified as ‘non-island’ in both the open dataset and Google Earth; $FP$ denotes the number of sampling points that were classified as ‘island’ in the open dataset but interpreted as ‘non-island’ in Google Earth; and $FN$ denotes the number of sampling points that were classified as ‘non-island’ in the open dataset but interpreted as ‘island’ in Google Earth.
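The sampling and comparison steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the sampling points are placed at grid-cell centres, the two polygons are hypothetical (the reference polygon stands in for the visual interpretation from Google Earth), and all function names are invented here. F1 is returned on a 0 to 1 scale, as it is reported in Table A1.

```python
from itertools import product


def sampling_grid(extent_km=100, step_km=2):
    """Regular grid of points; using cell centres is an assumption of this sketch."""
    n = extent_km // step_km  # 100 / 2 = 50 points per axis, 2500 in total
    return [((i + 0.5) * step_km, (j + 0.5) * step_km)
            for i, j in product(range(n), repeat=2)]


def point_in_polygon(x, y, ring):
    """Ray-casting test; `ring` lists (x, y) vertices without repeating the first one."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Toggle when a horizontal ray from (x, y) crosses this edge.
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def confusion_counts(predicted, reference):
    """Return (TP, FP, TN, FN) from two lists of booleans (True = 'island')."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    fp = sum(p and not r for p, r in pairs)
    tn = sum(not p and not r for p, r in pairs)
    fn = sum(not p and r for p, r in pairs)
    return tp, fp, tn, fn


def accuracy_measures(tp, fp, tn, fn):
    """OA, precision and recall in percent; F1 on a 0 to 1 scale."""
    return {
        "OA": 100 * (tp + tn) / (tp + fp + tn + fn),
        "precision": 100 * tp / (tp + fp),
        "recall": 100 * tp / (tp + fn),
        # Algebraically equal to 2 * P * R / (P + R).
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


points = sampling_grid()
# Hypothetical polygons in km coordinates, invented for illustration only.
dataset_island = [(10, 10), (30, 10), (30, 30), (10, 30)]
reference_island = [(10, 10), (30, 10), (30, 26), (10, 26)]
predicted = [point_in_polygon(x, y, dataset_island) for x, y in points]
reference = [point_in_polygon(x, y, reference_island) for x, y in points]
measures = accuracy_measures(*confusion_counts(predicted, reference))
```

As a consistency check, `accuracy_measures(60, 3, 2431, 6)` gives 99.64%, 95.24%, 90.91% and 0.93, which matches the GSV row for study area I in Table A1. These counts are only one set inferred to be consistent with the reported values; the source does not list them.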

3.2. Completeness

Completeness indicates how well a region has been mapped. As no reference island dataset was freely available, it was impossible to calculate the actual completeness. As an alternative, we compared the relative differences between the two island datasets (GSV and OSM). Specifically, two measures (area completeness and count completeness) were used to assess completeness [29]. The two measures (called $C_{area}$ and $C_{count}$) are defined as follows.
$$C_{area} = \frac{A_{OSM}}{A_{GSV}} \times 100\%$$
$$C_{count} = \frac{N_{OSM}}{N_{GSV}} \times 100\%$$
where $A_{OSM}$ and $A_{GSV}$ denote the total areas of islands in the OSM and GSV datasets, respectively, and $N_{OSM}$ and $N_{GSV}$ denote the total number of islands in the OSM and GSV datasets, respectively.
Furthermore, in order to investigate how small an island can be identified using these datasets, we compared the number of islands in the two datasets (GSV and OSM) in terms of different area intervals (i.e., 0–10², 10²–30², 30²–50², 50²–100², 100²–1000² and >1000² m²). Additionally, we investigated whether the islands identified in these datasets actually exist or were incorrectly identified, which was achieved through visual interpretation using Google Earth.
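The two completeness measures and the counting by area interval can be sketched as follows. The island areas are hypothetical values in m², invented only to show the calculation; the function names and the handling of areas that fall exactly on an interval boundary are assumptions of this sketch.

```python
from bisect import bisect_right

# Upper bounds of the area intervals in m^2: 10^2, 30^2, 50^2, 100^2, 1000^2.
EDGES_M2 = [10**2, 30**2, 50**2, 100**2, 1000**2]


def completeness(areas_osm, areas_gsv):
    """Return (C_area, C_count) in percent, with GSV as the denominator."""
    c_area = 100 * sum(areas_osm) / sum(areas_gsv)
    c_count = 100 * len(areas_osm) / len(areas_gsv)
    return c_area, c_count


def count_by_interval(areas):
    """Count islands per area interval; the last bin holds areas above 1000^2 m^2."""
    counts = [0] * (len(EDGES_M2) + 1)
    for area in areas:
        # An area equal to a bound is placed in the higher interval (an assumption).
        counts[bisect_right(EDGES_M2, area)] += 1
    return counts


# Hypothetical island areas in m^2, invented for illustration only.
areas_osm = [50, 400, 1200, 5000, 20000, 2_000_000]
areas_gsv = [1300, 5600, 21000, 2_000_000]

c_area, c_count = completeness(areas_osm, areas_gsv)
counts_osm = count_by_interval(areas_osm)
counts_gsv = count_by_interval(areas_gsv)
```

In this toy case the total areas are nearly equal while the OSM list contains more islands, all of the extra ones in the smallest intervals, which is the pattern the two measures are designed to separate.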

3.3. Shape Complexity

The shape complexity denotes the complexity of an object’s shape or boundary. For this study, we analyzed the shape complexity of islands in each dataset to investigate which dataset (i.e., GSV or OSM) has more details. Specifically, we employed the box-counting method which has been widely applied to analyze the shape or boundary complexity of coastlines, to calculate this measure [36,39].
The main steps for using the box-counting method are:
  • First, the islands in each dataset (GSV or OSM), which were originally represented by polygons, were converted into lines (or boundaries).
  • Then, the lines or boundaries in each dataset and each study area (i.e., I, II, III or IV) were overlaid with regular grids of different sizes (i.e., 10, 30, 50, 100, 300, 500 and 1000 m, Figure 3). For each grid size, both the size of the grid cell ($r$) and the number of grid cells that intersected with a line or boundary ($N_r$) were recorded.
  • After that, the natural logarithms of each pair of $r$ and $N_r$ were calculated. That is, each pair of $r$ and $N_r$ was converted into $\ln(r)$ and $\ln(N_r)$, respectively.
  • Lastly, a linear function was fitted to the pairs of $\ln(r)$ and $\ln(N_r)$ obtained for the different grid sizes, that is,
$$\ln(N_r) = -D \ln(r) + \ln C$$
where $C$ denotes a constant and $D$ denotes the fractal dimension. Commonly, the larger the fractal dimension, the more detailed the island boundaries in a dataset.
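The box-counting steps above can be approximated with a short pure-Python sketch. It is an illustration under assumptions rather than the GIS overlay used in the study: boundaries are sampled as dense points instead of being intersected exactly with the grid, coordinates are assumed to be in metres, and the function names are hypothetical.

```python
import math

GRID_SIZES_M = (10, 30, 50, 100, 300, 500, 1000)  # grid sizes listed in the text


def densify(ring, spacing):
    """Sample points along a closed boundary at most `spacing` metres apart."""
    points = []
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        steps = max(1, math.ceil(math.hypot(x2 - x1, y2 - y1) / spacing))
        points.extend((x1 + (x2 - x1) * k / steps, y1 + (y2 - y1) * k / steps)
                      for k in range(steps))
    return points


def box_count(points, r):
    """Number of grid cells of size r that contain at least one boundary point."""
    return len({(math.floor(x / r), math.floor(y / r)) for x, y in points})


def fractal_dimension(rings, sizes=GRID_SIZES_M):
    """Least-squares slope of ln(N_r) against ln(r); D is the negated slope."""
    points = [p for ring in rings for p in densify(ring, min(sizes) / 4)]
    xs = [math.log(r) for r in sizes]
    ys = [math.log(box_count(points, r)) for r in sizes]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope


# Hypothetical square island with 5 km sides, invented for illustration only.
square = [(0, 0), (5000, 0), (5000, 5000), (0, 5000)]
d_square = fractal_dimension([square])
```

A smooth boundary such as this square should give a value of D close to 1, whereas a more irregular coastline would be expected to give a larger value.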

4. Results and Analyses

4.1. Results of Accuracy

First of all, Figure 4 shows the evaluation results of the two island datasets (GSV and OSM), in terms of accuracy. The specific values are listed in Appendix B.
We can see from Figure 4 and Appendix B that:
(1)
The overall accuracy (OA) is almost 100% for both datasets (GSV and OSM), and even the lowest OA value is higher than 98%. This indicates the effectiveness of using the two datasets for mapping islands. Moreover, for two out of the four study areas, the OA is slightly higher for the GSV dataset than for the OSM dataset. However, this is the opposite case for study area IV, which suggests that performance may vary with different study areas.
(2)
In most cases, precision is higher for the OSM dataset. Taking study area IV as an example, precision is 93.40% for the OSM dataset, which is higher than that (90.83%) for the GSV dataset. Moreover, all precision values are higher than 90%, indicating that most sampling points identified as ‘island’ in these island datasets have also been classified as ‘island’ when referring to Google Earth.
(3)
Unlike precision, recall values are higher for the GSV dataset than for the OSM dataset in most cases. Taking study area I as an example, the recall value is only 81.82% for the OSM dataset, which is much lower than that (90.91%) for the GSV dataset. Despite this, most recall values are higher than 90%, indicating that most sampling points classified as ‘island’ in Google Earth have also been identified as ‘island’ in these island datasets.
(4)
The best performance of the two island datasets also varies with different study areas in terms of F1. Specifically, the GSV dataset performs better than the OSM dataset for study areas I and II, but this is the opposite case for study area IV.
Further, two examples are used to illustrate the performance of two island datasets (GSV and OSM) by overlapping them with satellite images in Google Earth (Figure 5). Figure 5a shows that the OSM dataset provides a more precise identification of the island than the GSV dataset. For instance, the yellow sampling point in Figure 5a was visually interpreted as ‘non-island’, but the GSV dataset identified it as ‘island’. Figure 5b shows that the GSV dataset yields a more complete identification of the island compared to the OSM dataset. For instance, the yellow sampling point in Figure 5b was visually interpreted as ‘island’ in Google Earth, but the OSM dataset identified it as ‘non-island’.

4.2. Results of Completeness

Next, Figure 6 shows the area completeness and the count completeness, respectively, for the four study areas.
In terms of area completeness (Figure 6a), most of the values are close to 100%, indicating that the total areas are almost the same for these two island datasets. Nevertheless, the area completeness is relatively low (89%) for study area I but relatively high (110%) for study area III. This indicates that the total area of the GSV dataset is larger than that of the OSM dataset in study area I, but smaller in study area III.
In terms of count completeness, the values varied dramatically from 113% to 183%. More importantly, all the values are higher than 100%, indicating that there are more islands in the OSM dataset than in the GSV dataset. Furthermore, Figure 7 shows the number of islands in these two datasets for each study area. Unlike in Figure 6, the islands were counted separately for different area intervals (0–10², 10²–30², 30²–50², 50²–100², 100²–1000², and >1000² m²).
Figure 7 shows that the number of islands increases with the area interval, from 0–10² m² up to 100²–1000² m². In most cases, the number of islands is much higher in the OSM dataset than in the GSV dataset, especially for area intervals between 0 and 100² m². This indicates that there are many more small islands in the OSM dataset.
Furthermore, we investigated whether the islands in the two datasets (GSV and OSM) could also be visually interpreted from satellite images in Google Earth (Table 3). The results are reported for the different area intervals. Table 3 shows that most of the islands in each dataset can also be found in Google Earth. For instance, for study area I and an area interval of 0–10² m², 25 islands were identified in the OSM dataset, 23 of which can be found in Google Earth. The results indicate the reliability of using these datasets for island mapping.
Despite this advantage, flaws were also found. Specifically, a few islands, either in the GSV dataset (Figure 8a) or in the OSM dataset (Figure 8b), cannot be found in Google Earth, indicating errors in these two island datasets. Additionally, the number of islands in the OSM dataset is higher, probably for two reasons. On the one hand, more islands with a relatively small area (e.g., <100² m²) can be identified in the OSM dataset. On the other hand, two or more small islands have sometimes been merged into a single island in the GSV dataset (Figure 8c,d).

4.3. Results of Shape Complexity

Figure 9 further plots the results of the box-counting method. The corresponding fractal dimensions for the two island datasets (OSM and GSV) and for the four study areas (I, II, III and IV) are also provided.
Figure 9 shows that the fractal dimension varies with different study areas, probably because the number of islands varies in different study areas (see Table 2). For the same study area, the fractal dimensions are almost the same for the two island datasets (GSV and OSM), suggesting that the two datasets are similar. In most cases, the fractal dimension is slightly larger for the OSM dataset than for the GSV dataset. This indicates that the boundary is relatively more complex for the OSM dataset (or the island data has more details), although this is not the case for study area II. In order to further understand the results, Figure 10 shows two examples. Each island dataset in these examples is overlaid not only on satellite images in Google Earth (Figure 10a,d) but also on regular grids (Figure 10b,c,e,f).
We can see from Figure 10 that, generally, the OSM dataset includes more details, probably because it is more precise (Figure 4). As an example, the two islands in Figure 10a can be seen on Google Earth and can also be identified in the OSM dataset. However, in the GSV dataset, only a single larger island (which includes the two relatively small ones) can be identified. Thus, relatively more grid cells (200) overlap with the OSM dataset than with the GSV dataset (195).
In contrast, in Figure 10d–f, relatively more grid cells (143) overlap with the GSV dataset than with the OSM dataset (109), probably because the perimeter of the island in the GSV dataset is longer than that in the OSM dataset. However, the island in the OSM dataset still appears to be more precise when visually interpreting the satellite image on Google Earth.

5. Discussion

5.1. Comparison between GSV and OSM Datasets

This study assessed the data quality of two categories of ocean island datasets (GSV and OSM). This was achieved not only by comparing each dataset with a set of reference sampling points visually interpreted from Google Earth in terms of accuracy but also by comparing these two datasets in terms of completeness and shape complexity. Therefore, it is interesting to investigate which dataset can perform better. Our results showed that:
From an accuracy aspect, the OSM dataset performs better than the GSV dataset in terms of precision, but the GSV dataset performs better than the OSM dataset in terms of recall. However, in terms of overall accuracy (OA) and F1, the best performance for using the GSV and OSM datasets varies among the different study areas.
From the completeness aspect, the area completeness is close to 100%, indicating that the total areas of the GSV and OSM datasets are almost the same. However, the count completeness is much larger than 100%, indicating that the number of islands in the OSM dataset is much larger than that in the GSV dataset. Moreover, we also found that more small islands (e.g., <100 × 100 m²) can be acquired from the OSM dataset than from the GSV dataset.
From the shape complexity, the fractal dimension calculated based on the OSM dataset is also slightly larger than that calculated based on the GSV dataset, indicating that in most cases, the boundary of islands in the OSM dataset has relatively more details.
Therefore, we argue that the OSM dataset performs better than the GSV dataset for most of the measures (i.e., precision, completeness, and shape complexity). This is probably because the two datasets were produced based on different spatial resolutions of remote sensing data (Figure 11), that is, the GSV dataset was produced based on Landsat 7, which has a spatial resolution of 30 m [22]. On the other hand, the OSM dataset was edited by global volunteers based on Bing satellite map, which has a much higher spatial resolution (0.5 m [32]). Thus, the islands in the OSM dataset are represented with more details. Nevertheless, the GSV dataset performs better than the OSM dataset in terms of recall. Thus, the GSV dataset can still be used as a supplement, especially when the islands of a region have not been mapped well in OSM.

5.2. Applications

As both categories of island datasets (GSV and OSM) perform well in terms of overall accuracy (98% or above) and F1 (about 0.9 or above), there are several potential applications for them. First, these datasets can be used to map the spatial pattern of ocean islands not only in a region but across the globe, as both are freely available on a global scale. Moreover, this type of analysis has benefits for ship routing planning [40,41] and marine protected areas planning [42,43,44].
Furthermore, the OSM dataset is continuously updated by global volunteers on a minute-by-minute basis (https://wiki.openstreetmap.org/wiki/Osmupdate, accessed on 13 February 2023). It is therefore feasible to acquire historical data of islands in OSM, which can be used to analyze the variation of islands over a long time series. This type of analysis is essential to monitor SDG-related indicators, which may be useful in achieving sustainable development of the marine environment (Virto 2018).

5.3. Limitations

Despite the advantages and applications of using the two island datasets, there are several limitations to this study. First, the accuracy-relevant measures were analyzed by comparing them to a set of reference sampling points that were visually interpreted from Google Earth. On the one hand, the coastline or boundary of an island may vary with different years and even different seasons. We did not consider such variation because very high-resolution satellite images (e.g., 1 m or higher) are not freely available. Although Google Earth provides high-resolution satellite images, the available years are limited and inconsistent in different study areas. On the other hand, each sampling point was only identified as ‘island’ (above sea surface) or ‘non-island’ (below sea surface). We did not divide these sampling points into more detailed classes (e.g., islands and reefs). This is because it is difficult to distinguish between the different classes through visual interpretation from Google Earth. Nevertheless, in future work, it would still be worthwhile to use other data sources to assess the data quality of these island datasets by considering more classes.
Second, we could only compare the GSV and OSM datasets with each other in terms of the completeness and shape complexity measures, because there is no corresponding reference dataset. Thus, we cannot quantitatively evaluate how complete each island dataset is or how much difference there is in shape complexity between each island dataset and a reference dataset. Moreover, only 2500 sampling points were collected for each study area because visually interpreting the type of each sampling point from Google Earth is still a time-consuming and labor-intensive task. In future work, more sampling points should be gathered to enhance the reliability of our results.
Last but not least, in this study, only four 100 × 100 km2 regions were chosen as the study areas. This is also because it is costly to determine the types of a large number of sampling points (10,000 in total). However, both the GSV and OSM datasets are freely available at a global scale. Therefore, in further work, it would be worthwhile to apply our analytical framework to other regions across the globe to investigate whether consistent results can be found.

6. Conclusions

This study assessed two categories of open island datasets (GSV and OSM) using three types of measures: accuracy, completeness, and shape complexity. Specifically, in terms of accuracy, each island dataset was compared with a set of reference sampling points that were visually interpreted with Google Earth, and four different measures, including overall accuracy (OA), precision, recall, and F1, were calculated. In terms of completeness, both area completeness and count completeness were used to compare the two island datasets. Different sizes of islands were also considered during the comparison. In terms of shape complexity, the box-counting method was employed to calculate the fractal dimension of each study area, and then the fractal dimensions of these two island datasets were compared. Four 100 × 100 km2 regions across the globe were included as the study areas for the analysis. The results showed that:
(1)
The best performance between the two island datasets (GSV and OSM) varied with different study areas in terms of OA and F1. In most cases, the OSM dataset performed better in terms of precision, but GSV performed better with respect to recall.
(2)
Area completeness is close to 100%, indicating that both the GSV and OSM datasets are similar in terms of the total area of islands. However, count completeness was much higher than 100%, indicating that the OSM dataset is larger than the GSV dataset in terms of the total number of islands. Likewise, more small islands can be acquired from the OSM dataset.
(3)
In most cases, the OSM dataset has a higher value than the GSV dataset in terms of shape complexity (or fractal dimension), indicating that the OSM dataset has more details in terms of the island boundary or coastline.
We concluded that both the GSV and OSM datasets are effective, especially in terms of OA and F1, and the OSM dataset can identify more small islands and provide more details. Despite these advantages, in future work, other high-resolution remote sensing data could be used to assess the data quality of the two island datasets, especially by taking different years and seasons into consideration. Other reference datasets may also be acquired as benchmarks to carry out quantitative assessments (e.g., in terms of completeness and shape complexity). Lastly, other regions across the globe should also be involved in the analysis to verify our results.

Author Contributions

Conceptualization, Qi Zhou and Lihua Zhang; Formal analysis, Yijun Chen and Shenxin Zhao; Writing—original draft, Qi Zhou and Yijun Chen. All authors have read and agreed to the published version of the manuscript.

Funding

The project was supported by the Director Fund of the International Research Center of Big Data for Sustainable Development Goals (Grant No. CBAS2022DF010) and the National Natural Science Foundation of China (Grant No. 41771428; 4207040449).

Data Availability Statement

Related data are available upon reasonable request.

Acknowledgments

The authors thank all anonymous reviewers and the editor for their valuable comments and suggestions that have helped improve this paper substantially.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Figure A1. Locations of the four study areas.

Appendix B

Table A1. The evaluation results of two island datasets (GSV and OSM), in terms of the accuracy measures.
| Study Area | OA (%), GSV | OA (%), OSM | Precision (%), GSV | Precision (%), OSM | Recall (%), GSV | Recall (%), OSM | F1, GSV | F1, OSM |
|---|---|---|---|---|---|---|---|---|
| I | 99.64 | 99.48 | 95.24 | 98.18 | 90.91 | 81.82 | 0.93 | 0.89 |
| II | 98.40 | 98.24 | 96.65 | 96.75 | 98.54 | 97.93 | 0.98 | 0.97 |
| III | 99.96 | 99.96 | 100.00 | 100.00 | 93.33 | 93.33 | 0.97 | 0.97 |
| IV | 99.48 | 99.60 | 90.83 | 93.40 | 97.06 | 97.06 | 0.94 | 0.95 |
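The four accuracy measures in Table A1 can be illustrated with a small sketch, assuming the standard confusion-matrix definitions of overall accuracy, precision, recall and F1. The counts below are hypothetical (they are not taken from the paper); they were chosen only because they reproduce the GSV values reported for study area I.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Return OA, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "overall_accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts: island points correctly mapped (tp), non-island points
# mapped as island (fp), island points missed (fn), and correct non-island points (tn).
m = accuracy_measures(tp=60, fp=3, fn=6, tn=2431)
print(f"OA = {m['overall_accuracy']:.2%}, precision = {m['precision']:.2%}, "
      f"recall = {m['recall']:.2%}, F1 = {m['f1']:.2f}")
```

Because non-island points dominate such a sample, OA stays near 100% even when precision or recall differ noticeably, which is why the table reports all four measures.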

References

  1. Royle, S.A. A human geography of islands. Geography 1989, 74, 106–116. [Google Scholar]
  2. Spalding, M.D.; Ruffo, S.; Lacambra, C.; Meliane, I.; Hale, L.Z.; Shepard, C.C.; Beck, M.W. The role of ecosystems in coastal protection: Adapting to climate change and coastal hazards. Ocean Coast. Manag. 2014, 90, 50–57. [Google Scholar] [CrossRef]
  3. Mimura, N. Vulnerability of island countries in the South Pacific to sea level rise and climate change. Clim. Res. 1999, 12, 137–143. [Google Scholar] [CrossRef]
  4. Harter, D.E.; Irl, S.D.; Seo, B.; Steinbauer, M.J.; Gillespie, R.; Triantis, K.A.; Fernández-Palacios, J.-M.; Beierkuhnlein, C. Impacts of global climate change on the floras of oceanic islands–Projections, implications and current knowledge. Perspect. Plant Ecol. Evol. Syst. 2015, 17, 160–183. [Google Scholar] [CrossRef]
  5. Amores, A.; Marcos, M.; Le Cozannet, G.; Hinkel, J. Coastal flooding and mean sea-level rise allowances in atoll island. Sci. Rep. 2022, 12, 1281. [Google Scholar] [CrossRef]
  6. Pelling, M.; Uitto, J.I. Small island developing states: Natural disaster vulnerability and global change. Glob. Environ. Change Part B Environ. Hazards 2001, 3, 49–62. [Google Scholar] [CrossRef]
  7. Noy, I. Natural disasters in the Pacific Island Countries: New measurements of impacts. Nat. Hazards 2016, 84 (Suppl. S1), 7–18. [Google Scholar] [CrossRef]
  8. Hasan, M.H. Destruction of a Holothuria scabra population by overfishing at Abu Rhamada Island in the Red Sea. Mar. Environ. Res. 2005, 60, 489–511. [Google Scholar] [CrossRef]
  9. Wairiu, M. Land degradation and sustainable land management practices in Pacific Island Countries. Reg. Environ. Change 2017, 17, 1053–1064. [Google Scholar] [CrossRef]
  10. Zhang, Y.; Li, D.; Fan, C.; Xu, H.; Hou, X. Southeast Asia island coastline changes and driving forces from 1990 to 2015. Ocean Coast. Manag. 2021, 215, 105967. [Google Scholar] [CrossRef]
  11. Virto, L.R. A preliminary assessment of the indicators for Sustainable Development Goal (SDG) 14 “Conserve and sustainably use the oceans, seas and marine resources for sustainable development”. Mar. Policy 2018, 98, 47–57. [Google Scholar] [CrossRef]
  12. Dong, Y.; Liu, Y.; Hu, C.; Xu, B. Coral reef geomorphology of the Spratly Islands: A simple method based on time-series of Landsat-8 multi-band inundation maps. ISPRS J. Photogramm. Remote Sens. 2019, 157, 137–154. [Google Scholar] [CrossRef]
  13. Immordino, F.; Barsanti, M.; Candigliota, E.; Cocito, S.; Delbono, I.; Peirano, A. Application of Sentinel-2 multispectral data for habitat mapping of Pacific islands: Palau Republic (Micronesia, Pacific Ocean). J. Mar. Sci. Eng. 2019, 7, 316. [Google Scholar] [CrossRef] [Green Version]
  14. Lyons, M.B.; MRoelfsema, C.; VKennedy, E.; MKovacs, E.; Borrego-Acevedo, R.; Markey, K.; Roe, M.; Yuwono, D.M.; Harris, D.L.; Phinn, S.R.; et al. Mapping the world’s coral reefs using a global multiscale earth observation framework. Remote Sens. Ecol. Conserv. 2020, 6, 557–568. [Google Scholar] [CrossRef] [Green Version]
  15. Zhuang, Q.; Zhang, J.; Cheng, L.; Chen, H.; Song, Y.; Chen, S.; Chu, S.; Dongye, S.; Li, M. Framework for Automatic Coral Reef Extraction Using Sentinel-2 Image Time Series. Mar. Geod. 2022, 45, 195–231. [Google Scholar] [CrossRef]
  16. Mikelsons, K.; Wang, M.; Wang, X.L.; Jiang, L. Global land mask for satellite ocean color remote sensing. Remote Sens. Environ. 2021, 257, 112356. [Google Scholar] [CrossRef]
  17. Král, K.; Pavliš, J. The first detailed land-cover map of Socotra Island by Landsat/ETM+ data. Int. J. Remote Sens. 2006, 27, 3239–3250. [Google Scholar] [CrossRef]
  18. Révillion, C.; Attoumane, A.; Herbreteau, V. Homisland-IO: Homogeneous land use/land cover over the Small Islands of the Indian Ocean. Data 2019, 4, 82. [Google Scholar] [CrossRef] [Green Version]
  19. Chen, H.; Chen, C.; Zhang, Z.; Lu, C.; Wang, L.; He, X.; Chu, Y.; Chen, J. Changes of the spatial and temporal characteristics of land-use landscape patterns using multi-temporal Landsat satellite data: A case study of Zhoushan Island, China. Ocean Coast. Manag. 2021, 213, 105842. [Google Scholar] [CrossRef]
  20. Holdaway, A.; Ford, M.; Owen, S. Global-scale changes in the area of atoll islands during the 21st century. Anthropocene 2021, 33, 100282. [Google Scholar] [CrossRef]
  21. Leihy, R.I.; Duffy, G.A.; Nortje, E.; Chown, S.L. High resolution temperature data for ecological research and management on the Southern Ocean Islands. Sci. Data 2018, 5, 180177. [Google Scholar] [CrossRef] [Green Version]
  22. Sayre, R.; Noble, S.; Hamann, S.; Smith, R.; Wright, D.; Breyer, S.; Butler, K.; Van Graafeiland, K.; Frye, C.; Karagulle, D.; et al. A new 30 meter resolution global shoreline vector and associated global islands database for the development of standardized ecological coastal units. J. Oper. Oceanogr. 2019, 12 (Suppl. S2), S47–S56. [Google Scholar] [CrossRef] [Green Version]
  23. Mooney, P.; Minghini, M. A review of OpenStreetMap data. In Mapping and the Citizen Sensor; Ubiquity Press: London, UK, 2017; pp. 37–59. [Google Scholar]
  24. Neis, P.; Zipf, A. Analyzing the contributor activity of a volunteered geographic information project—The case of OpenStreetMap. ISPRS Int. J. Geo-Inf. 2012, 1, 146–165. [Google Scholar] [CrossRef] [Green Version]
  25. Neis, P.; Zielstra, D. Recent developments and future trends in volunteered geographic information research: The case of OpenStreetMap. Future Internet 2014, 6, 76–106. [Google Scholar] [CrossRef] [Green Version]
  26. Girres, J.F.; Touya, G. Quality assessment of the French OpenStreetMap dataset. Trans. GIS 2010, 14, 435–459. [Google Scholar] [CrossRef]
  27. Barrington-Leigh, C.; Millard-Ball, A. The world’s user-generated road map is more than 80% complete. PLoS ONE 2017, 12, e0180698. [Google Scholar] [CrossRef] [Green Version]
  28. Borkowska, S.; Pokonieczny, K. Analysis of OpenStreetMap data quality for selected counties in Poland in terms of sustainable development. Sustainability 2022, 14, 3728. [Google Scholar] [CrossRef]
  29. Tian, Y.; Zhou, Q.; Fu, X. An analysis of the evolution, completeness and spatial patterns of OpenStreetMap building data in China. ISPRS Int. J. Geo-Inf. 2019, 8, 35. [Google Scholar] [CrossRef] [Green Version]
  30. Zhang, Y.; Zhou, Q.; Brovelli, M.A.; Li, W. Assessing OSM building completeness using population data. Int. J. Geogr. Inf. Sci. 2022, 36, 1443–1466. [Google Scholar] [CrossRef]
  31. Zhou, Q.; Zhang, Y.; Chang, K.; Brovelli, M.A. Assessing OSM building completeness for almost 13,000 cities globally. Int. J. Digit. Earth 2023, 15, 2400–2421. [Google Scholar] [CrossRef]
  32. Viana, C.M.; Encalada, L.; Rocha, J. The value of OpenStreetMap historical contributions as a source of sampling data for multi-temporal land use/cover maps. ISPRS Int. J. Geo-Inf. 2019, 8, 116. [Google Scholar] [CrossRef] [Green Version]
  33. Wang, S.; Zhou, Q.; Tian, Y. Understanding completeness and diversity patterns of OSM-based land-use and land-cover dataset in China. ISPRS Int. J. Geo-Inf. 2020, 9, 531. [Google Scholar] [CrossRef]
  34. Zhou, Q.; Wang, S.; Liu, Y. Exploring the accuracy and completeness patterns of global land-cover/land-use data in OpenStreetMap. Appl. Geogr. 2022, 145, 102742. [Google Scholar] [CrossRef]
  35. ISO 19157:2013; Geographic Information—Data Quality. International Organization for Standardization: Geneva, Switzerland, 2013.
  36. Husain, A.; Reddy, J.; Bisht, D.; Sajid, M. Fractal dimension of coastline of Australia. Sci. Rep. 2021, 11, 6304. [Google Scholar] [CrossRef] [PubMed]
  37. Liao, Y.; Zhou, Q.; Jing, X. A comparison of global and regional open datasets for urban greenspace mapping. Urban For. Urban Green. 2021, 62, 127132. [Google Scholar] [CrossRef]
  38. Zhou, Q.; Jing, X. Evaluation and Comparison of Open and High-Resolution LULC Datasets for Urban Blue Space Mapping. Remote Sens. 2022, 14, 5764. [Google Scholar] [CrossRef]
  39. Xu, J.; Zhang, Z.; Zhao, X.; Wen, Q.; Zuo, L.; Wang, X.; Yi, L. Spatial and temporal variations of coastlines in northern China (2000–2012). J. Geogr. Sci. 2014, 24, 18–32. [Google Scholar] [CrossRef]
  40. Tsou, M.C. Multi-target collision avoidance route planning under an ECDIS framework. Ocean Eng. 2016, 121, 268–278. [Google Scholar] [CrossRef]
  41. Gao, M.; Shi, G.; Li, W.; Wang, Y.; Liu, D. An improved genetic algorithm for island route planning. Procedia Eng. 2017, 174, 433–441. [Google Scholar] [CrossRef]
  42. Merrifield, M.S.; McClintock, W.; Burt, C.; Fox, E.; Serpa, P.; Steinback, C.; Gleason, M. MarineMap: A web-based platform for collaborative marine protected area planning. Ocean Coast. Manag. 2013, 74, 67–76. [Google Scholar] [CrossRef]
  43. Gaymer, C.F.; Stadel, A.V.; Ban, N.C.; Cárcamo, P.F.; Ierna, J., Jr.; Lieberknecht, L.M. Merging top-down and bottom-up approaches in marine protected areas planning: Experiences from around the globe. Aquatic Conservation Mar. Freshw. Ecosyst. 2014, 24, 128–144. [Google Scholar] [CrossRef]
  44. Noble, M.M.; Harasti, D.; Pittock, J.; Doran, B. Using GIS fuzzy-set modelling to integrate social-ecological data to support overall resilience in marine protected area spatial planning: A case study. Ocean Coast. Manag. 2021, 212, 105745. [Google Scholar] [CrossRef]
Figure 1. The GSV and OSM datasets of four study areas: I (a,b); II (c,d); III (e,f); and IV (g,h).
Figure 2. The workflow of the evaluation method.
Figure 3. The principle of the box-counting method, using different grid sizes, i.e., (a) 30 m; (b) 50 m; and (c) 100 m.
Figure 4. The evaluation results of two island datasets and four regions (I–IV), in terms of overall accuracy (a), precision (b), recall (c), and F1 (d).
Figure 5. Illustrating the performances of two island datasets (GSV and OSM) by overlapping them with satellite images (a,b) in Google Earth.
Figure 6. Results of the area completeness and count completeness for the four study areas (I, II, III and IV).
Figure 7. The number of islands (x-axis) in the two datasets (GSV and OSM) and for four study areas, considering different area intervals of islands (y-axis).
Figure 8. Overlapping the two island datasets with corresponding satellite images (a–d) in Google Earth, in order to understand the results in Table 3.
Figure 9. Results of the box-counting method and corresponding fractal dimensions, in terms of the two island datasets (OSM and GSV) and the four study areas (I, II, III and IV).
Figure 10. Two examples (a–c and d–f) are used to understand the results in Figure 9. For each example, both datasets are superimposed on either a Google satellite image or a regular grid.
Figure 11. Overlapping the islands in the GSV and OSM datasets with Landsat 7 (a) and Bing satellite map (b), respectively.
Table 1. The statistics of islands in the four study areas.
| Study Area | Geographical Region | Total Number of Islands, GSV | Total Number of Islands, OSM | Average Size of Islands (m²), GSV | Average Size of Islands (m²), OSM |
|---|---|---|---|---|---|
| I | Atlantic Ocean | 417 | 764 | 6.9 × 10⁵ | 3.4 × 10⁵ |
| II | Arctic Ocean | 56 | 73 | 5.9 × 10⁷ | 4.5 × 10⁷ |
| III | Indian Ocean | 151 | 172 | 2.5 × 10⁵ | 2.4 × 10⁵ |
| IV | Pacific Ocean | 115 | 136 | 3.7 × 10⁶ | 3.0 × 10⁷ |
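The island counts in Table 1 are enough to illustrate the count-completeness measure. The sketch below assumes that count completeness is the OSM island count expressed as a percentage of the GSV island count, which is consistent with the statement that values above 100% mean more islands in OSM; the exact formula and the values plotted in Figure 6 may differ. The function and variable names are hypothetical.

```python
# Island counts from Table 1: study area -> (GSV count, OSM count)
ISLAND_COUNTS = {
    "I": (417, 764),
    "II": (56, 73),
    "III": (151, 172),
    "IV": (115, 136),
}


def count_completeness(gsv_count, osm_count):
    """OSM island count as a percentage of the GSV count (assumed definition)."""
    return 100.0 * osm_count / gsv_count


completeness = {
    area: count_completeness(gsv, osm) for area, (gsv, osm) in ISLAND_COUNTS.items()
}
for area, value in completeness.items():
    print(f"Study area {area}: count completeness = {value:.1f}%")
```

Under this assumed definition every study area exceeds 100%, with study area I showing the largest surplus of OSM islands.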
Table 2. The attributes of the two island datasets (GSV and OSM).
| Dataset | Type/Tag | Definition |
|---|---|---|
| Global Shoreline Vector (GSV) | Continental mainlands | Northern America, Southern America, Africa, Australia, Eurasia |
| Global Shoreline Vector (GSV) | Large islands | Islands that are larger than 1 km² |
| Global Shoreline Vector (GSV) | Small islands | Islands that are smaller than 1 km² |
| OpenStreetMap (OSM) | natural = coastline | The mean high water springs line along the coastline at the edge of the sea |
| OpenStreetMap (OSM) | place = island | Any piece of land that is completely surrounded by water and isolated from other significant landmasses |
| OpenStreetMap (OSM) | place = islet | Any very small island |
Table 3. The reliability of islands in the two datasets (GSV and OSM), in consideration of different area intervals *.
| Area Interval (m²) | I, GSV | I, OSM | II, GSV | II, OSM | III, GSV | III, OSM | IV, GSV | IV, OSM |
|---|---|---|---|---|---|---|---|---|
| 0–10² | 0/0 | 23/25 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| 10²–30² | 20/23 | 113/116 | 0/0 | 0/0 | 1/2 | 6/7 | 0/0 | 1/1 |
| 30²–50² | 27/28 | 151/154 | 0/0 | 0/0 | 6/7 | 9/9 | 0/0 | 7/7 |
| 50²–100² | 123/126 | 178/180 | 0/0 | 7/9 | 12/13 | 12/13 | 9/9 | 24/24 |
| 100²–1000² | 216/216 | 267/267 | 29/30 | 39/39 | 124/124 | 137/137 | 79/87 | 84/84 |
| >1000² | 24/24 | 22/22 | 26/26 | 25/25 | 5/5 | 6/6 | 19/19 | 20/20 |
* In each cell, the number to the left of the “/” is the count of islands that are present in the dataset and also visible on Google Earth; the number to the right is the total number of islands in the dataset for that area interval.
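The cells of Table 3 can be summarised as a single share of confirmed islands per dataset and study area. The sketch below is only an illustration of that arithmetic, with hypothetical function names; it parses the "confirmed/total" cells for study area I and returns no value when a dataset has no islands at all.

```python
def parse_cell(cell):
    """Split a 'confirmed/total' cell from Table 3 into two integers."""
    confirmed, total = cell.split("/")
    return int(confirmed), int(total)


def confirmed_share(cells):
    """Share of islands confirmed on Google Earth across all area intervals."""
    pairs = [parse_cell(cell) for cell in cells]
    confirmed = sum(pair[0] for pair in pairs)
    total = sum(pair[1] for pair in pairs)
    return confirmed / total if total else None


# Study area I columns of Table 3, from the smallest to the largest area interval.
GSV_AREA_I = ["0/0", "20/23", "27/28", "123/126", "216/216", "24/24"]
OSM_AREA_I = ["23/25", "113/116", "151/154", "178/180", "267/267", "22/22"]

print(f"GSV, study area I: {confirmed_share(GSV_AREA_I):.1%} confirmed")
print(f"OSM, study area I: {confirmed_share(OSM_AREA_I):.1%} confirmed")
```

The totals in each column also sum to the island counts in Table 1 (417 for GSV and 764 for OSM in study area I), which is a quick consistency check on the two tables.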
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chen, Y.; Zhao, S.; Zhang, L.; Zhou, Q. Quality Assessment of Global Ocean Island Datasets. ISPRS Int. J. Geo-Inf. 2023, 12, 168. https://doi.org/10.3390/ijgi12040168
