Communication

Accessing the Accuracy of Citizen Science Data Based on iNaturalist Data

1
Research Institute of EcoScience, Ewha Womans University, Seoul 03760, Korea
2
Division of EcoScience, Ewha Womans University, Seoul 03760, Korea
3
Underwood International College, Yonsei University, Seoul 03722, Korea
4
SiliconCube Ltd., 54, Changeop-ro, Seongnam 13449, Korea
*
Author to whom correspondence should be addressed.
Diversity 2022, 14(5), 316; https://doi.org/10.3390/d14050316
Submission received: 28 March 2022 / Revised: 19 April 2022 / Accepted: 19 April 2022 / Published: 21 April 2022

Abstract

The number of science projects involving citizen (volunteer) participants is increasing. Although citizen science offers advantages that directly benefit scientific fields, its greatest weakness is the reliability of data collected by non-specialists. In this study, we assess the reliability of data collected by citizens and identify the factors that impede their credibility. We selected two species, Pelophylax chosenicus and Dryophytes suweonensis, which have specific habitat, breeding, and brumation sites. The location data for both species were collected from the global citizen engagement project “iNaturalist”. As a result, 89.3% of the location data for P. chosenicus and 37.1% of those for D. suweonensis were estimated to be erroneous. No difference in data accuracy was observed between experts and citizens for D. suweonensis because the mating call is the main classification key for the species. On the other hand, a significant difference was confirmed for P. chosenicus, whose external characteristics are its classification key. Our study shows that efforts to improve reliability, including appropriate species selection and survey methods, are necessary to use citizen participation data in scientific research.

1. Introduction

Citizen science denotes scientific projects conducted with the participation of citizens or non-specialists, and mainly involves repeated and long-term mapping and monitoring [1,2]. Although the term “citizen science” was first used by Alan Irwin in 1995, citizens had been involved in scientific projects long before then [3]. The most famous example is the Christmas Bird Count (CBC) [4]. The CBC is the oldest citizen-led monitoring project and was started in 1900 by the National Audubon Society (NAS). Each year, more than 50,000 observers from 50 regions participate, and the collected data have been used in hundreds of publications [4]. In addition, there are numerous citizen science projects around the world, such as Stall Catchers, eBird, iNaturalist, and Nurturing (details can be found at Scistarter.org and in [2]).
Citizen science has the advantage of being able to obtain vast amounts of data over multiple regions over long periods of time [1,5,6]. Therefore, citizen science has been utilized in various research fields, such as climate change, biological monitoring, conservation, and modeling [2,7]. On the other hand, the low reliability of the data collected by non-specialists is one of the problems with citizen science [2,6]. For example, the misidentification of species or location errors caused by non-experts undermine the reliability of such data. Until recently, this reliability issue served as a major barrier to applying citizen participation data to the scientific field [6].
Several methods have been attempted to increase the accuracy and reliability of citizen science data. A typical example is prior training: with just one day of training, citizens’ ability to classify species improved [8]. In addition, when citizens selected “easy” research topics, the accuracy of the data was 17% higher than that of the data collected for “difficult” topics [8]. Moreover, research design, model building, and calibration methods were developed to ensure the reliability of the data collected by non-experts [1,9]. However, the accuracy of the data collected by citizens was still far lower than that of the data collected by experts [8].
In this study, we analyze the accuracy of the data collected to evaluate the credibility of citizen science. First, the accuracy of the data uploaded to “iNaturalist”, a representative citizen science project, is analyzed. After that, the accuracy, reliability, and usability of the data are analyzed and compared with those of the experts. Beyond simply evaluating the reliability of such data, we aim to provide important evidence that could contribute to the advancement of citizen science.

2. Materials and Methods

2.1. Species Selection

The most important factor in a reliability analysis is the selection of subjects. We focused on selecting target species whose natural and mis-recorded habitats could be differentiated to determine accuracy. Specifically, we selected species (1) whose habitat, breeding, and brumation sites are constant throughout the year and (2) that can be easily observed. Among the 16 amphibians (Caudata: 4; Salientia: 12) inhabiting South Korea [10], the gold-spotted pond frog (Pelophylax chosenicus) and the Suweon tree frog (Dryophytes suweonensis) are found only in rice paddies and related environments in the western lowlands [11,12,13,14,15]. Since the habitat of the two species can be specified as rice fields, they were selected as target species.

2.2. Distribution Data Collection

iNaturalist is a representative international social networking resource launched in 2008, to which people from all over the world can easily upload and share data, such as locations, photos, videos, and sounds of numerous creatures [16,17]. We collected location data of P. chosenicus and D. suweonensis uploaded to iNaturalist to verify the accuracy and reliability of the data collected through citizen participation [18]. We only used data that included photos and could be accurately identified. In addition, we excluded multiple records documented at the same site from the analysis. Since data on information-sharing sites are uploaded and modified in real time and are therefore subject to continuous change as research progresses, we collected and analyzed only data uploaded up to April 2021. To verify the reliability of the data collected through iNaturalist, we compared them with the data published in previous research papers, including the authors’ data [19].

2.3. Data Categorization and Analysis

We classified the collected location data into four types: (1) “Agriculture”, environments created for paddy farming, such as paddy fields, waterways, and agricultural ponds; (2) “Forest”, environments that include mountains or valleys; (3) “Wetland”, reservoirs and rivers; and (4) “Other”, environments unrelated to amphibian habitats, such as urban centers, buildings, and the sea [20]. We analyzed the location type and frequency of the target species using the chi-square test. Before analyzing the differences between investigators, we performed a normality test on the expert data and the iNaturalist data; both departed from normality (p < 0.001). We therefore compared the data by species and investigator using the Kolmogorov–Smirnov test, a non-parametric test. We used SPSS 26.0 (IBM, New York, NY, USA) to run the statistical tests, and the significance level was set to 0.05.
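For reference, the two test statistics can be written as follows. This is a sketch under assumptions not stated in the text: that the chi-square test compares the observed counts $O_i$ in $k$ location types against equal expected counts, and that the Kolmogorov–Smirnov statistic is reported in the scaled form $Z$. The symbols $O_i$, $E_i$, $N$, $k$, $F_{\text{expert}}$, $F_{\text{citizen}}$, $n_1$, and $n_2$ are introduced here for illustration only.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = \frac{N}{k}, \qquad \mathrm{df} = k - 1

D = \max_{x} \left| F_{\text{expert}}(x) - F_{\text{citizen}}(x) \right|, \qquad Z = D \sqrt{\frac{n_1 n_2}{n_1 + n_2}}
```

Here $N$ is the number of location points for one species and one investigator group, $F$ denotes the empirical cumulative distribution over the coded location types, and $n_1$ and $n_2$ are the expert and citizen sample sizes.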

3. Results

3.1. Data Collected by Experts

The number of location points of P. chosenicus and D. suweonensis collected by experts was 79 and 162, respectively (Figure 1). For P. chosenicus, 91.1% of the locations were “Agriculture”, followed by “Wetland” at 8.9% (chi-square test, χ² = 53.481, df = 1, p < 0.001). No P. chosenicus locations were classified as “Forest” or “Other”. For D. suweonensis, “Agriculture” accounted for the highest percentage of the locations at 79.6% (χ² = 262.346, df = 3, p < 0.001), followed by “Forest” at 12.3%, “Other” at 7.4%, and “Wetland” at one site. The “Other” type included 9 sites in the city center and 3 sites in the sea.
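As a cross-check of the goodness-of-fit statistics above, the following minimal sketch recomputes Pearson's chi-square in plain Python. The site counts are not given in the text; they are reconstructed here from the reported percentages and sample sizes (for example, 91.1% of 79 sites is about 72), and the expected frequencies are assumed to be equal across the observed categories. The function name `chi_square_equal` is hypothetical.

```python
# Pearson chi-square goodness-of-fit against equal expected frequencies.
# Counts are reconstructed from the reported percentages (an assumption).

def chi_square_equal(observed):
    """Return (statistic, df) for a list of observed category counts."""
    total = sum(observed)
    expected = total / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, len(observed) - 1

# Expert records; only categories with at least one record are tested.
expert_counts = {
    "P. chosenicus": {"Agriculture": 72, "Wetland": 7},
    "D. suweonensis": {"Agriculture": 129, "Forest": 20, "Other": 12, "Wetland": 1},
}

results = {sp: chi_square_equal(list(c.values())) for sp, c in expert_counts.items()}

for species, (stat, df) in results.items():
    print(f"{species}: chi2 = {stat:.3f}, df = {df}")
```

Under these assumptions the sketch agrees with the reported values (53.481 with df = 1 and 262.346 with df = 3), which suggests that empty categories were left out of the test for P. chosenicus.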

3.2. Data Collected by Citizens (Non-Experts)

The location data from iNaturalist comprised 214 points for P. chosenicus and 35 points for D. suweonensis (Figure 1). For P. chosenicus, “Forest” and “Other” each accounted for 43.0% of the locations, followed by “Agriculture” at 10.7% and “Wetland” at 3.3% (χ² = 113.215, df = 3, p < 0.001). The “Other” category included 50 sites in the sea and 30 sites in the city center. For D. suweonensis, “Agriculture” accounted for the highest percentage of the locations, at 62.9%, followed by “Other” (17.1%), “Forest” (14.3%), and “Wetland” (5.7%) (χ² = 27.743, df = 3, p < 0.001). The “Other” category included four sites on the road and two sites in the city center (Figure 2).
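The percentages above, and the error rates quoted in the abstract, can be illustrated with a short Python sketch. Two assumptions are made here that the text does not state explicitly: the counts are reconstructed from the reported percentages and sample sizes, and every record outside “Agriculture” is treated as erroneous. The names `shares` and `outside_agriculture` are hypothetical.

```python
# Share of iNaturalist records per location type, and the share falling
# outside "Agriculture" (treated here as the estimated error rate).
# Counts are reconstructed from the reported percentages (an assumption).

citizen_counts = {
    "P. chosenicus": {"Agriculture": 23, "Forest": 92, "Wetland": 7, "Other": 92},
    "D. suweonensis": {"Agriculture": 22, "Forest": 5, "Wetland": 2, "Other": 6},
}

def shares(counts):
    """Percentage of records in each location type."""
    total = sum(counts.values())
    return {k: 100 * v / total for k, v in counts.items()}

def outside_agriculture(counts):
    """Percentage of records not classified as Agriculture."""
    total = sum(counts.values())
    return 100 * (total - counts["Agriculture"]) / total

error_rates = {sp: outside_agriculture(c) for sp, c in citizen_counts.items()}

for sp, c in citizen_counts.items():
    pct = ", ".join(f"{k} {v:.1f}%" for k, v in shares(c).items())
    print(f"{sp}: {pct}; outside Agriculture {error_rates[sp]:.1f}%")
```

With these reconstructed counts, the share outside “Agriculture” comes to 89.3% for P. chosenicus and 37.1% for D. suweonensis, matching the figures in the abstract.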

3.3. Comparison between Expert and Citizen Data

The location data of P. chosenicus differed significantly between experts and iNaturalist (Kolmogorov–Smirnov test, n = 293, z = 6.107, p < 0.001). On the other hand, there was no difference in the location data of D. suweonensis between experts and iNaturalist (n = 197, z = 0.867, p = 0.440).
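A two-sample Kolmogorov–Smirnov comparison of this kind can be sketched in plain Python as follows. This is not the SPSS procedure itself. It assumes counts reconstructed from the reported percentages, location types coded in the order Agriculture, Forest, Wetland, Other, and the scaled statistic Z = D·sqrt(n1·n2/(n1+n2)). The function name `ks_two_sample` is hypothetical.

```python
# Two-sample Kolmogorov-Smirnov statistic on ordinal-coded location types.
# Assumptions: counts reconstructed from reported percentages; categories
# coded in the order Agriculture, Forest, Wetland, Other.
from itertools import accumulate
from math import sqrt

ORDER = ["Agriculture", "Forest", "Wetland", "Other"]

def ks_two_sample(a, b):
    """Return (D, Z) where Z = D * sqrt(n1 * n2 / (n1 + n2))."""
    n1, n2 = sum(a.values()), sum(b.values())
    # Empirical cumulative distributions over the coded categories.
    cdf_a = [c / n1 for c in accumulate(a.get(k, 0) for k in ORDER)]
    cdf_b = [c / n2 for c in accumulate(b.get(k, 0) for k in ORDER)]
    d = max(abs(x - y) for x, y in zip(cdf_a, cdf_b))
    return d, d * sqrt(n1 * n2 / (n1 + n2))

expert_pc = {"Agriculture": 72, "Wetland": 7}
citizen_pc = {"Agriculture": 23, "Forest": 92, "Wetland": 7, "Other": 92}

d, z = ks_two_sample(expert_pc, citizen_pc)
print(f"P. chosenicus: D = {d:.3f}, Z = {z:.3f}")
```

With this coding the largest gap between the two cumulative distributions occurs at “Agriculture”, and the resulting Z for P. chosenicus is close to the reported 6.107. A different category order could change D, so the result depends on the coding assumed here.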

4. Discussion

The accuracy of the data differed clearly depending on the target species and investigator. Without professional education, people can easily confuse P. chosenicus with P. nigromaculatus, because the two are very alike in body size, coloration, and patterns [21]. Despite their external similarities, P. nigromaculatus inhabits forests and their surroundings, so its ecological niche is clearly distinguished from that of P. chosenicus [15]. The high percentage of “Forest” among the P. chosenicus data from iNaturalist is presumably because location data of morphologically similar true frogs were uploaded. On the other hand, the locations collected by experts reflected the habitat characteristics of P. chosenicus well [15]. This shows that expertise in the target species can have a significant impact on the accuracy of the data collected. Therefore, since morphological similarity can be a problem for species classification, actively referring to species characteristics that require no prior training, such as mating calls, for classification and detection will help to increase the accuracy of the data [22].
Dryophytes suweonensis is also very similar in color, shape, size, and skeleton to its related species, Hyla japonica, so even experts have limitations in accurately discerning the two species based on external shape alone [23]. Although a clear difference in size was identified between the two species [24], it is difficult for citizens to distinguish them by size in the field. Kuramoto [25] first noticed a frog making a unique mating call, and the frog was then recorded as a new species, D. suweonensis, through additional studies. In our results, there was no significant difference between the data of citizens and experts. This may be due to the use of the mating call, which is an “easy” and “clear” feature for identification. In this case, data collected by citizens can be used for research.
On the other hand, mis-recorded locations were also identified in both citizen and expert data. We consider that these mis-records may stem from errors in recording and uploading rather than in species identification.
Many of the location data for the two species uploaded to iNaturalist were mis-recorded. There are three possible causes. The first is related to data uploading. In iNaturalist, location data are based on where the data are uploaded. If data are uploaded from a place other than where the species was originally found or its habitat, a mis-record can result. Even the data collected by experts deviated from the general distribution of the target species, and these errors may also have occurred during uploading and recording. Second, if an upload is attempted from a point with a poor Internet connection, the location where the connection improved may be recorded instead of the original location. The first and second problems can arise because many citizens are not familiar with how GPS works. Crall et al. [8] confirmed that citizens’ recording accuracy could be increased by more than 80% through simple training. Errors can also be reduced if the uploader records the location directly on the map. The third cause is change in the environment at the recorded locations over time. For example, agricultural land in South Korea decreased by 11.9%, from 21,444 km² in 1985 to 18,888 km² in 2000, as it was converted to other uses, such as cities. In particular, the paddy area decreased by 20.9% from 1,016,000 ha in 2003 to 808,000 ha in 2014 [26,27]. A location may have been a rice paddy field when the two species were found, but it may now be regarded as a mis-record because the land has since been converted to urban or other uses (Figure 3). As a solution, we propose capturing and storing a photo of the location, including the habitat, at the time of data upload so that it can be monitored over time.
The biggest advantage of citizen science is that it allows data to be collected at the scale of big data. However, if reliability is not secured through sufficient verification, the usefulness of the collected data in research may be limited. Although the data were collected for two species with a known habitat and location, their reliability was insufficient for scientific application. For species whose habitat cannot be specified, reliability verification and correction of data are also limited. In the end, this problem must be solved in the process of collecting location data. The use of protocols, training, and volunteers suitable for research makes it possible to collect data at the expert level [2,6,8]. For example, professional training is required for participants, and oversight by experts is essential for the data collected afterwards. In addition, easy target selection, GPS training, statistical correction of collected data, and the use of computing tools can help to increase data reliability [2,6,8,9].
The merit of citizen science lies in the quantity of data it provides, which recent research trends demand. Therefore, if errors that occur in data collection or uploading are identified and efforts to minimize them are continued, it will be possible to secure a large amount of data and further utilize it in various research fields.

Author Contributions

Conceptualization, K.-S.K. and J.-M.O.; Methodology, K.-S.K. and J.-M.O.; Software, K.-S.K. and J.-Y.I.; Formal Analysis, K.-S.K.; Investigation, K.-S.K. and J.-M.O.; Resources, J.-M.O. and S.-J.P.; Data Curation, K.-S.K.; Writing—Original Draft Preparation, K.-S.K.; Writing—Review and Editing, K.-S.K., J.-Y.I. and S.-J.P.; Visualization, K.-S.K.; Supervision, K.-S.K.; Project Administration, K.-S.K.; Funding Acquisition, K.-S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Korea Environmental Industry & Technology Institute, grant number (KEITI 2021002280003).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

We thank all the members of the Animal Behavior Lab at Ewha Womans University who provided recommendations for this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bonney, R.; Cooper, C.B.; Dickinson, J.; Kelling, S.; Phillips, T.; Rosenberg, K.V.; Shirk, J. Citizen science: A developing tool for expanding science knowledge and scientific literacy. BioScience 2009, 59, 977–984.
  2. Silvertown, J. A new dawn for citizen science. Trends Ecol. Evol. 2009, 24, 467–471.
  3. Irwin, A. Citizen Science: A Study of People, Expertise and Sustainable Development; Routledge: London, UK, 2002; p. 216.
  4. Dunn, E.H.; Francis, C.M.; Blancher, P.J.; Drennan, S.R.; Howe, M.A.; Lepage, D.; Robbins, C.S.; Rosenberg, K.V.; Sauer, J.R.; Smith, K.G. Enhancing the scientific value of the Christmas Bird Count. Auk 2005, 122, 338–346.
  5. Bhattacharjee, Y. Citizen scientists supplement work of Cornell researchers. Science 2005, 308, 1402–1403.
  6. Bonney, R.; Shirk, J.L.; Phillips, T.B.; Wiggins, A.; Ballard, H.L.; Miller-Rushing, A.J.; Parrish, J.K. Next steps for citizen science. Science 2014, 343, 1436–1437.
  7. Borzée, A.; Baek, H.J.; Lee, C.H.; Kim, D.Y.; Song, J.Y.; Suh, J.H.; Jang, Y.W.; Min, M.S. Scientific publication of georeferenced molecular data as an adequate guide to delimit the range of Korean Hynobius salamanders through citizen science. Acta Herpetol. 2019, 14, 27–33.
  8. Crall, A.W.; Newman, G.J.; Stohlgren, T.J.; Holfelder, K.A.; Graham, J.; Waller, D.M. Assessing citizen science data quality: An invasive species case study. Conserv. Lett. 2011, 4, 433–442.
  9. Kosmala, M.; Wiggins, A.; Swanson, A.; Simmons, B. Assessing data quality in citizen science. Front. Ecol. Environ. 2016, 14, 551–560.
  10. NIBR. Amphibians and Reptiles. In Red Data Book of Republic of Korea; Ecodesign: Incheon, Korea, 2019; Volume 2, p. 137.
  11. Borzée, A.; Ahn, J.; Kim, S.; Heo, K.; Jang, Y. Seoul, keep your paddies! implications for the conservation of hylid species. Anim. Syst. Evol. Divers. 2015, 31, 176–181.
  12. Borzée, A.; Choi, Y.; Kim, Y.E.; Jablonski, P.G.; Jang, Y. Interspecific variation in seasonal migration and brumation behavior in two closely related species of treefrogs. Front. Ecol. Evol. 2019, 7, 55.
  13. Park, S.G.; Ra, N.Y.; Jang, Y.S.; Woo, S.H.; Koo, K.S.; Chang, M.H. Comparison of movement distance and home range size of gold-spotted pond frog (Pelophylax chosenicus) between rice paddy and ecological park-focus on the planning alternative habitat. Ecol. Resil. Infrastruct. 2019, 6, 200–207.
  14. Shim, Y.J.; Kim, S.R.; Yoon, K.B.; Jung, J.W.; Park, S.U.; Park, Y.S. A basic research for the development of habitat suitability index model of Pelophylax chosenicus. J. Korean Soc. Environ. Eng. 2020, 23, 49–62.
  15. Do, M.S.; Son, S.J.; Choi, G.; Yoo, N.; Koo, K.S.; Nam, H.K. Anuran community patterns in the rice fields of the mid-western region of the Republic of Korea. Glob. Ecol. Conserv. 2021, 26, e01448.
  16. Matheson, C.A. "iNaturalist". Ref. Rev. 2014, 28, 36–38.
  17. Boone, M.E.; Basille, M. Using iNaturalist to contribute your nature observations to science. EDIS 2019, 2019, 5.
  18. Nugent, J. iNaturalist. Sci. Scope 2018, 41, 12–13.
  19. Borzée, A.; Kim, K.; Heo, K.; Jablonski, P.G.; Jang, Y. Impact of land reclamation and agricultural water regime on the distribution and conservation Status of the endangered Dryophytes suweonensis. PeerJ 2017, 5, e3872.
  20. NIE. The 5th National Natural Environment Survey Guidelines; Design glory: Seocheon, Korea, 2019; p. 281.
  21. Yoon, I.B.; Kim, J.I.; Yang, S.Y. Study on the food habits of Rana nigromaculata Hallowell and Rana plancyi chosenica Okada (Salientia; Ranidae) in Korea. Korean J. Environ. Biol. 1998, 16, 69–76.
  22. Park, S.R.; Lee, B.K.; Yang, S.Y. The call patterns and the change of calls by water temperature in Rana plancyi (Amphibia, Anura). Korean J. Ecol. 1998, 21, 269–276.
  23. Kim, E.B.; Kim, E.S.; Sung, H.C.; Lee, D.H.; Kim, G.J.; Nam, D.H. Comparison of the skeletal features of two sympatric tree frogs (Hylidae: Hyla)—Hyla japonica and Hyla suweonensis—using three-dimensional micro-computed tomography. J. Asia Pac. Biodivers. 2021, 14, 147–153.
  24. Borzée, A.; Park, S.; Kim, A.; Kim, H.T.; Jang, Y. Morphometrics of two sympatric species of tree frogs in Korea: A morphological key for the critically endangered Hyla suweonensis in relation to H. japonica. Anim. Cells Syst. 2013, 17, 348–356.
  25. Kuramoto, M. Mating calls of treefrogs (genus Hyla) in the Far East, with description of a new species from Korea. Copeia 1980, 1, 100–108.
  26. Chae, M. Improvement direction of farmland management system according to changes in agricultural environment. Plan. Policy 2002, 110–125.
  27. Choi, W.H. Present state and future prospect of agricultural water demand and supply. KCID J. 2005, 12, 4–14.
Figure 1. The location types of the two frogs recorded by experts and citizens. (A) Pelophylax chosenicus; (B) Dryophytes suweonensis.
Figure 2. Positional errors in the locations of the two species. The locations of (A) Pelophylax chosenicus and (B) Dryophytes suweonensis were recorded at sea and in buildings, respectively.
Figure 3. Habitat change at a Dryophytes suweonensis location (red dot) over time in Google Maps. (A) 2004; (B) 2019.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Koo, K.-S.; Oh, J.-M.; Park, S.-J.; Im, J.-Y. Accessing the Accuracy of Citizen Science Data Based on iNaturalist Data. Diversity 2022, 14, 316. https://doi.org/10.3390/d14050316

Chicago/Turabian Style

Koo, Kyo-Soung, Jeong-Min Oh, Soo-Jeong Park, and Jong-Yoon Im. 2022. "Accessing the Accuracy of Citizen Science Data Based on iNaturalist Data" Diversity 14, no. 5: 316. https://doi.org/10.3390/d14050316
