1. Introduction
Accurate and consistent cropland information is crucial for addressing global food security and for making future policy, investment and logistical decisions [1,2,3,4,5,6,7]. Global cropland mapping provides baseline cropland information to accurately assess the drivers and implications of cropland dynamics at both regional and global scales [8,9,10]. To predict and respond to food insecurity, global cropland products are readily available from coarse and medium spatial resolution earth observation data. Remote sensing has therefore been recognized as an extremely effective, economical and feasible approach to create cropland thematic maps over a range of spatial and temporal scales [11,12,13].
Cropland maps such as the Global Map of Irrigation Areas (GMIA), Global Map of Rain-fed Areas (GMRCA) [14], Global Monthly Irrigated and Rain-fed Crop Areas (MIRCA2000) [15], Global Rain-fed, Irrigated and Paddy Croplands (GRIPC) [16] and MODIS-Cropland [17], all derived from coarse-resolution satellite data, are currently some of the main sources of cropland information on a global scale. However, these products either have insufficient accuracy or are too coarse in resolution for anything other than global applications [10]. More recently, with advances in remotely sensed imagery and classification algorithms implemented on cloud computing platforms such as Google Earth Engine, cropland products have become available at higher spatial resolutions. For example, global cropland products were generated at 250 m and 30 m spatial resolutions by the NASA MEaSURES (Making Earth System Data Records for Use in Research Environments) GFSAD (Global Food Security Data Analysis) project [18,19,20,21].
Previously, accuracy assessments of most global cropland extent maps produced a single global accuracy measure (i.e., overall accuracy) without regard to continental or regional differences [22,23]. These measures could then be used to make a statement about the overall accuracy of the global map, but not about specific continents or regions [24]. The accuracy of a specific continent or region could only be determined if an assessment was done for that area. Insufficient availability of reference data in most regions of the world, along with significant variations in agricultural landscape patterns, poses unique challenges for conducting a more detailed accuracy assessment of any global mapping product. The reference data required to assess large area cropland maps are not uniformly available for all continents. For instance, some regions have extensive reference data available for assessment (such as the US and Canada) while others (such as Africa) have very little to none. Additionally, mapping in some continents exhibits larger errors than in others, depending on the complexity of the agricultural landscapes [25,26]. For example, the International Food Policy Research Institute (IFPRI) has reported that an accurate map is difficult to achieve in developing countries with smallholder agricultural landscapes [27]. As a result, most global assessments have not attempted to provide a more regionalized measure of accuracy encompassing continents/regions or different agricultural landscapes [28].
Clearly, a single accuracy estimate will not provide a holistic view of the ability to map variation within the agricultural landscapes of different continents. However, these continental variations could be reported by assessing the map accuracy in response to the different issues observed in the cropland maps of different continents [22,29,30,31]. A continent-based assessment strategy will therefore provide a more intensive evaluation of the cropland maps in the agricultural landscapes of different continents. Such an assessment strategy will also help to understand the efficacy, quality and variations of large area cropland maps [32]. The question, then, is how to implement such a continent-based assessment strategy effectively for each continent while also considering the issues and constraints related to both reference data availability and the complexity of agricultural landscapes.
While individual measures of accuracy are well established in the literature (e.g., [30,31,33]), considerable ambiguity remains about the implementation and interpretation of accuracy assessment for large areas. The most widely accepted approach to accuracy assessment is the error matrix [31]. The error matrix is a cross tabulation of the class labels predicted by the image classification against those observed in the reference dataset. The key issue in generating a valid error matrix is the collection of sufficient and appropriate reference data. Such data must be collected using an appropriate sampling scheme with sufficient samples, taking into account the complexities of the area being mapped [24].
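The cross tabulation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the GFSAD project: the function names and the eight sample labels are hypothetical, and the row/column orientation (map labels in rows, reference labels in columns) is an assumed convention. User's accuracy is computed per map class (row) and producer's accuracy per reference class (column).

```python
from collections import Counter


def error_matrix(map_labels, ref_labels, classes):
    """Cross-tabulate map labels (rows) against reference labels (columns)."""
    counts = Counter(zip(map_labels, ref_labels))
    return [[counts[(m, r)] for r in classes] for m in classes]


def accuracy_measures(matrix):
    """Return overall accuracy and per-class user's and producer's accuracy."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    overall = sum(diagonal) / total
    # User's accuracy: correct samples divided by all samples mapped as the class.
    users = [d / t if t else float("nan") for d, t in zip(diagonal, row_totals)]
    # Producer's accuracy: correct samples divided by all reference samples of the class.
    producers = [d / t if t else float("nan") for d, t in zip(diagonal, col_totals)]
    return overall, users, producers


# Hypothetical labels for eight sample units (not data from the paper).
classes = ["crop", "no-crop"]
map_labels = ["crop", "crop", "crop", "no-crop",
              "no-crop", "no-crop", "no-crop", "no-crop"]
ref_labels = ["crop", "crop", "no-crop", "crop",
              "no-crop", "no-crop", "no-crop", "no-crop"]

matrix = error_matrix(map_labels, ref_labels, classes)
overall, users, producers = accuracy_measures(matrix)
print(matrix)   # [[2, 1], [1, 4]]
print(overall)  # 0.75
```

Commission error for a class is one minus its user's accuracy, and omission error is one minus its producer's accuracy.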
In order to conduct a detailed continent-based assessment of the cropland extent maps generated in the GFSAD project, it was important to consider how the mapping was performed in each continent. The cropland extent maps for each continent were created by dividing the area into homogeneous regions (i.e., stratification) rather than producing a single map of the entire continent. As a result, the accuracy assessment was also performed within these homogeneous areas using traditional assessment methods with modified sampling designs for collecting reference data depending on the agricultural landscapes in each continent [23,30]. Such modified sampling designs ensured an optimum sampling approach for the different agricultural landscapes in each continent. The accuracy assessment of the cropland maps generated in the GFSAD project was therefore conducted by continent, in response to the different issues and constraints observed in the cropland extent maps.
The goal of this paper is to provide specific approaches and recommendations for modifying existing accuracy assessment strategies and methodologies to validate global cropland extent maps considering the issues and constraints unique to each continent. Meaningful and statistically valid assessment results demonstrate that these methods and approaches contribute to a better understanding of global cropland distribution by continent. This work focuses on the unique issues encountered when assessing cropland extent for North America (confined to the USA), Africa and Australia. These three continents were selected as representative of different cropping patterns, with variable agricultural field sizes and availability of reference data. Lessons learned from this work can be extended to other continents to provide appropriate methods of accuracy assessment where different scenarios of reference data availability and agricultural landscapes exist. The approach of stratifying the continents based on Agro-Ecological Zones (AEZ’s) for the US and Africa provided rigorous and valid accuracy results [29]. The buffer-based stratification approach used in Australia provides an alternative methodology for when crops are clustered only in certain areas of the continent and are not appropriately represented by AEZ’s.
4. Discussion
The accuracy assessment of large area cropland extent maps was performed in response to the specific issues, constraints, limitations and a few advantages observed in different continents throughout this project. The assessment strategy was adapted for each selected continent in order to determine accuracy measures in homogeneous regions and provide more information about the large area cropland maps. It is important to note how the assessment strategies were modified from one continent to another in response to the cropping patterns in different agricultural landscapes. The heterogeneity in the agricultural landscapes of different continents was minimized by dividing each continent into homogeneous Agro-Ecological Zones (i.e., AEZ’s), which were then used to perform the accuracy assessment. Indeed, no single sampling design for collecting global reference data is universally appropriate [28]. Therefore, modified sampling designs were employed in the AEZ-based assessment strategies for the different continents. Where AEZ-based stratification failed to produce more homogeneous regions (e.g., Australia), a buffering approach was selected instead. The following is a discussion of the results for each of the selected continents.
4.1. United States
The US has dynamic cropping patterns in which dominant crop types are well distributed in large homogeneous agricultural fields and rare crop types are scattered across heterogeneous agricultural landscapes. These variations in the agricultural landscape therefore need to be considered when assessing the cropland maps. The issues and constraints of cropland diversity were incorporated by dividing the US into homogeneous regions using a stratification method based on AEZ’s. Accuracy results for homogeneous regions of a continent can be more meaningful for users implementing region-based cropland models for planning and decision-making.
The US has the great advantage of an extensive and easily accessible reference data set (i.e., USDA CDL). Because such high-quality reference data (i.e., accuracy ranging from 85–95%) [36] were available, the accuracy assessment of cropland maps was easier to perform in the US than in other continents. The AEZ-based assessment strategy was employed with only a few modifications in the sampling design to choose an appropriate sampling scheme and sample size. The appropriate sample size was determined by a sample simulation analysis that provided insight into how many samples were required to perform a continent-specific accuracy assessment. The AEZ-based assessment strategy, using high-quality reference data and a modified sampling design, was then used to report the accuracy measures in all the AEZ’s.
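The details of the sample simulation analysis are not given in this section, so the following is only an illustrative sketch of one way such an analysis could work: repeatedly draw random samples of increasing size from a population with known agreement and watch how the spread of the estimated overall accuracy shrinks. The population (90% agreement), the candidate sample sizes, the number of repeats and the function name are all assumptions made for illustration.

```python
import random
import statistics


def simulate_accuracy_spread(agreement, sample_sizes, repeats=200, seed=0):
    """For each sample size, draw `repeats` random samples without replacement
    and return the standard deviation of the estimated overall accuracy."""
    rng = random.Random(seed)
    spread = {}
    for n in sample_sizes:
        estimates = []
        for _ in range(repeats):
            sample = rng.sample(agreement, n)
            estimates.append(sum(sample) / n)
        spread[n] = statistics.stdev(estimates)
    return spread


# Hypothetical population of 10,000 units:
# 1 = map agrees with reference, 0 = map disagrees (90% agreement).
population = [1] * 9000 + [0] * 1000

spread = simulate_accuracy_spread(population, [50, 250, 1000])
for n, sd in spread.items():
    print(f"n={n}: standard deviation of estimated accuracy = {sd:.4f}")
```

A sample size can then be chosen at the point where adding more samples no longer reduces the spread by a meaningful amount.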
The accuracy assessment of the cropland extent map in the US provided both accuracy measures in the form of error matrices and the spatial distribution of agreement and disagreement in the form of a difference image. The compiled accuracy estimates generated for all the AEZ’s showed high overall accuracies for the crop and no-crop classes with no commission errors in the cropland areas (Table 1). These results were also confirmed by a difference image that showed spatial agreement and disagreement between the reference data and the map (Figure 6). The cropland class on the map has no commission error and only 2% omission error. Overall, there were no complications in performing the accuracy assessment of the cropland maps of the US. The results show that an accuracy assessment of the cropland extent map for a continent with an extensive reference data set can easily be done, but that care must still be taken to determine homogeneous cropping regions, which result in more meaningful, representative and valid mapping products.
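A difference image of the kind described above can be illustrated with two small binary grids. This is a hedged sketch, not the processing chain behind Figure 6: the grids, the function name and the pixel codes (0, 1, 2) are hypothetical, with 1 meaning crop and 0 meaning no-crop in both inputs.

```python
def difference_image(map_grid, ref_grid):
    """Code each pixel of two equally sized binary grids:
    0 = agreement,
    1 = commission (map says crop, reference says no-crop),
    2 = omission (map says no-crop, reference says crop)."""
    result = []
    for map_row, ref_row in zip(map_grid, ref_grid):
        row = []
        for m, r in zip(map_row, ref_row):
            if m == r:
                row.append(0)
            elif m == 1:
                row.append(1)
            else:
                row.append(2)
        result.append(row)
    return result


# Hypothetical 2 x 3 grids (1 = crop, 0 = no-crop).
map_grid = [[1, 1, 0],
            [0, 1, 0]]
ref_grid = [[1, 0, 0],
            [1, 1, 0]]

diff = difference_image(map_grid, ref_grid)
print(diff)  # [[0, 1, 0], [2, 0, 0]]
```

Mapping the three codes to colors shows where commission and omission errors cluster, which an error matrix alone cannot reveal.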
4.2. Africa
In Africa, different issues and constraints were observed when compared to the other selected continents due to: (1) the heterogeneous and scattered cropland distribution across the AEZ’s; and (2) the lack of effective and valid reference data. In Africa, AEZ’s were quite diverse from each other with a few zones having very sparse cropland distribution as compared to others where the cropland areas were more uniformly distributed. These issues and concerns were considered and planned for well before assessing the cropland maps using the AEZ-based accuracy assessment strategy with a modified sampling design. The AEZ-based strategy was different from the one employed in the US as it was specific to each AEZ according to each zone’s cropping pattern variability. In response to these cropping patterns and the lack of availability of valid reference data, the traditional method of accuracy assessment was conducted using a modified sampling design.
To perform the accuracy assessment of the cropland extent map of Africa, the reference data had to be collected from interpretation of VHRI for the year 2014 because no other data existed. Before collecting this reference data, it was necessary to determine where and how many samples were required. Another sampling simulation was performed to determine appropriate sample sizes for the various AEZs. Because of the large variability in the cropland distribution of the AEZ’s in Africa, zones with low crop diversity (i.e., AEZ 1 and 2) were sampled with a minimum number of samples (50) as compared to the other zones (250). Such modified sampling in Africa demonstrates the power of being able to selectively devote effort and time in collecting reference data based on the variability in the cropland distribution.
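The unequal allocation described above can be written down as a small lookup. The sample counts of 50 and 250 and the low-crop zones (AEZ 1 and 2) come from the text; the total of seven zones and the function name are assumptions made only for this illustration.

```python
def allocate_samples(zones, low_crop_zones, n_low=50, n_default=250):
    """Assign a reduced sample size to low-crop zones and the default
    sample size to every other zone."""
    return {zone: (n_low if zone in low_crop_zones else n_default)
            for zone in zones}


# Assumption: seven zones numbered 1 to 7; zones 1 and 2 are the low-crop zones.
allocation = allocate_samples(range(1, 8), low_crop_zones={1, 2})
total_samples = sum(allocation.values())
print(allocation)
print(total_samples)  # 1350
```

Under these assumptions the design needs 1350 samples instead of the 1750 that a uniform allocation of 250 per zone would require, which is the saving in interpretation effort the paragraph refers to.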
The results in Table 3 show that there is a high overall accuracy in all the zones. It is very important to note the producer’s and user’s accuracy of the crop class in each AEZ. In Zones 5, 6 and 7, the producer’s and user’s accuracies of the crop class are low due to low crop intensity and the spatial structure of cropland areas. Zones 1 and 2 also have a low crop proportion, but the spatial structure of crop patches in these zones differs from that in Zones 5, 6 and 7. Therefore, different user’s and producer’s accuracies were observed in each zone due to spatial fragmentation of cropland areas in specific zones. These accuracies provide a more detailed view of each mapped class beyond just the overall accuracy and are indicative of the AEZ-based assessment strategy. Specific to Africa, it is very important to observe the cropping pattern in each AEZ and how much effort is required to generate the reference data necessary to perform the AEZ-based assessment strategy. These results show that the AEZ’s with low crop proportions do not require large sampling efforts as these zones have smaller geographic extent and less cropland area to assess. It is, therefore, reasonable to assume that if the sample size were increased for these zones, neither the number of crop samples nor the map accuracy would increase. Throughout this process of assessing the cropland map of Africa, the AEZ-based assessment strategy helped to provide a more detailed view of the accuracies of homogeneous regions within the continent.
4.3. Australia
The accuracy assessment process to validate the cropland extent map of Australia was modified differently than the process used in the US and Africa. In Australia, the cropland distribution is concentrated only along a narrow belt towards the edges of the continent. There is a very low chance of cropland areas towards the center region of Australia because of the very dry conditions there. This concentrated cropping pattern was considered in the assessment strategy by choosing a reduced region around these cropland patches. AEZ’s in Australia do not exhibit much diversity due to low crop variability in the continent and therefore, were not appropriate to stratify the continent. Rather than an AEZ-based assessment strategy, a buffering approach was used to divide the continent into homogeneous regions to perform a valid assessment of the cropland map of Australia.
In addition to a different method of stratification, two different sources of reference data were used in Australia. The first was ground-collected samples and the second was samples collected from interpretation of VHRI. The ground-collected reference data raised a new set of issues. These ground data were collected during a field campaign that emphasized identifying cropland areas. As a result, very few non-cropland samples were recorded, and the error matrix created from these reference data contained many cropland samples and few non-cropland samples (Table 5). However, given that only approximately 12% of Australia is cropland, this error matrix was highly imbalanced and not representative of the map accuracy. The non-cropland samples were therefore augmented using interpretation of VHRI to obtain sufficient samples to generate a balanced error matrix indicative of the actual cropland/non-cropland proportion (Table 6).
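The arithmetic behind this kind of augmentation is simple and can be sketched as follows. The 12% cropland proportion comes from the text; the count of 120 ground-collected cropland samples is a hypothetical figure chosen for illustration, not a value from Table 5 or Table 6, and the function name is likewise an assumption.

```python
def non_crop_samples_needed(n_crop, crop_proportion):
    """Number of non-cropland samples required so that cropland samples make up
    `crop_proportion` of the combined reference set."""
    if not 0 < crop_proportion < 1:
        raise ValueError("crop_proportion must be strictly between 0 and 1")
    return round(n_crop * (1 - crop_proportion) / crop_proportion)


# Hypothetical: 120 cropland samples, cropland covering about 12% of the area.
n_crop = 120
n_non_crop = non_crop_samples_needed(n_crop, 0.12)
print(n_non_crop)                          # 880
print(n_crop / (n_crop + n_non_crop))      # 0.12
```

If only a handful of non-cropland samples came from the field campaign, the remainder up to this target would have to be interpreted from VHRI.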
While the balanced error matrix generated from the ground reference data augmented with interpretation of VHRI demonstrates good overall cropland mapping, it is not representative of the entire continent since the ground data were collected all in the southern region. To solve this problem, a stratification approach using buffer analysis was adopted. This method provides a buffer around the cropland areas of Australia at two distances and eliminates sample collection from the center of the continent where there was very low chance of finding crops.
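The buffer analysis can be illustrated with a simple distance rule. This is a sketch under stated assumptions rather than the GIS workflow used for Australia: cropland is represented by a few hypothetical point locations in a planar coordinate system, the two buffer distances (5 and 15 units) are invented, and the buffers are assumed to be nested so that the outer zone is the ring between the two distances.

```python
import math


def buffer_zone(point, crop_points, inner, outer):
    """Return 1 if `point` lies within `inner` of any cropland location,
    2 if it lies within `outer` but beyond `inner`,
    and None if it falls outside both buffers (excluded from sampling)."""
    nearest = min(math.dist(point, crop) for crop in crop_points)
    if nearest <= inner:
        return 1
    if nearest <= outer:
        return 2
    return None


# Hypothetical cropland locations and candidate sample locations.
crop_points = [(0.0, 0.0), (10.0, 0.0)]
candidates = [(1.0, 1.0), (0.0, 8.0), (50.0, 50.0)]

zones = [buffer_zone(p, crop_points, inner=5.0, outer=15.0) for p in candidates]
print(zones)  # [1, 2, None]
```

Candidate locations that return `None` correspond to the dry interior, where samples would not be collected.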
The error matrices generated for the buffer zones (Table 7 and Table 8) show high accuracies, although lower than those of the error matrix in Table 6. However, these error matrices from the buffering approach should be viewed as more representative and meaningful than the matrix that used reference data from only part of the continent. As demonstrated in Australia, the entire continent might not be an appropriate sampling area for assessing the accuracy of the cropland maps. In such places, therefore, the sampling area needs to be modified to accommodate sparse and concentrated cropping patterns in order to provide meaningful and representative accuracy results.
6. Conclusions
This paper presents modified accuracy assessment strategies used to assess the accuracy of large area cropland extent maps in response to different issues such as variations of cropping pattern and reference data availability in different continents. Considering and addressing these issues, the modified assessment strategies helped to understand the efficacy and quality of cropland extent maps for different agricultural landscapes to implement economic planning and policymaking. The information derived from large area cropland maps with agricultural landscapes of different continents can be enriched, improved and analyzed with modified assessment strategies. Such modified assessment strategies promise to achieve more meaningful, representative and applicable mapping products for each continent. Therefore, the need for a continent-specific assessment strategy developed by modifying the sampling design for collecting reference data and computing accuracy measures was demonstrated to be valuable. However, different sampling methods can be employed and compared in the future to analyze the accuracy results for different cropping scenarios.
A modified assessment strategy was employed to assess the accuracy of the cropland extent maps of three selected continents developed as part of the GFSAD project. The variability of cropping patterns in the agricultural landscapes and reference data availability were considered and addressed to provide meaningful and valid accuracy results for mapping products within these selected continents. The stratification approach based on AEZ’s or buffer zones, used to divide each continent into homogeneous cropping regions, (1) minimized the heterogeneity of different cropping patterns and (2) helped to rationalize the validation efforts for different continents. Finally, the sampling scheme and size were modified for the homogeneous regions using a sample analysis approach based on the variations of cropping pattern within the continent.
In summary, the continent-specific modified assessments performed for the three selected continents demonstrate that accuracy assessment can be easily done for a continent such as the US with an extensive reference dataset, while more modifications to the sampling scheme were needed for continents with little to no reference data. The results of the modified sampling performed in the AEZ’s of Africa show that the effort and time spent collecting reference data can be selectively devoted based on the variability in the cropland distribution. Finally, modified sampling was employed in the buffer zones of Australia using two different sources of reference data. An unbalanced set of ground samples, collected during a field campaign that emphasized identifying cropland areas, was augmented and balanced to be indicative of the crop/no-crop area proportion of the map in order to generate a balanced and valid error matrix for Australia. The analysis performed with this modified strategy shows that the entire continent might not be an appropriate sampling area for assessing the cropland maps, because there is little chance of cropland in the center of Australia due to extremely dry conditions.