1. Introduction
The interaction of prehistoric hunter-gatherer societies with their environment is a key research area in archaeology. In this context, the spatial behavior of such societies is very difficult to investigate. Site catchment analysis (SCA) is a classic method to gain knowledge about relationships between the inhabitants of archaeological sites and their environment [1]. The key idea of SCA is to determine the area around a site that was used to gather resources, in order to investigate the mobility of the prehistoric inhabitants and the economic resources potentially available to them [1,2]. In modern archaeological and geographical sciences, the analysis of spatial interactions is usually conducted using GIS software, combined with modeling approaches [3,4,5]. The umbrella term for the modeling approach that enables GIS-based site catchment modeling is cost distance or least-cost modeling, which is realized with walking speed models in this study.
A cost distance model describes the relationship between travel distances and their associated costs, which are generic and can be of different types. In this study, slope-based walking-speed is applied as the cost value, calculated with a hiking function to derive the walking time over distance. The most important parameters that define the output of cost distance models are the modeling algorithms, the applied walking-speed model, the input data and their processing.
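As a minimal sketch of the slope-based cost value described above: Tobler's hiking function is commonly written as W = 6 · exp(−3.5 · |S + 0.05|), with W in km/h and S the gradient (the tangent of the slope angle). The function names and the 30 m cell size in the usage note are illustrative assumptions and are not taken from the implementation used in this study.

```python
import math


def tobler_speed_kmh(slope_deg: float) -> float:
    """Walking speed in km/h for a slope angle in degrees (Tobler's hiking function)."""
    gradient = math.tan(math.radians(slope_deg))  # rise over run
    return 6.0 * math.exp(-3.5 * abs(gradient + 0.05))


def crossing_time_s(cell_size_m: float, slope_deg: float) -> float:
    """Time in seconds needed to walk straight across one raster cell."""
    speed_ms = tobler_speed_kmh(slope_deg) / 3.6  # km/h -> m/s
    return cell_size_m / speed_ms


if __name__ == "__main__":
    for slope in (0.0, 5.0, 10.0, 20.0):
        print(slope, round(tobler_speed_kmh(slope), 2), round(crossing_time_s(30.0, slope), 1))
```

Summing such per-cell crossing times along the cheapest path from a site yields the walking time over distance from which isochrones can be drawn.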
Important input data for GIS-based cost distance modeling are environmental datasets, such as digital elevation models (DEMs), and data on other landscape elements, such as vegetation or stream networks. Good examples of GIS-based SCA are given by Uthmeier et al. [6], Marín-Arroyo [7], Jobe and White [8], Surface-Evans [9], Gravel-Miguel [10] and Henry et al. [11]. These studies show that elevation data are of key importance for modeling the site catchment areas of prehistoric inhabitants.
In a GIS software environment, elevation data are usually represented as a DEM, which is a digital representation of the Earth’s surface [12]. The elevation data utilized in this study are raster-based, where each raster cell stores a height value. A wide variety of such DEMs exist, and each has different characteristics due to data acquisition and processing techniques, which result in different spatial resolutions and accuracies of the elevation values [13,14]. The data used in this study are typically post-processed to remove buildings and, in some cases, vegetation. This representation of the bare ground surface is often referred to as a digital terrain model (DTM) [15]. The cost distance modeling (CDM) approach applied in this study uses slope rasters derived from such DEMs, and the cell size of a raster DEM has a significant influence on the derived slope values [16].
Insufficient evaluation of the input data severely restricts the adequate implementation of walking speed modeling, an issue often neglected in previous archaeological CDM studies. Therefore, apart from the presentation and discussion of the CDM approach with additional cost components, the major objective of this study is to investigate the influence that different DEM datasets have on the site catchment modeling results. This investigation comprises three tasks: (i) examination of studies and reports that evaluate the quality of publicly available DEM datasets (ASTER GDEM [17], SRTM [18,19,20], EU-DEM [21], official DEMs from the Spanish National Geographic Institute and the Regional Government of Andalusia [22,23]); (ii) evaluation of elevation profiles, error rasters and slope values derived from the DEMs; (iii) evaluation of the influence of the different DEMs and environmental datasets on the site catchment models.
2. On Site Catchment Analysis, Cost Distance Modeling and Slope Estimation
The term site catchment analysis originates from an archaeological context and was proposed by Vita-Finzi and Higgs [1] in 1970. They describe a method to investigate and understand the relationship between human settlements and their local environments. A site’s catchment is defined as the area that is regularly exploited by its inhabitants, comparable to the classical meaning of a river catchment or a watershed in hydrology, from which the term was borrowed [24]. As a site catchment is not known at the beginning of a study, different methods were designed to determine it. Assuming that the inhabitants of a site covered the distances to exploit the resources of an area by walking, the size of the area can be defined by the time it takes to cover them. Therefore, the early approaches were quite simple and based on drawing circles of 5 km or 10 km in radius around a site (“fixed-distance radii”, 20 km in this work). This was followed by a more elaborate practice of interpolating the distances of actually walked transects (for instance: north-south and east-west) in the given area, to incorporate the effect that the actual topography has on the walking time [24,25,26].
With GIS software becoming more popular and advanced, it was a natural step to develop a modeling approach that takes topographic data into account in order to overcome these limitations. Hunt [27] summarized the benefits of applying GIS software in catchment analysis, while Wheatley and Gillings [4] present many GIS-based methods that can be applied to archaeological questions. One of them is site catchment analysis, and the fundamental approach that allows us to model site catchments is cost distance modeling. The cost distance models are used to derive isochrones (lines of equal time) or raster surfaces that represent a hypothetical site catchment, as applied before by Tripcevich [28] and Marín-Arroyo [7].
The application of cost distance analysis in prehistory is based on the necessity for hunter-gatherers to alter their behavior and routes in order to optimize their energy expenditure (least cost assumption) [9,29]. The results of the cost distance analysis of an archaeological site can be compared with the archaeological data, from which information about the mobility and behavior of hunter-gatherers can be inferred [30,31]. For this case study, we use lithic raw materials, but there are also other elements that can help archaeologists to interpret the mobility of human groups, such as exotic raw materials (obsidian, amber, etc.) or mobile art.
Originally unrelated to these archaeological questions, William Naismith proposed a rule in 1892 that related human walking speed to the slope of the terrain [32], which was later refined by Aitken [33] and Langmuir [34]. This formula is implemented in a cost distance tool in GRASS GIS [35], but this work applies a different formula that was established later: in 1993, the geographer Waldo Tobler proposed the Tobler hiking function [36] to estimate slope-derived walking speeds, and this function is applied in this study.
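For comparison, Naismith's rule in its original form can be sketched in a few lines: one hour for every three miles of horizontal distance, plus one hour for every 2000 feet of ascent. The sketch below deliberately omits the later refinements by Aitken and Langmuir, and the function name is an illustrative assumption.

```python
MILE_KM = 1.609344  # kilometres per statute mile
FOOT_M = 0.3048     # metres per foot


def naismith_time_h(distance_km: float, ascent_m: float) -> float:
    """Walking time in hours after Naismith's original rule (no descent correction)."""
    horizontal_h = distance_km / (3.0 * MILE_KM)  # 1 h per 3 miles
    climbing_h = ascent_m / (2000.0 * FOOT_M)     # 1 h per 2000 ft of ascent
    return horizontal_h + climbing_h


if __name__ == "__main__":
    # Example: a 10 km walk with 300 m of total ascent.
    print(round(naismith_time_h(10.0, 300.0), 2))
```

Unlike Tobler's function, which returns a speed for a given gradient, this rule returns a travel time for a whole route, so the two are not interchangeable without conversion.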
Jobe and White [8] built a cost distance model for human accessibility that is based on energetic expenditure while hiking through a given terrain. It incorporates different landscape features, such as vegetation, trails, streams and slope. They concluded that slope, followed by vegetation, is the dominant contributor to the mean accessibility of the model. Ullah [26] used the r.walk module of GRASS GIS to conduct slope-dependent cost distance analyses, but additional costs were not taken into account. Howey [37] selected vegetation cover, waterways and slope (as a relative cost value) as criteria critical to prehistoric movement. She also noted that, although many studies acknowledged the importance of incorporating multiple criteria in cost surface models, they usually included only slope as a variable. Surface-Evans [9] incorporated slope (as a relative cost value and with Tobler’s hiking function) and rivers, both as obstacles and as transportation routes, in her movement models.
Verhagen [38] identified cost surfaces, least cost paths (LCP) and network analysis as useful tools to identify places that are more connected or isolated, in order to draw conclusions about the suitability of a landscape for settlement or other activities. Surface-Evans [9] attested that least cost analysis (which includes cost distance analysis) has potential in modeling idealized expectations for regional patterns of land use, for example to test behavioral hypotheses. Ullah [26] suggested that SCA should be viewed as a type of experimental archaeology, since SCA does not produce “the site’s catchment”, but rather returns a range of site catchment scenarios that are plausible given the available data.
In the approach presented in this study, the slope, which is derived from the DEM, is the most important input variable. Apart from slope estimates that vary with the applied algorithm, as shown by Jones [39], Warren et al. [40] and Herzog and Posluschny [41], the DEM cell size is an important factor. Hasan et al. [42] investigated the relationship between DEM resolution and slope, drainage area and topographic wetness index variation. They concluded that the estimates of slope differ significantly with the resolution of the DEM for the investigated peatland area in northern Sweden. The works of Zhang and Montgomery [43], Sørensen and Seibert [44], Vaze et al. [45], Kantner [46] and Grohmann [47] show that the overall slope values increase with the horizontal resolution (reduced cell size) in their respective study areas. This has important implications for the modeling results produced in this work.
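The dependence of slope on cell size can be illustrated with a small, self-contained sketch. It computes slope with Horn's weighted 3 × 3 differences on a synthetic sinusoidal surface at 5 m cell size and again after block-averaging the same surface to 30 m. The synthetic terrain, the block-average resampling and all function names are assumptions chosen for illustration; they do not reproduce the data or the resampling method of this study.

```python
import math


def horn_slope_deg(dem, cell_size):
    """Slope in degrees for every interior cell, using Horn's 3x3 weighted differences."""
    slopes = []
    for r in range(1, len(dem) - 1):
        row = []
        for c in range(1, len(dem[0]) - 1):
            a, b, cc = dem[r - 1][c - 1], dem[r - 1][c], dem[r - 1][c + 1]
            d, f = dem[r][c - 1], dem[r][c + 1]
            g, h, i = dem[r + 1][c - 1], dem[r + 1][c], dem[r + 1][c + 1]
            dzdx = ((cc + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
            dzdy = ((g + 2 * h + i) - (a + 2 * b + cc)) / (8 * cell_size)
            row.append(math.degrees(math.atan(math.hypot(dzdx, dzdy))))
        slopes.append(row)
    return slopes


def coarsen(dem, factor):
    """Block-average a DEM so that the cell size grows by `factor` (assumed resampling)."""
    rows, cols = len(dem) // factor, len(dem[0]) // factor
    return [[sum(dem[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(cols)] for r in range(rows)]


def mean(grid):
    values = [v for row in grid for v in row]
    return sum(values) / len(values)


def synthetic_dem(n, cell_size, amplitude=10.0, wavelength=100.0):
    """Hypothetical rolling terrain: hills with a 100 m wavelength."""
    k = 2 * math.pi / wavelength
    return [[amplitude * math.sin(k * c * cell_size) * math.cos(k * r * cell_size)
             for c in range(n)] for r in range(n)]


fine = synthetic_dem(120, 5.0)   # 5 m cells
coarse = coarsen(fine, 6)        # 30 m cells
mean_slope_fine = mean(horn_slope_deg(fine, 5.0))
mean_slope_coarse = mean(horn_slope_deg(coarse, 30.0))
print(mean_slope_fine > mean_slope_coarse)
```

On this surface the coarse raster smooths away the short hills, so its mean slope is lower; real DEMs from different sensors differ in more than cell size, so the effect there is less clean.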
6. Discussion
In the context of GIS-based SCA in archaeology, a slope-based cost distance modeling (CDM) approach is presented, and the influence of different DEMs on the results of the CDM approach is investigated.
In this study, the walking speed model (also: hiking or cost function) is the basis for the CDM. Although it is used frequently, the applied hiking function from Waldo Tobler is not free from issues. Herzog [102] argues that Tobler refers to data published by Imhof [103], but that his estimation does not fit Imhof’s data very well. Kondo and Seino [104] tried to evaluate and improve the formula on the basis of an ancient route in Japan and a GPS-aided walking experiment. They found that their measured walking speeds largely fit the Tobler curve in the slope range from −0.20 to 0.20 (−18° to 18°), but that their measurements deviate from it below and above those slope values. They attempted to adjust the function to their measurements, but their study was based on a sample of only two persons. Consequently, their adjusted function has to be evaluated before it is applied. In the future, it would therefore be useful to derive a slope-based cost function that includes a measure of time from GPS measurements of a larger group of test subjects.
Further, it is important to take into account that the calculation of slope alone can differ among geographical information system applications. As described in Section 4.3, the slope function in ArcGIS uses Horn’s method [16,39,99,100] to determine the slope values of the raster cells in the DEM. This is similar in GRASS GIS. SAGA GIS, on the other hand, offers additional slope algorithms. It is important to consider this, as different slope algorithms produce different slope values and can therefore cause different modeling results [40,41]. Horn’s method performed well in a comparative study by Jones [39]. In this study, only ArcGIS’s slope tool was applied, so that the results are comparable. Another problem with the method is that only positive (uphill) slope values are considered. The slightly faster downhill walking, included in Tobler’s hiking function, was not taken into account, so the modeling is isotropic. The same limitation applies to potential walking paths along an elevation contour line, where the cost is also direction dependent [97], because the cost spreading algorithm of the applied cost distance tool does not support anisotropy. ArcGIS’s path distance tool supports anisotropic cost modeling via a vertical factor table, but it was not applied here. We would argue that the importance of anisotropy is negligible for site catchment modeling, which is the main point of this work, as a presumed outbound trip and the way back would substantially compensate for the difference between uphill and downhill hiking. While this study investigates a specific question with respect to an already established method, the implementation of a different walking or cost spreading model, possibly using another data model, could address these restrictions in future research.
When considering the cost distance tools, it is possible to transfer the method to other GIS software, such as GRASS GIS or SAGA, which provide respective cost spreading algorithms. GRASS’s cost distance tools allow more movement directions for the value accumulation (see Figure 2): 16 with the Knight’s move, compared with the 8 directions of a 3 × 3 neighborhood, which could lead to slightly more accurate modeling results. The latter was not in the focus of this study and is mentioned only for consideration in future work. The above-mentioned differences are more important to consider in least cost path modeling than in CDM. Furthermore, the presented evaluation results concerning the influence of different DEMs and their resolution are valid regardless of whether the applied cost distance algorithm supports anisotropy or not.
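The effect of the number of movement directions on the accumulated cost can be sketched with a small isotropic cost spreading routine based on Dijkstra's algorithm. This is a simplified illustration, not the algorithm of ArcGIS or GRASS GIS: each move is charged the travelled distance multiplied by the mean per-metre cost of the start and end cell, which for Knight's moves ignores the cells passed in between. All names are hypothetical.

```python
import heapq
import math

# Offsets to the eight direct neighbours of a 3 x 3 window.
QUEEN = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
# Eight additional Knight's-move offsets.
KNIGHT = [(-2, -1), (-2, 1), (-1, -2), (-1, 2), (1, -2), (1, 2), (2, -1), (2, 1)]


def accumulate_time(seconds_per_metre, cell_size, source, knight=False):
    """Least accumulated travel time from `source` (row, col) to every cell."""
    rows, cols = len(seconds_per_metre), len(seconds_per_metre[0])
    moves = QUEEN + KNIGHT if knight else QUEEN
    best = [[math.inf] * cols for _ in range(rows)]
    best[source[0]][source[1]] = 0.0
    heap = [(0.0, source[0], source[1])]
    while heap:
        t, r, c = heapq.heappop(heap)
        if t > best[r][c]:
            continue  # outdated queue entry
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                distance = cell_size * math.hypot(dr, dc)
                mean_cost = 0.5 * (seconds_per_metre[r][c] + seconds_per_metre[nr][nc])
                candidate = t + distance * mean_cost
                if candidate < best[nr][nc]:
                    best[nr][nc] = candidate
                    heapq.heappush(heap, (candidate, nr, nc))
    return best


uniform = [[1.0] * 5 for _ in range(5)]  # flat terrain, 1 s per metre
with_8 = accumulate_time(uniform, 1.0, (0, 0))
with_16 = accumulate_time(uniform, 1.0, (0, 0), knight=True)
print(with_8[2][1], with_16[2][1])
```

On uniform terrain, the cell two rows down and one column across is reached in 1 + √2 time units with eight directions, but in √5 units with sixteen, which is the straight-line distance; the additional directions therefore reduce the overestimation of travel time along oblique bearings.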
The initial evaluation of the DEM quality components (Section 3.1) and their specific characteristics (Section 5.1) shows that the DEMs exhibit substantial differences. Apart from the absolute vertical accuracy, noise and other error characteristics of the elevation data, the most noticeable difference is the raster cell size, which is in the range of 5 m to about 90 m. The results of the evaluation show that the choice of the DEM is quite important for various reasons. The studies on the relationship of slope to DEM cell size, covered in Section 2, reveal that increasing cell size leads to decreased slope values. For slope-based cost distance modeling, this indicates that the calculated movement speeds through the raster cells should increase. This is confirmed by the results of the statistical DEM evaluation, where one source DEM was resampled to a range of cell sizes to perform the site catchment modeling (see Table 7, Table 8 and Table A1 in Appendix A). The modeled area sizes of the catchments correlate with the horizontal resolution and the mean slope of the applied DEM. However, the published accuracy information (Table 1), the qualitative and quantitative observations made in Section 5.1, the correlations of the slope rasters (Table 6) and the statistical evaluation (Section 5.2) indicate that the horizontal resolution is not the only important factor. Notable examples are the comparable results of EU-DEM and SRTM-3 (mean area size; mean slope of EU-DEM: 434,525 km²; 8.77°/SRTM-3: 437,923 km²; 8.02°), although the EU-DEM raster data have a much smaller cell size. Further, SRTM-1 leads to slightly larger catchment areas than ASTER GDEM V2 (mean area size; mean slope of SRTM-1: 375,283 km²; 9.85°/ASTER GDEM V2: 372,021 km²; 10.73°), with remarkable negative or positive differences depending on the site location (see Table 8 and Table A1), which is not apparent in the results based on the resampled data. As their cell size difference is minimal, these variations must be attributed to other differences in the DEM characteristics, such as lower absolute vertical accuracy (see Table 1), varying amounts of noise or other errors caused by vegetation and tree canopy, slope or elevation of the landscape (see Section 3.1), the specific characteristics of the data acquisition method, and residuals of removed anthropogenic objects present in the data (see Section 3.1 and Figure 4 and Figure 5a–c). These differences are reflected in different slope values, which are observable in the correlation of the slope rasters relative to the 5-m reference DEM (Table 6).
Archaeological CDM aims to provide a better estimation of a site’s catchment than a simple radius, which was the traditional approach in site catchment studies [9,24]. Since the DEM is the most crucial input dataset, the first thought might be that higher horizontal resolution and vertical accuracy lead to better results. As seen in the evaluation results and the modeled site catchments (see Figure 6), the main difference produced by higher resolution DEMs is a change in the size of the modeled site catchments. This change is, in theory, systematic and, thus, predictable. Smaller differences are noticeable regarding the overall shape of the catchment. In detail, the isochrones are more jagged because the influence of small-scale landscape features is stronger than in the lower resolution DEMs, which also leads to comparably slower accumulated walking speeds in hilly terrain (see Figure 6). It is likely that the higher resolution DEMs enable more realistic results, considering the observation that average slope values increase with smaller cell size. The applied high resolution DEMs have one specific problem, however. Whereas vegetation and buildings are filtered out of the data, the 5-m and 10-m DEMs still contain many residuals of anthropogenic features in the landscape, such as bridges, trenches or channels, which were definitely not part of the topography in the time frames under investigation. In this regard, the SRTM DEMs and ASTER GDEM V2 perform better in our sample areas. Clear advantages of the 5-m DEM are the high vertical and horizontal accuracy and fewer noise artifacts, which should lead to more consistent modeling results, as the height error in ASTER GDEM V2 or the SRTM DEMs also varies with location (see Section 3.1). An example of where the use of high resolution DEMs should be more appropriate is a site within a narrow canyon. Here, a DEM at a horizontal resolution of 30 m is simply not able to reproduce such small-scale details. This is especially relevant if the effective horizontal resolution is even lower than the cell size of the data suggests, as was assessed for ASTER GDEM V2 and SRTM-1 (see Section 3.1).
Additional topography and vegetation costs were incorporated into the CDM. The approach of generating a raster with speed coefficients that are derived from a classified stream network or vegetation (in our case, biomes) raster works, but the current implementation has potential for improvement for various reasons. Apart from the missing data in the coastal area, the spatial resolution of the Stage3 Biome3.5 21k data is rather low, at about 60 km × 60 km. The Stage3 Biome3.5 21k raster data have a more global effect on the size of the site catchments and do not change their shape, except when the catchment traverses a border between two different biome classes. Vegetation data like these could prove useful as a kind of off-path factor as described by Tobler [36]. However, the validity or benefit of an off-path factor is open for discussion, as well. It is very likely that the inhabitants of a site used established paths for resource procurement or transportation of food or resources between sites, because of the tendency to conserve energy. In this case, an off-path factor would make less sense. As no alternative high resolution vegetation data for the time period in question are known to the authors, this is a clear task for future work. Ecological niche modeling approaches or further efforts in paleovegetation modeling could fill the gap by providing high resolution land cover data. The coefficients for both vegetation and stream networks are based on expert knowledge [75,76] or actual energy cost measurements conducted by [8,74,105]. These were mapped from the vegetation or land cover classes to the available biome classes, which works well, but the coefficients might be considered for re-evaluation in future work.
Furthermore, the incorporation of stream networks itself is not optimal at this point; a key problem is that the stream width is directly connected to the horizontal resolution of the DEM. The cost coefficients take this into account to a certain extent, but the effect on the results is very small. This is a complex problem investigated by Dean et al. [106], and the modeling approach could be much improved in this area. Further, it has to be mentioned that no other water bodies were included in the analyses, which leads to another fundamental issue. The applied data (apart from the Stage3 Biome3.5 21k dataset) are recent and cannot take into account the possible changes in geomorphology or hydrology since the Last Glacial Maximum (LGM). Incorporation of lakes into the analysis would involve filtering out anthropogenic reservoirs. Furthermore, lakes and reservoirs are normally part of the surface reproduced in the DEMs, so that the bottom of a lake is not available in the data. The same applies to the bathymetry of the coastal areas, where the sea level has changed since the LGM. Despite the above-mentioned issues, we could show that it is possible to include such data in the current modeling approach. Therefore, if the paleo-modeling community provides improved high resolution environmental data, a direct improvement of the modeling results can be expected.
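How a raster of speed coefficients modifies the slope-based walking speed can be sketched as follows. The class names and coefficient values are purely hypothetical placeholders and are not the values derived in this study from the cited expert knowledge or energy cost measurements; the speed function is Tobler's hiking function in its commonly cited form.

```python
import math

# Hypothetical coefficients for illustration only; NOT the values used in the study.
SPEED_COEFFICIENT = {"open": 1.0, "forest": 0.7, "stream": 0.5}


def tobler_speed_kmh(slope_deg: float) -> float:
    """Walking speed in km/h for a slope angle in degrees (Tobler's hiking function)."""
    gradient = math.tan(math.radians(slope_deg))
    return 6.0 * math.exp(-3.5 * abs(gradient + 0.05))


def cell_time_s(cell_size_m: float, slope_deg: float, land_class: str) -> float:
    """Crossing time for one cell after scaling the speed by the class coefficient."""
    speed_ms = tobler_speed_kmh(slope_deg) * SPEED_COEFFICIENT[land_class] / 3.6
    return cell_size_m / speed_ms


if __name__ == "__main__":
    for land_class in SPEED_COEFFICIENT:
        print(land_class, round(cell_time_s(30.0, 5.0, land_class), 1))
```

Because the coefficient scales the speed of every cell of a class by the same factor, a coarse biome raster mainly enlarges or shrinks a catchment as a whole, which matches the observation above that the shape changes only where a class boundary is crossed.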
This section aims to discuss the results of this study from an archaeological point of view. In order to be suitable for a study of this type, the sample of archaeological sites must share the same chronological framework (Table 2). Solutrean sites have a similar chronology, which improves the prospects for a sound analysis and comparison of the sites. In Section 3.3, it was explained that, of the five Solutrean sites in our sample, only the sites Nerja and Bajondillo are suitable for an archaeological analysis, because only these exhibit a good stratigraphical record. Further, it should be taken into account that, concerning the catchment modeling of the sample of sites, the Stage3 Biome3.5 21k dataset does not contain data in the areas of Nerja and Bajondillo. Therefore, the additional cost of vegetation is not considered for either site; both currently exhibit a similar environment. Bajondillo is located 200 m and Nerja less than 1 km from the coastline, but during the time period of our analysis (LGM), the coast was about 5–8 km distant from the sites. Raw material data are available for both sites, although in the case of Nerja, the data are partially problematic, as the analysis of the raw materials also includes Gravettian levels, the sources of which are not very well identified. In the case of Bajondillo, as we do not have archaeological information about the faunal remains, this part of the record has to be discarded, as well. The remaining sites are also problematic for an archaeological analysis, as their levels are reworked or the archaeological record is limited to surface evidence.
In any future investigation, the following conditions should be applicable for any site sample: The sample sites need to be contemporaneous, with good chronological data and a good stratigraphical context. Sites should also contain good raw material data (lithic remains and sources of raw material) and, if possible, good faunal data in order to be able to deduce the economic behavior and mobility of hunter-gatherer groups.
7. Conclusions
In the context of archaeological site catchment analysis, the focus of this work lies on the evaluation of the influence that different DEMs exert on the site catchment modeling results for prehistoric sites. It was demonstrated that the size of the modeled slope-derived site catchments correlates negatively with the resolution of the input DEM: the smaller the cell size, the smaller the modeled catchment. Further, the quality and characteristics of a DEM, such as accuracy, noise and the reproduction of residuals of anthropogenic landscape features, are important factors for archaeological cost distance modeling, as well. The choice of the DEM is shown to be of paramount importance, as the DEMs investigated in this work showed much variation in these characteristics. The 5-m and the 10-m DEM, while delivering the highest accuracy, contain the largest amount of such residuals. The inclusion of these contemporary residuals must be considered in this context, as the modeling results should reflect only prehistoric conditions. Nevertheless, the higher resolution is clearly valuable if a site is situated in a steep canyon, for example, which simply cannot be reproduced at a 30-m cell size, and the anthropogenic residuals are not an issue in areas that are still unaffected by anthropogenic constructions. Further, the high accuracy should lead to more consistent and predictable results. Until 2015, the (presumed) advantage of the ASTER GDEM V2 was the higher horizontal resolution compared to the SRTM-3 DEM. As the SRTM-1 DEM was made available over the course of 2015 and the dataset is of higher accuracy and contains fewer artifacts in our research area, the SRTM-1 DEM is also very well suited for this purpose, especially if no official DEM with higher resolution and accuracy is available for the area of interest. The EU-DEM, however, does not offer any advantages compared to the other DEMs.
Overall, the GIS-based cost distance modeling approach, which was presented in detail, works well, and its results are expected to provide a better approximation of the actual conditions than a simple radius. It was shown that it is possible to include additional factors, such as vegetation and river networks, in the modeling approach. However, the applied paleoenvironmental data and their conversion into cost coefficients remain a limitation of this part of the method and have to be evaluated in future work to ensure comparable results.
If, as discussed at the end of the last section, the conditions concerning the archaeological data apply, the cost distance model that derives slope-based site catchments enables useful qualitative and quantitative assertions for archaeological sites regarding their environment and their interconnection. With the presented CDM approach, it is possible to classify and characterize archaeological sites using the topographic features, landforms and faunal species found in the modeled catchment areas.