Article
Peer-Review Record

A Prediction Study on Archaeological Sites Based on Geographical Variables and Logistic Regression—A Case Study of the Neolithic Era and the Bronze Age of Xiangyang

Sustainability 2022, 14(23), 15675; https://doi.org/10.3390/su142315675
by Linzhi Li 1,2, Yujie Li 2,*, Xingyu Chen 1 and Deliang Sun 1
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 5 October 2022 / Revised: 8 November 2022 / Accepted: 23 November 2022 / Published: 25 November 2022

Round 1

Reviewer 1 Report (Previous Reviewer 1)

I suggest some minor revisions. The new version is notable and the new integrations are improving the scientific value of the work.

Comments for author File: Comments.pdf

Author Response

Please find the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report (Previous Reviewer 2)

Summary:

This paper, titled “A prediction study on archaeological sites based on geographical factors and logistic regression --A case study of the Neolithic Era and the Bronze Age of Xiangyang”, assesses eight environmental variables for site suitability models or site prediction models (SPM) in central China. The authors focus their analyses on Neolithic and Bronze Age sites, using either 129, 241, or 260 sites for their assessment. Ultimately, they found that people preferred living in lowland areas near rivers and favored those locations above mountainous regions.

 

The authors made substantial improvements to their manuscript, describing the sites and the environmental variables, and in their discussion. They added more sites to their study, including sites in the upland regions in the western part of their study zone, thus ensuring that their input data is not biased. I feel confident in the geographic distribution of the data. While there are not many sites in the uplands, the authors added a few that were used as positive tests in their training data. The authors also added additional background on regional archaeological sites. The figures were revised based on previous feedback and are much easier to read! I commend the authors on their extensive revisions but note that a few key items are still missing or must be improved for this paper to move forward with potential publication.

 

First, not all references are cited yet. While improvements were made to the works cited, there are still data sources that are not cited, meaning the reader cannot find the original data that the authors used in their analysis. For example, of the Chinese Archaeological Yearbooks, only the 2017 Chinese Archaeological Yearbook is cited. If data were collected from the Chinese Archaeological Yearbooks for 2007, 2012, and 2013, as mentioned on lines 111-113, those must be cited and included in the references as well. Likewise, a citation for the Water Resource Bureau river data is needed.

 

Second, there are no clear dates listed for the Neolithic or Bronze Age. What are the date ranges for the Neolithic in this part of China? What are the date ranges for the Bronze Age in this part of China? The authors included this information in their response letter (pg 11), but it is not in the actual manuscript. In the overview, please provide dates for the Paleolithic, Neolithic, and Bronze Age, as these can vary from region to region. For example, on lines 97 and 98, “The Sujiaying Ruins site and the Xibandi Ruins site[18] can date back to the Bronze Age (DATES)”.

 

Third, the authors are not clear on whether this analysis is for a specific city or a landscape. Does this analysis take place in a single city? It seems like a landscape approach, but several times they state that this is within a city (e.g., line 521); according to the maps, that would make the city more than 100 km wide. It should be clear that this analysis can be used to look at the distribution of sites (or cities!) and at what made a landscape suitable for living for ancient populations, but that it is fundamentally a landscape-based approach and not “in the city” (line 521).

 

Fourth, the abstract says 129 sites were used, but the second section (line 195) says 241 sites were used, and later (line 250) the authors state they used 260 positive and negative sites in their analysis. Which is correct? Having three different inputs stated in the paper makes the reader question the analyses. Double check the number of data inputs throughout the paper.

 

Fifth, this paper still needs extensive copy-editing. Copy-editing for grammatical errors, flow, voice, and word choice is still needed throughout the paper. While some examples that I mentioned before were revised, other edits beyond what I can detail here are still needed. For example, “Xiangyang City was deemed as the research object in this paper, with eight geographical variables, such as elevation, slope, slope direction, micro-geomorphology, slope position, plan curvature, profile curvature, and proximity to water as influencing factors to randomly obtain 129 non-site points at the ratio of 1:1 between site points and non-site points based on the 129 excavated archaeological sites, and to construct a sample set of geospatial data and the archaeological SPM based on logistic regression (LR).” This is a run-on sentence that could be broken into two or three smaller sentences for clarity and readability. Likewise, the phrase “was deemed the research object in” is awkward and could be revised to “is the focus of”. “Was deemed” should be revised throughout the paper (e.g., Line 150, Line 258). Double check that “factors” is changed to “variables” throughout the paper for consistency in verbiage.

 

Finally, there are a few instances where it seems the revisions are not complete. References are in the middle of words or phrases (e.g., line 270 “ROC c[5]urve”, line 445 “Digital [13] elevation model”) and there is a sentence highlighted (line 460). Spacing is needed between several words and their citations (e.g., lines 460 – 462, “Diwan[14]”, “[41]employed”, “Iron Age[15]”). Likewise, the abstract states 129 sites were used in the analysis, then the paper states 241 sites were used (line 195), then states 260 positive sites were used (line 250). It seems the abstract was not updated to the new number of inputs (241), but I’m not sure where the 260 positive test sites came from.

 

I also note that archaic Homo sapiens are a separate species in evolutionary anthropology from anatomically modern humans. Archaic Homo sapiens, generally, had largely disappeared by 30,000 years ago (but perhaps 12,000 years ago). The Neolithic is, generally, post 10,000 BC, although recent research may be pushing this date back further and further. To avoid potential confusion, I suggest revising “Archaic Homo sapiens”, which implies this separate subspecies of Homo sapiens, to “ancient people” or something akin.

 

This paper offers a broad perspective on eight geographical variables that may have influenced settlement selection in central China during the Neolithic and Bronze Age. The methods provide quantitative analyses that can be compared to similar scholarship in other parts of the world, finding that certain variables were more influential to settlement selection than others. While this paper has merit, it requires additional major revisions.

 

Line-specific questions/comments:

Line 147 – “the study reclassified” – do you mean, “we reclassified”? Use the active voice for your analyses.

 

Line 149 and 151 – “8” should be spelled out to “eight”.

 

Line 191 – remove the 30m x 30m reference to reduce redundancy; that was already mentioned on Line 150.

 

Line 195 – how many non-site points were generated? In the abstract it says 129, but now the sample size is 241. Were 241 non-site points generated to match the 1:1 ratio? Clarify.

 

Lines 229 – 230 – check the references vs data. One reference is sitting above the text and another reference in brackets is 0.1, which I think should be data and presented in parentheses as “(0.1)”. Same with line 242.

 

Line 250 – if you had 241 Neolithic and Bronze Age sites, how are there 260 positive sites used in the training data? Where did the other 19 sites come from?

 

Line 270 – reference “[5]” is in the middle of a word. Also, either explain what you mean by “special” or remove it.

 

Lines 270 – 274 – run-on sentence that can be broken into two or three separate sentences for improved readability.

 

Citations should not be included on the section titles (e.g., “5.1 Discussion [7]”)

 

Line 441 – what is a “different level of site”? Clarify.

 

Line 455 – why is [13], which cites Lambers, placed right after the word “digital” in a sentence about work by Felix [40]? Check that this reference is in the right place.

 

Line 460 – why is this highlighted?

 

Line 526 – why is Xiangyang “the most attractive and mysterious part of archaeological culture”? This is not discussed at all in the text and either needs to be discussed or removed since it is tangential to the point of the study.

 

Figure, Table, and References comments:

Figure 1 – The revision of this figure nicely shows the study area within China. However, the writing in the small inset map is not legible – it is too small. Nor are the province names on the map of China. I suggest removing text unless it is needed or making the font larger.

 

On many figures the location of sites overlaps with the province or county names in the polygons. If the names are not needed, they can be removed as they are distracting from the point of the figure.

 

The scale bar on Figure 3a needs to be adjusted. Currently the first few values include decimal places (4.75 km) that overlap.

 

Figures 6 and 7 – revise the significant figures on the right side of the graph. Five decimal places are not needed. Also, Figure 6 looks stretched.

 

Figure 8 – I don’t think the inset maps match the area of the larger map. On the larger map, shy of a dozen points are visible in the extent rectangle. However, on the inset map of that extent, only 6 sites are visible. The extent areas should match.

 

Figure 10 – What is a four-pole stream and five-grade stream? Are these the same as fourth-order stream and fifth-order stream? Revise the figure to match the wording used in the manuscript.

 

Author Response

Please find the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report (Previous Reviewer 4)

The current paper shows a substantial improvement compared to the previous version. The authors extensively reviewed and expanded the introduction (including a clearer indication of the objectives), the methodology, the results and the conclusion.

Therefore, I believe that there are only some minor corrections to do before being accepted for publication:

1. The abstract must be rewritten more concisely. The authors tended to focus on specific aspects of the paper, paying less attention to explaining the overall structure and workflow. The description of the variables considered in the study is disproportionately long compared to that of the remaining steps, the introduction, and the results.

2. Although the objectives are much clearer, I encourage the authors to give them more prominence by moving them to a dedicated section immediately after section 1.

3. Table 1 has formatting problems.

4. I suggest merging the images of figure 2 into one, making two columns on one page.

5. In general, there are numerous formatting errors in the text and titles to correct.

6. English has improved, but there are still some spelling and lexical errors to be checked. I suggest a final reading by a native speaker.

7. The proposal of some future outlooks was much appreciated. However, I suggest indicating successful examples, for instance in the field of earth observation techniques, to help readers better understand the direction the authors suggest taking.

Author Response

Please find the attachment.

Author Response File: Author Response.pdf

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

1) It is not clear in the paper whether the sites 'found' after applying the prediction algorithm were certified by field surveys. Did the archaeologists then go into the field to prove the existence of the settlements? This changes the entire validity of the scientific proposal. If yes, this should be included, commented on and highlighted in the paper. If not, it, unfortunately, becomes merely a theoretical exercise, but in archaeology, it acquires no scientific validity.
2) It is not clear if the bibliography used covers all the regions analysed. Are the mountain lands covered by archaeologists' fieldwork to the same extent as the hill lands?
3) I could not find whether areas not covered by field research are considered or excluded. If an area is not covered by surveys and field research and is considered in the algorithm, it will not enter the statistical average (of course), but that does not mean that humans did not live there at a certain time.

Comments for author File: Comments.pdf

Reviewer 2 Report

Summary:

This paper, titled “A study on the prediction of archaeological sites based on geographical factors and logistic regression --The Neolithic of Xiangyang is an example”, uses 129 known Neolithic sites and 8 (or 7, both are said in the paper) geographic variables to predict the location of archaeological sites. The paper uses established methods of cross-validation using the ROC and AUC as well as LR. The authors found that low plains with access to water were preferred locations for past human settlement based on their predictive model. Notably, all 129 sites used in the testing were from low plains with access to water and none were in the mountain region. Many others have used environmental variables to create predictive models for past human settlement selection; this paper will be of interest to others and adds to the breadth of knowledge about Neolithic landscapes.

 

There are three major concerns with this manuscript as is. First, none of the known sites are present in the areas of higher elevation based on Figure 1. Therefore, of course all the geographical features used to predict where sites will be located will highlight that sites are in lowland areas, because there are no sites in the uplands in the training data. This bias is the largest downfall of this paper, as the input data is biasing the predicted locations of Neolithic sites. If no sites exist in the uplands, then the authors must clearly explain why this predictive model is needed and useful. Second, there are not adequate citations for the reader to see where the authors collected their data or to situate the manuscript within broader contexts. Third, the description of how the geographic variables were calculated, and the detail in the methods that would allow others to replicate the process on their own data, are insufficient. As is, it would be difficult for someone else to replicate this study given the lack of detail. I describe these three concerns along with other feedback in detail below. I also provide a few line-by-line comments for the authors.

 

Overall this study has a lot of potential but several items must be addressed. While the methods are sound, the lack of detail on how they were conducted, potential bias in input data, and lack of citations for source information guide the decision of my review. Finally, this manuscript was submitted to Sustainability, but the paper seems removed from the scope and aims of the journal. The authors should relate their findings to the themes and topics of this journal.

 

Feedback:

First, the bias in the positive data (i.e., where sites are) impacts the results of the predictive model. The sites are not distributed across the landscape but are visibly clustered in areas of low elevation (Figure 2A) with little slope (Figure 2B). The authors state that “archaeological sites are rarely probably to be located in the western mountainous areas” (Line 219-220) based on their predictive model. This makes sense because none of the training data are in that type of environment. The predictive model highlights what is already known, that “people dating back to the Neolithic period preferred plains and low hills with ample resources” (Line 222-223). This is in part because none of the positive sites are located in the mountains, so of course the predictive model will show that sites do not exist in the mountains. This is highlighted in Figures 4 and 5, which clearly show that none of the positive testing sites are in the western mountainous region, thus biasing the results. Likewise, the location of the 129 “negative” sites should be displayed on the map. What is their distribution compared to the known “positive” sites? It would be more useful to assess the variation in the 8 geographical variables among the known sites – what is the range of slopes among the known sites? Or the variation in elevation? Finally, do the authors have any sites that they didn’t use in the training that they can then use to test the model?
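The two requests in this concern (show where the “negative” points fall, and report the spread of each variable among known sites) can be illustrated with a minimal sketch. It is not the authors' procedure, which this record does not describe. It assumes planar coordinates, a rectangular study area, and a hypothetical minimum distance between non-site points and known sites; the names `sample_non_sites` and `value_range` and all numbers are invented for the example.

```python
import math
import random

def sample_non_sites(sites, bounds, min_dist, seed=0):
    """Draw one random non-site point per known site (1:1 ratio).

    Points are drawn uniformly inside `bounds` = (xmin, ymin, xmax, ymax)
    and rejected if closer than `min_dist` to any known site.
    Assumption: min_dist is small relative to the study area, otherwise
    this rejection loop may take very long to finish.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    non_sites = []
    while len(non_sites) < len(sites):
        p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if all(math.dist(p, s) >= min_dist for s in sites):
            non_sites.append(p)
    return non_sites

def value_range(values):
    """Smallest and largest value of one variable among the known sites."""
    return min(values), max(values)

# Hypothetical coordinates (km) and elevations (m); not data from the paper.
sites = [(10.0, 10.0), (20.0, 15.0), (30.0, 40.0)]
bounds = (0.0, 0.0, 100.0, 100.0)
non_sites = sample_non_sites(sites, bounds, min_dist=5.0)
elevation_range = value_range([120.0, 85.5, 240.0])
```

Plotting `non_sites` next to `sites` would answer the distribution question, and calling `value_range` once per variable would give the ranges of slope and elevation asked for above.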

 

Second, the sources used in this paper are not adequately cited. The authors state that the archaeological datasets are from the “Atlas of Chinese Cultural Relics - Hubei Branch (2 volumes), the Annals of Chinese Archaeology 1982-2019, master's and doctoral dissertations, and published research reports on discovered and excavated cultural relics and archaeological sites from heritage-related  departments.” (lines 105-108) but no specific citations are included. Which master’s theses and doctoral dissertations were used to gather site data? What reports were used? The lack of detail in these citations means that the reader cannot look up the information that is integral to this paper. Each source must be listed with the complete works cited. Table 1 is insufficient for the reader to find these sources. Similarly, where was the 30m DEM downloaded from? Is it a Landsat DEM? If so, which Landsat (5, 7, etc)? This concern is present throughout the paper, which has a total of 23 citations. Given the number of other scholars who have used environmental predictive models to assess human settlement selection, more citations are needed to situate this research within a broader framework.

 

Third, how the authors calculated the geographical variables is not clear. How was slope calculated (degree or slope)? Did the authors produce the slope model or was it acquired through one of their data sources? How was proximity to water calculated? Describe how Figure 2E was created in ArcGIS so readers can replicate the process with their own dataset. Each variable should be clearly defined, including citing sources for how others have used similar methods in other spatiotemporal contexts. This will help to better contextualize the paper and make it relevant to a broader, global audience.
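To make concrete the kind of detail this concern asks for, here is a minimal sketch of one way slope and proximity to water can be derived from gridded data. The manuscript's ArcGIS workflow is not described in this record, so this is only an illustration under stated assumptions: square cells, slope from simple central differences (GIS packages typically use a 3x3 neighborhood estimator instead), and straight-line distance to the nearest water cell. The function names `slope_degrees` and `distance_to_water` and the tiny grids are hypothetical.

```python
import math

def slope_degrees(dem, cell_size):
    """Slope in degrees for interior cells, using central differences.

    Edge cells are left as None because they lack a neighbor on one side.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out

def distance_to_water(water_mask, cell_size):
    """Straight-line distance from every cell to the nearest water cell.

    Brute force over all water cells; fine for a sketch, slow on real rasters.
    """
    water = [(r, c) for r, row in enumerate(water_mask)
             for c, is_water in enumerate(row) if is_water]
    return [[min(math.hypot(r - wr, c - wc) for wr, wc in water) * cell_size
             for c in range(len(row))]
            for r, row in enumerate(water_mask)]

# Hypothetical 3x3 elevation grid (m) rising 30 m per 30 m cell eastward.
dem = [[0, 30, 60],
       [0, 30, 60],
       [0, 30, 60]]
slopes = slope_degrees(dem, cell_size=30)

# Hypothetical water mask: a single water cell in the top-left corner.
mask = [[1, 0, 0],
        [0, 0, 0]]
distances = distance_to_water(mask, cell_size=30)
```

Stating the choices made here (units of slope, neighborhood used, distance measure, cell size) is what would let a reader replicate a figure such as Figure 2E on their own dataset.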

 

Early in the paper, specifically mention that Xiangyang city is in China. This is not stated in the text; it is alluded to on Line 55, and only Figure 1 makes it clear that the case study is from China. Line 49 states “domestic scholars” but doesn’t state where “domestic” is in reference to.

 

What defines a Neolithic site in the Xiangyang region? An artifact scatter? The remains of buildings or ancient cities? Please define what a Neolithic site is and what it looks like archaeologically, including the time period (range of years) and the material/cultural remains found at the Neolithic sites.

 

The authors state that water is crucial for human settlement selection (lines 259-260) but there is only one citation for this. Later, the authors highlight 3 case studies that have used predictive models, which is good, but more studies could be cited to guide the reader (who may want to know more) and situate this study amidst the many others that have built predictive models for archaeological sites based on environmental variables. Many others have discussed how the environment plays into human settlement selection, and the authors lack discussion or citations of similar works from broader spatiotemporal contexts; this should be expanded upon.

 Along these same lines, how has climate change affected water levels and water availability from the Neolithic to now? Is it possible that more (or less!) water would have been available during the Neolithic? If so, how is that addressed in this paper? A quick sentence or two addressing this is needed given the time depth of the study and known climate variability.

 

Copy-editing for grammatical errors, flow, voice, and word choice is needed. For example, in the abstract lines 18-19 the passive voice is used in, “The model was trained and tested by the 10-fold cross-validation method”. This can be improved by using an active voice, such as, “We trained and tested the model using a 10-fold cross-validation method”. On lines 61-63 the authors state that image analysis "tried machine learning"; this concept needs to be rephrased since image processing cannot undertake an action.

 

Line-specific questions/comments:

 Line 21: first instance of AUC in both the abstract and manuscript should be spelled out

 

Discrepancy in the number of geographical factors used. Line 14 states 8 factors and Line 71 says 7 factors; these factors should be called variables instead.

 

Line 118: numbers less than 10 should be spelled out. (“5” should be “five”)

 

Line 123: “Geographical” is repeated. Delete one instance.

 

Line 194 is on the left side of the page.               

 

Lines 201-203: Great definitions of ROC and AUC. Citations for the definitions?

 

Line 208: “0778” should be “0.778” – add the period to the AUC value

 

Line 222: change “antient” to “ancient”

 

Line 318: Citation for the Diwan GA study?

 

Line 321: include dates for the Iron Age for the Bekaa example.

 

Lines 371-372: Why do the authors acknowledge themselves for their help? This is unusual.

 

Figure, Table, and References comments:

 Figure 1. Great map that includes a scale bar, north arrow, legend, etc.

 

Figure 2. The legends on each map are sufficient if the maps were standalone. However, at the scale of the maps in the paper, one must zoom in substantially to read the maps including the legends. For example, Figure 2F Micro-geomorphology is difficult to read even when zoomed in to 200%.

 

Figure 3: Nicely shows how closely the training data aligns with the whole dataset.

 

Figure 6. What are the slices of data missing on the map? This is not explained in the text or figure caption.

 

Table 3. What is the recall rate? Describe in detail so the reader knows what it is.

 

References should be cleaned up. Reference 1 is on several lines and the title is in all caps.
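For readers unfamiliar with the two metrics queried in the comments above (the ROC/AUC definitions on lines 201-203 and the recall rate in Table 3), the following minimal sketch shows how each can be computed from predicted site probabilities. It is a generic illustration, not the authors' code; the labels, scores, and the 0.5 threshold are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the share of
    (site, non-site) pairs in which the site receives the higher score.
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def recall(labels, scores, threshold=0.5):
    """Recall = TP / (TP + FN): the share of true sites that the model
    flags as sites at the given probability threshold.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tp / (tp + fn)

# Hypothetical example: 1 = known site, 0 = non-site point.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

model_auc = auc(labels, scores)        # 8 of 9 site/non-site pairs ranked correctly
model_recall = recall(labels, scores)  # 2 of 3 sites score at or above 0.5
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the reported value (“0778”, presumably 0.778) needs its decimal point to be interpretable.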

 

Reviewer 3 Report

The present study illustrates an interesting research topic, providing reasonable data and interpretations regarding the predictive modelling of Neolithic sites from Xiangyang. Thus, I believe that this is a good case study which could be published in SUSTAINABILITY.

However, there are some suggestions that could be made:

P3, R113-114: Within the text, the source that provided the data of waterbodies is presented as being Xiangyang Municipal Water Resources Bureau. Meanwhile, in Table 1, the source is Resource and Environmental Science and Data Center, Chinese Academy of Sciences.

P4, R123: The word Geographical is repeated in the beginning of the sentence

Figure 6:  The white areas visible on the maps should be better explained

P9, R267-272: I believe this paragraph about plane and profile curvature should be moved more to the beginning of the article (when talking about methodological aspects, rather than at results)

P10, R318, 325-326: Both Diwan and Kvamme should be cited and added to the reference list

Regarding all the maps, I believe you should consider, first of all, adding the units of measurement inside the legends (m.a.s.l., °, and others).  Also, at least the sub-figures from Figure 2 could be enlarged and at better resolution.

Overall, the research conducted by the authors is interesting and could represent an important contribution regarding the Neolithic period in Xiangyang, as well as to case studies of predictive modelling, but only after a thorough revision of the text by a native English speaker. Also, I think many readers would appreciate the presence of the database (regarding the parameters obtained for the individual archaeological sites) in supplementary materials, as well as a paragraph that explains the geographical preferences manifested by the prehistoric communities studied.

Comments for author File: Comments.pdf

Reviewer 4 Report

This study provides a step forward in the exploration of part of Hubei that is useful for future archaeological surveys and excavation as well as for planning site management. I therefore acknowledge the usefulness of this study for these specific purposes. On the other hand, I don’t think that the methodological approach proposed is a novel one. The selection of different variables and the automatic identification process are something that has been widely used worldwide. Therefore, from a more methodological point of view, I don’t think that this paper provides substantial novelty. That said, I encourage the authors to reply to this point and demonstrate the novelty of their research. This point has been only partially tackled in 5.1.3, but a more detailed description is necessary.

Apart from this general comment that makes it necessary to conduct major corrections to the paper, below you can find more specific comments and critiques:

 

Language and grammar

A thorough revision of the layout is necessary since there are a lot of orthographic and layout errors that need to be corrected.

Also, the quality of the English is sometimes weak and requires revision by a native speaker.

The literature is very limited! Considering the popularity that this topic is getting I recommend a wider overview of the previous studies in connection with the scope of the paper.

 

Introduction

Line 34: I suggest adding at the end “and their potential protection, monitoring and management.”

Line 40: In general the bibliography is really poor. In this specific case, the authors refer to “a number of explorations”. While I don’t question the (few!) examples proposed, I wonder why they chose them specifically (mostly in Spanish, which is not a problem by itself but could prevent many readers from accessing the papers). My suggestion is to explain clearly the reason behind these choices (I’m quite sure that the same things proposed by these papers have also been done by others in English) and also to propose some English texts alongside those.

On the other side, it makes more sense to quote examples from China (in English) which is the country of the case study, so as to allow parallels.

 

1. Overview

There is no literature quoted about the archaeology, history or geology of the region. From where have the authors taken this information?

 

2.1 Data source

This text “Atlas of Chinese Cultural Relics - Hubei Branch (2 volumes), the Annals of Chinese Archaeology 1982-2019” should not be quoted like this, but according to the journal guidelines.

Line 109: The rationale behind the “careful selection” of certain sites is not clear to me. Please explain your choice better. I presume these are only Neolithic sites, but the way this is explained is still quite unclear.

 

4. Archaeological site prediction results

This is a provocative question, but how do we know the position of the excavated sites is biasing data? The majority of them are located in flat or hilly areas but not in mountains. Although this is to be expected, I wonder if the lack of potential sites in the mountains could be influenced by the fact that no or few sites have been excavated in this kind of environment.

 

5.2 Conclusions of the study

 

Lines 358-359: The authors conclude with “It is worth noting that more research data should be added to the future research and human and social factors should be considered to further improve the model accuracy.” This statement is too general. Can they provide more concrete future outlooks? How are they planning to improve model accuracy? (1-2 lines are enough.)
