A Sensitivity Test on the Modifiable Areal Unit Problem in the Spatial Aggregation of Fossil Data

Ye, Shan

doi:10.3390/geosciences14090247

Shan Ye

School of Information Engineering, China University of Geosciences (Beijing), Beijing 100083, China

Geosciences 2024, 14(9), 247; https://doi.org/10.3390/geosciences14090247

Submission received: 21 August 2024 / Revised: 19 September 2024 / Accepted: 21 September 2024 / Published: 23 September 2024

(This article belongs to the Section Sedimentology, Stratigraphy and Palaeontology)

Abstract

In paleobiology and macroevolution research, the spatial aggregation of fossil data can be influenced by the modifiable areal unit problem (MAUP), wherein the selection of different grid-cell sizes for data aggregation can lead to variations in statistical results. This study presents a case analysis focused on the spatial extent of marine bivalves and brachiopods over time across three Areas of Interest (AOIs) to evaluate the potential impact of the MAUP in grid-based fossil data processing. By employing rectangular grid matrices with cell sizes of 50, 100, 200, and 400 km, this research assesses the MAUP-related sensitivity of two commonly used grid-based proxies for species’ spatial distribution. The results reveal that the proxy based on the number of occupied grid cells (OGCs) is particularly sensitive to changes in cell size, whereas the proxy based on minimum-spanning-tree distance (MST distance) demonstrates greater robustness across varying grid scales. This study underscores that when constructing proxies for species’ spatial distribution ranges using grid matrices, the OGC method is more susceptible to MAUP effects than the MST distance method, warranting increased caution in studies employing the OGC approach.

Keywords:

modifiable areal unit problem; fossil sampling bias; spatial analysis; grid-based data aggregation; minimum-spanning tree

1. Introduction

Fossil data are fundamental to understanding Earth’s history and the evolution of life, and they usually contain spatial attributes, including the present-day geographic coordinates of each fossil locality, as well as the paleogeographic coordinates corresponding to each fossil occurrence. Consequently, spatial analysis is frequently utilized as an effective approach in paleobiological and macroevolutionary research using fossil data [1,2,3]. Although spatial analysis can provide a wealth of information, such as the zonal diversity of life and the spatial distribution range of a certain species in a given time interval, it can also introduce uncertainties and statistical biases, particularly when grid-based methods are used to spatially aggregate fossil data.

Aggregating point data into polygon grids can introduce biases in statistical results due to the varying sizes of grid cells; this is a phenomenon known as the modifiable areal unit problem (MAUP). This problem, which has been identified as a major challenge in the field of geography, arises from the dimensional mismatch between point-based data and area-based units for spatial statistics [4,5,6]. Beyond geography, the MAUP is also a common concern in fields such as public health, urban planning, landscape ecology, economics, and demography and has been extensively studied in these areas [7,8,9,10,11,12]. Paleobiology encounters the same point-area dimensional discrepancy: additional biases or uncertainties can be introduced when fossil occurrences, typically represented as point data, are aggregated into spatial polygon grids. Therefore, the MAUP represents a potential source of bias in spatially aggregated fossil data.

In recent years, biases in fossil sampling and synthesized fossil datasets have been the subject of considerable scientific analysis and discussion [13,14,15,16,17,18,19]. Researchers have identified that statistical biases in fossil data can stem from various sources, including physical factors such as depositional environments, strata preservation, and climate conditions [20,21,22,23,24,25,26], as well as socioeconomic factors like the availability of research funding across different countries [3,16,27]. However, specific discussions on the implications of the MAUP in fossil data analysis remain limited. This manuscript addresses this gap by presenting a case study that investigates the MAUP within the context of paleobiology, in which two commonly used grid-based proxies for the spatial breadth of species are evaluated regarding their sensitivity to grid-cell sizes.

2. Materials and Methods

Two common grid-based methods were employed as proxies to assess the spatial ranges of species in paleobiology and macroevolution studies. The first proxy is the number of occupied grid cells (OGCs) of a species. A greater number of occupied grid cells typically indicates a wider distribution range of the species within a given time interval. The second proxy is the minimum-spanning-tree distance (MST distance), which is calculated based on the centroid coordinates of occupied grid cells [28,29]. The MST distance, originally a concept from graph theory, measures the shortest total distance required to connect a set of points without forming any cycles [30,31]. In the context of grid-based spatial range measurement, a longer MST distance across the centroids of all occupied cells suggests a broader distribution of the species during a specific time interval [29,32]. Unlike the OGC method, MST distance accounts for the overall spatial configuration of occupied grid cells rather than merely the count of those cells [32,33].
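The two proxies can be illustrated with a minimal Python sketch (the study itself used R). The function names, the use of planar Euclidean distance in projected kilometres, and the sample coordinates are all assumptions made for illustration and are not taken from the author's scripts. The MST is built here with Prim's algorithm.

```python
import math

def occupied_cells(points_km, cell_km):
    """Set of (column, row) indices of grid cells holding at least one point."""
    return {(math.floor(x / cell_km), math.floor(y / cell_km)) for x, y in points_km}

def cell_centroids(cells, cell_km):
    """Centroid coordinates (km) of each occupied cell."""
    return [((c + 0.5) * cell_km, (r + 0.5) * cell_km) for c, r in sorted(cells)]

def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return 0.0
    best = [math.inf] * n      # cheapest known link from each point to the tree
    in_tree = [False] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

# Hypothetical occurrences of one species in one time bin (projected km).
occurrences = [(10, 10), (20, 30), (120, 40), (130, 45), (310, 20)]
cells = occupied_cells(occurrences, 100)
ogc = len(cells)                                   # proxy 1: occupied grid cells
mst = mst_length(cell_centroids(cells, 100))       # proxy 2: MST distance (km)
print(ogc, mst)
```

For these five hypothetical points, three 100 km cells are occupied and the MST over their centroids is 300 km long.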

This case study focuses on the spatial distribution of marine bivalve and brachiopod species over time. These marine organisms are commonly used in macroevolutionary research due to their high sampling rates and extensive spatial distributions [28,34]. Bivalves and brachiopods share similar body sizes, habitat preferences, and ecological niches, which has led to the assumption of competitive relationships between them. Previous studies have hypothesized that these competitive interactions exist within certain temporal and spatial ranges [35,36]. Both grid-based methods, as proxies of spatial coverage, have facilitated the exploration of the relationship between the mean spatial range of competing species and the total number of species within a given time interval. Grid-based spatial analyses have revealed that the number of potential competitors in a region does not fully explain the geographic distribution at the species level, suggesting the presence of underlying heterogeneity in the forces driving species’ spatial patterns [28].

However, previous studies have either used non-equal-area grids (e.g., grids defined by latitude and longitude) [32] or only used one equal-area grid matrix of fixed cell sizes [28,29] for spatial sampling, potentially making their results susceptible to the MAUP. In this study, grids with four different cell sizes were employed to assess MAUP-related sensitivities embedded in grid-based proxies for species distribution. This approach provides a quantitative illustration of how MAUP-related effects could influence the outcomes of diversity reconstructions and macroevolutionary studies.

Data on marine bivalve and brachiopod fossil occurrences were retrieved from the Paleobiology Database, which is one of the most extensive collections of fossil data accessible to the scientific community [1]. This database has facilitated numerous quantitative studies on a wide range of topics, including paleobiodiversity and macroevolution [37,38,39,40], stratigraphic and paleogeographic reconstruction [41,42,43], and paleoclimate and paleoenvironment [44,45,46], among other areas of research. Following the selection criteria established in [28], occurrence data for marine bivalves and brachiopods spanning the Phanerozoic eon were extracted through the Paleobiology Database’s application programming interface [1]. Fossil occurrences with an accepted rank above the species level or with missing species information were excluded, resulting in a final dataset of 178,365 fossil occurrences (as of August 2024). In this dataset, the “accepted name” field was used to identify and count the number of species. The “paleolng” and “paleolat” fields provided the paleogeographic coordinates of fossils, which were reconstructed based on the GPlates model [1,47,48]. These paleogeographic coordinates were used for spatial aggregation to represent the locations these organisms inhabited during their lifetimes [1].
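The exclusion rule described above can be sketched in Python as follows. The records are hypothetical placeholders, the underscore spellings of the field names (`accepted_name`, `accepted_rank`) are assumed, and dropping records without paleocoordinates is an additional assumption not stated in the text.

```python
from collections import Counter

# Hypothetical records; taxon names are placeholders, not real data.
records = [
    {"accepted_name": "Aus bus", "accepted_rank": "species", "paleolng": 10.5, "paleolat": 42.0},
    {"accepted_name": "Aus bus", "accepted_rank": "species", "paleolng": 11.0, "paleolat": 41.5},
    {"accepted_name": "Aus", "accepted_rank": "genus", "paleolng": 12.0, "paleolat": 40.0},
    {"accepted_name": "", "accepted_rank": "species", "paleolng": 9.0, "paleolat": 39.0},
    {"accepted_name": "Cus dus", "accepted_rank": "species", "paleolng": None, "paleolat": None},
    {"accepted_name": "Eus fus", "accepted_rank": "species", "paleolng": -3.2, "paleolat": 37.8},
]

def keep_species_level(recs):
    """Keep occurrences resolved to species level with a usable name."""
    kept = []
    for r in recs:
        if r.get("accepted_rank") != "species":
            continue  # accepted rank above species level
        if not r.get("accepted_name"):
            continue  # missing species information
        if r.get("paleolng") is None or r.get("paleolat") is None:
            continue  # assumption: no paleocoordinates means no spatial aggregation
        kept.append(r)
    return kept

kept = keep_species_level(records)
species_counts = Counter(r["accepted_name"] for r in kept)
print(len(kept), "occurrences,", len(species_counts), "species")
```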

In this study, Areas of Interest (AOIs) were defined based on regions with higher concentrations of paleogeographic coordinates for bivalve and brachiopod fossil records (Figure 1). AOI 1 contains 17,489 valid fossil occurrences representing 1721 species of marine bivalves and brachiopods. AOI 2 contains 20,579 fossil occurrences corresponding to 2704 species. AOI 3 contains 13,335 fossil occurrences from 1337 species. For each AOI, four different sets of rectangular spatial grid matrices with side lengths of 50 km, 100 km, 200 km, and 400 km were generated using the equal-area Eckert IV map projection. These grids were aligned to a common reference coordinate origin, ensuring that, for example, four cells with a 50 km side length fit exactly within one cell of the 100 km grid, and so forth, with their respective sides completely overlapping. The paleogeographic coordinates of the fossil data were transformed to the Eckert IV projection’s reference system, and the dataset was aggregated into 5-million-year (5-Ma) time bins. Within each time bin, the number of distinct species was counted, and the occurrences of each species were spatially intersected with the grid matrices to determine the number of grid cells they occupied. The OGC number for each species in each time bin was then counted, and the mean OGC values among species were calculated. In addition, for each time bin, the mean MST distance across the centroid coordinates of occupied grid cells among all species was computed.
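The aggregation steps in this paragraph can be sketched in Python under several assumptions. The study built its grids in QGIS and ran the analysis in R; here the forward Eckert IV formulas for a sphere are coded by hand, the Earth radius, central meridian of 0°, grid origin at the projection origin, and the bin-labelling convention (each bin named by its younger boundary) are all illustrative choices, and the occurrences are hypothetical.

```python
import math
from collections import defaultdict

R_KM = 6371.0  # assumed spherical Earth radius, not taken from the paper

def eckert_iv(lng_deg, lat_deg, radius=R_KM):
    """Forward Eckert IV projection (sphere, central meridian 0), in km."""
    lam, phi = math.radians(lng_deg), math.radians(lat_deg)
    if abs(lat_deg) >= 90.0:
        theta = math.copysign(math.pi / 2, lat_deg)  # poles: closed-form value
    else:
        # Newton iteration for theta + sin(theta)cos(theta) + 2 sin(theta)
        # = (2 + pi/2) sin(phi)
        target = (2 + math.pi / 2) * math.sin(phi)
        theta = phi / 2
        for _ in range(50):
            f = theta + math.sin(theta) * math.cos(theta) + 2 * math.sin(theta) - target
            step = f / (2 * math.cos(theta) * (1 + math.cos(theta)))
            theta -= step
            if abs(step) < 1e-12:
                break
    x = 2 / math.sqrt(math.pi * (4 + math.pi)) * radius * lam * (1 + math.cos(theta))
    y = 2 * math.sqrt(math.pi / (4 + math.pi)) * radius * math.sin(theta)
    return x, y

def cell_index(x_km, y_km, cell_km):
    """Cell indices on a grid anchored at the shared origin (0, 0)."""
    return (math.floor(x_km / cell_km), math.floor(y_km / cell_km))

def time_bin(age_ma, width=5):
    """Label of the bin containing age_ma (assumed: younger boundary, in Ma)."""
    return int(age_ma // width) * width

def mean_ogc_per_bin(occs, cell_km):
    """occs: iterable of (species, age_ma, paleolng, paleolat) tuples."""
    cells = defaultdict(set)  # (bin, species) -> occupied cells
    for species, age, lng, lat in occs:
        x, y = eckert_iv(lng, lat)
        cells[(time_bin(age), species)].add(cell_index(x, y, cell_km))
    per_bin = defaultdict(list)
    for (b, _species), occupied in cells.items():
        per_bin[b].append(len(occupied))
    return {b: sum(v) / len(v) for b, v in per_bin.items()}

# Hypothetical occurrences: (species, age in Ma, paleolng, paleolat).
occs = [("sp_a", 12.0, 0.1, 0.1), ("sp_a", 13.5, 3.0, 0.1), ("sp_b", 11.0, 0.2, 0.2)]
for size in (50, 100, 200, 400):
    print(size, mean_ogc_per_bin(occs, size))
```

Because all four grids share one origin, the 100 km index of any point equals its 50 km index floor-divided by two, which is the nesting property described above. With these three hypothetical occurrences, the mean OGC in the bin starting at 10 Ma is 1.5 on the 50 km grid and 1.0 on the 400 km grid.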

Pearson’s r and Spearman’s rho coefficients were calculated to assess the correlation between the mean OGC and species count across all 5-Ma time bins, as well as between the mean MST distance and species count across the same time bins for each grid matrix. Since the primary focus of this study is not on paleobiology or macroevolution per se but on MAUP-related issues within these fields, the exact values of Pearson’s r and Spearman’s rho are not emphasized. Instead, this study examines how these coefficients vary across the four grid matrices with different cell sizes. To this end, the ranges of Pearson’s r and Spearman’s rho values across the four grid matrices were calculated for each AOI.
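The sensitivity measure described here, the range of each coefficient across the four grid matrices, can be sketched in Python as follows. The study used R; the function names and the per-bin values below are hypothetical and serve only to show the calculation. Spearman's rho is computed as Pearson's r on average ranks.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def coefficient_range(values):
    """Spread of a coefficient across grid matrices (max minus min)."""
    return max(values) - min(values)

# Hypothetical per-bin values: species counts and mean OGC on each grid.
species_count = [12, 30, 45, 60, 80]
mean_ogc = {
    50: [1.2, 1.5, 1.4, 2.0, 2.4],
    100: [1.1, 1.4, 1.3, 1.7, 1.9],
    200: [1.0, 1.2, 1.3, 1.3, 1.6],
    400: [1.0, 1.1, 1.3, 1.2, 1.6],
}
r_values = [pearson_r(species_count, v) for v in mean_ogc.values()]
rho_values = [spearman_rho(species_count, v) for v in mean_ogc.values()]
print("range of r:", coefficient_range(r_values))
print("range of rho:", coefficient_range(rho_values))
```

A larger range indicates that the correlation depends more strongly on the chosen cell size.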

The four grid matrices were generated using QGIS software (version 3.34), while all other spatiotemporal and statistical analyses were conducted using the R programming language (version 4.3.1).

3. Results

Figure 2 presents the time series for the species counts of bivalves and brachiopods, along with the ranges of two proxies of mean spatial ranges among species, namely the mean OGC and mean MST distance for each time bin across different grid matrices.

From the perspective of paleogeographic coordinates, fossil occurrences of AOIs 1 and 2 (Figure 1A,B) are clustered in mid-latitude areas in the northern hemisphere, while those of AOI 3 (Figure 1C) are clustered in mid-latitude areas in the southern hemisphere. We can assume that the marine environments of these three AOIs were similar during the same geologic time period and, therefore, the spatial distribution patterns of biodiversity embedded in their fossil data are comparable. From the modern perspective, their fossil localities are mostly in Europe and North America, where the levels of paleontological research and fossil sampling are generally good, so biases introduced by other socioeconomic factors are minimized [16,18].

In AOI 1, fossil occurrences of marine bivalves and brachiopods are predominantly found in the Mesozoic and Cenozoic, with species diversity peaking during the Eocene (Figure 2A). When using mean OGC as a proxy for the average spatial breadth of species, the data reveal that the mean distribution breadth reached multiple peaks during the Early Jurassic, Late Cretaceous, and Eocene (Figure 2B). A similar trend is observed when mean MST distance is used as a proxy for mean spatial coverage, although the relative differences across the various spatial gridding methods are less pronounced for mean MST than mean OGC (Figure 2C).

In AOI 2, fossil occurrences of bivalves and brachiopods are also concentrated in the Mesozoic and Cenozoic eras, with species diversity peaking during the Jurassic and a secondary peak occurring in the Late Cretaceous (Figure 2A). The mean spatial breadth of species in AOI 2 exhibits greater complexity than that in AOI 1, with both mean OGC and mean MST distance showing some periodicity over time. However, similar to AOI 1, variations in grid-cell sizes result in visually clear discrepancies in the time series of mean OGC, with minimal effects on mean MST distance (Figure 2B,C).

Within AOI 3, fossil occurrences of bivalves and brachiopods are primarily concentrated in the Paleozoic, with a few occurring in the Late Cretaceous and Cenozoic. The species diversity peaked during the Late Ordovician, and there was a secondary peak in the Late Devonian (Figure 2A). According to both mean OGC and mean MST distance, the mean species breadth was maximized in the Devonian. Similarly, variations in the cellular side lengths of gridding matrices result in notable discrepancies in the mean OGC, with little impact on the mean MST distance (Figure 2B,C).

Figure 3A–C display scatter plots illustrating the relationship between species count and mean OGC for each 5-Ma time bin across four different grid matrices, with linear regression lines added for each matrix. The scatter distributions and slopes of the linear regression lines are noticeably distinct and non-overlapping, underscoring that varying grid sizes can lead to clear discrepancies when calculating the mean OGC among species within a specific time interval.

In AOI 1, the grid with a cellular side length of 50 km results in the steepest linear regression slope between species count and mean OGC, followed by the grid with a 400 km side length. The slopes for grids with 100 km and 200 km side lengths are relatively small. The intercept for the 50 km grid’s linear regression is lower than that of the other grid sizes (Figure 3A). AOI 2 and AOI 3 exhibit similar patterns in their linear regression results. In both regions, the slope of the linear regression increases with the side length of the grid cells. Specifically, the grid with a 400 km cellular side length has the highest regression slope between species count and mean OGC, followed by the grid with a 200 km side length and the 100 km grid. The 50 km grid yields the lowest slope in the linear regression.
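The slopes and intercepts compared here come from ordinary least-squares lines fitted separately for each grid matrix. A minimal Python sketch of that fit is given below; the function name and the sample values are hypothetical, and the study's own regressions were run in R.

```python
def ols_line(x, y):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical per-bin values for one AOI on two grids.
species_count = [12, 30, 45, 60, 80]
mean_ogc_by_grid = {
    50: [1.2, 1.5, 1.4, 2.0, 2.4],
    400: [1.0, 1.1, 1.3, 1.2, 1.6],
}
for size, values in mean_ogc_by_grid.items():
    slope, intercept = ols_line(species_count, values)
    print(size, slope, intercept)
```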

Figure 3D–F illustrate scatter plots depicting the relationship between species count and mean MST distances for each 5-Ma time bin across the four grid matrices. In contrast to the relationship between species count and mean OGC, the relationship between species count and mean MST distance exhibits minimal variation across different grid sizes. Within each AOI, the scatter distributions, as well as the slopes and intercepts of the linear regressions, remain largely consistent.

Figure 4 displays the ranges of Pearson’s r and Spearman’s rho coefficients among the four grid matrices for each AOI. Across all AOIs, using mean OGC as a proxy for the average spatial distribution of species results in a larger range of Pearson’s r and Spearman’s rho across different grid matrices compared to using mean MST distances as a proxy. However, variations are also observed between different AOIs. Specifically, AOI 2 shows smaller ranges of Pearson’s r than the other AOIs, regardless of the chosen proxy, although the range based on mean MST distance is much smaller. In contrast, AOI 1 generally exhibits the largest ranges for both Pearson’s r and Spearman’s rho, with both exceeding 0.1. Although the differences in the ranges of Pearson’s r and Spearman’s rho in AOI 3 are relatively small, the mean OGC still produces slightly larger ranges.

4. Discussion

4.1. Results of Sensitivity Tests

The time series and scatter plots of species count versus mean spatial coverage for each time bin clearly show that using mean OGC as a proxy for species’ spatial breadth leads to greater variability in results based on different grid cell sizes. In contrast, when mean MST distance is employed as a proxy, the impact of varying cell sizes on mean spatial coverage is comparatively minor. This pattern is further corroborated by the correlation coefficients; Pearson’s r and Spearman’s rho between species count and mean OGC exhibit greater variability (larger ranges) with changes in grid size, indicating a higher sensitivity of statistical results to grid-cell size when using mean OGC.

The degree of sensitivity varies across the different AOIs, as indicated by the ranges of correlation coefficients. AOI 1 exhibits the highest sensitivity when using the mean OGC method, with the ranges of both Pearson’s r and Spearman’s rho exceeding 0.1, potentially influencing data interpretation. AOI 2 shows smaller ranges of Pearson’s r for both proxies, but Spearman’s rho is notably more sensitive in this AOI when using OGC as the proxy. In AOI 3, the differences between the ranges are minimal for both proxies, suggesting a similar sensitivity in the correlation coefficients, with mean OGC being slightly more sensitive. On the other hand, despite some variation across different AOIs, statistical results based on mean MST distance generally display lower sensitivity to grid-cell size. Thus, as a proxy for species spatial coverage, OGC is more susceptible to MAUP issues than MST distance, particularly in AOIs 1 and 2. This sensitivity may affect our understanding of deep-time biodiversity and the geographic distribution of species, at least within certain geographic extents.

The higher sensitivity of OGC to the side lengths of grid cells can be attributed to several potential factors. The number of occupied cells directly depends on how the grid cells align with the spatial pattern of the fossil occurrence data. If the grid cells are too large, a single cell may contain multiple fossil localities that are relatively far apart, leading to a lower count of occupied cells. Conversely, if the cells are too small, a cluster of fossil points may be split across many cells, potentially inflating the count of occupied grid cells. From the perspective of fossil data, if paleogeographic coordinates of fossils are relatively clustered, smaller grid cells might capture more cells with fewer fossils, emphasizing the fossil distribution’s granularity and leading to a higher count of occupied cells. If paleogeographic coordinates of fossils are sparsely distributed, larger grid cells may still encapsulate multiple fossil points, resulting in fewer occupied cells and introducing bias.
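The dependence of the occupied-cell count on cell size can be illustrated with a small Python sketch. The coordinates are hypothetical projected positions in kilometres, chosen so that one set is clustered within a 200 km window and the other is spread over more than 1000 km; the function name is also an assumption.

```python
import math

def ogc(points_km, cell_km):
    """Number of grid cells (anchored at the origin) holding at least one point."""
    return len({(math.floor(x / cell_km), math.floor(y / cell_km)) for x, y in points_km})

# Hypothetical localities (projected km).
clustered = [(10, 10), (60, 20), (110, 70), (160, 120), (190, 180)]
dispersed = [(10, 10), (450, 30), (900, 420), (1300, 850), (1700, 1250)]

sizes = (50, 100, 200, 400)
clustered_counts = [ogc(clustered, s) for s in sizes]
dispersed_counts = [ogc(dispersed, s) for s in sizes]
print(clustered_counts)  # [5, 3, 1, 1]
print(dispersed_counts)  # [5, 5, 5, 5]
```

In this constructed case the clustered set drops from five occupied cells to one as the cells grow, while the dispersed set is unaffected, so the same proxy responds very differently to cell size depending on how the localities are arranged.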

The edge effect is another potentially important factor leading to an exacerbation of the MAUP under the OGC method. Counting occupied cells is a form of aggregation that can lose detailed spatial information. Larger cells can mask underlying point distributions, while smaller cells may capture more detail but also introduce noise from sparsely populated cells. The spatial configurations of occupied cells are not taken into consideration, either. The location of fossil occurrences near the edges of grid cells can significantly affect the count of occupied cells. If a cell is partially filled with fossil points (especially in larger cells), it may be counted as occupied, but this effect may diminish with smaller cells, leading to more precise counts.

The MST distance, on the other hand, is computed from the centroids of occupied grid cells instead of counting individual cells. This reduces the impact of variations in cell size, since the centroids provide a summary representation of the spatial distribution and the measure emphasizes the spatial configuration of occupied cells rather than simply their number. The main advantage of MST distance over OGC in this regard is that the MST method focuses on the spatial structure of fossil localities rather than the size of occupied areas. The MST distance emphasizes the connectivity and distance between occupied cells. It captures the overall pattern of the fossil distribution in a way that is less sensitive to the number of fossil localities within each cell or to variations in grid size.

To be more specific, the MST calculates distances between centroids of occupied grid cells, which primarily reflects the spatial relationships among the occupied cells rather than the individual fossil localities. As a result, this method is less affected by how many fossils occupy each cell. Changes in cell size may lead to different configurations of occupied cells, but as long as the overall spatial arrangement of fossil localities remains consistent, the centroids of occupied cells provide a stable representation of the spatial structure. This means that the MST distances can remain relatively unchanged, even when cell sizes vary.
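This argument can be checked on a toy example in Python. The sketch assumes planar Euclidean distances in projected kilometres and uses Prim's algorithm for the MST; the localities are hypothetical and were chosen to contain two tight pairs plus one isolated point. Other configurations can behave differently, for instance a fully clustered set collapses to a single cell and an MST distance of zero on coarse grids.

```python
import math

def occupied_cells(points_km, cell_km):
    """Set of (column, row) indices of cells holding at least one point."""
    return {(math.floor(x / cell_km), math.floor(y / cell_km)) for x, y in points_km}

def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return 0.0
    best = [math.inf] * n
    in_tree = [False] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

# Hypothetical localities (projected km): two tight pairs and one isolated point.
localities = [(30, 30), (70, 40), (430, 30), (470, 60), (830, 30)]

sizes = (50, 100, 200, 400)
ogc_values, mst_values = [], []
for size in sizes:
    cells = occupied_cells(localities, size)
    centroids = [((c + 0.5) * size, (r + 0.5) * size) for c, r in sorted(cells)]
    ogc_values.append(len(cells))
    mst_values.append(mst_length(centroids))

print(ogc_values)                         # [5, 3, 3, 3]
print([round(m, 1) for m in mst_values])  # [824.3, 800.0, 800.0, 800.0]
```

Here the OGC falls by 40% between the 50 km and 100 km grids, whereas the MST distance changes by about 3%, because merging the tight pairs removes only the short edges of the tree.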

4.2. Broader Implication

The grid-based approach employed to evaluate the spatial distribution of species in this study represents a form of data integration commonly used in paleontology and fossil research. Our findings suggest that certain proxies may be susceptible to the MAUP. Moreover, numerous grid-based spatial data aggregation techniques exist within paleontological research, all of which could be influenced by the MAUP to varying extents. The spatial units used in similar cases are not restricted to rectangular grids; hexagonal and irregular grids, such as those defined by countries, provinces, or geological bedrock polygons, are also frequently applied for the spatial integration of fossil data, and the implications of the MAUP for these unit types are worthy of more attention.

Just as the MAUP is recognized in disciplines like geography, economics, and the social sciences, similar spatial aggregation processes that transform point data (including rocks, minerals, drilling cores, and geochemical specimens) into polygonal units may also carry MAUP-related implications, highlighting the need for further investigation within the broader geoscience context.

4.3. Limitations

This study has several limitations that need to be acknowledged. First, this research only utilized fossil data from two phyla, marine bivalves and brachiopods, in the Paleobiology Database. These groups have been previously employed in research exploring the relationship between species diversity and the spatial range of species, thereby aiding our understanding of competition and evolution throughout Earth’s history [28]. Whether other more mobile species (particularly marine vertebrates or terrestrial animals) present similar MAUP issues and whether MST distance demonstrates greater robustness against the MAUP than OGC for other species are questions worth exploring in future research. Second, this study selected only three AOIs, which may not adequately reflect global trends. However, investigating MAUP issues related to the spatial aggregation of fossil data on a global scale presents a significant challenge. Establishing an equal-area rectangular grid matrix covering the entire world is a complex task due to map projection restrictions, and computing the MST distance for global fossil and grid datasets requires substantial computational resources. Moreover, the fossil sampling itself may inherently contain considerable uncertainties and sampling biases, and the fossil sampling intensity can also vary considerably across space [3]. It is important to note that similar issues may arise in the temporal domain, where differences in data binning and aggregation methods can lead to variations in the subsequent statistical results. The GPlates model used for calculating paleogeographic coordinates may also introduce some inaccuracies. Such uncertainties and their dynamic interplay with the MAUP fall outside the scope of this study but are worth exploring in future work.

Nevertheless, the results of this study illustrate that MAUP-like phenomena do exist in fossil-related data, although the intensity might vary from place to place and among different fossil datasets. Such MAUP-related effects may affect the statistical results at certain geographic extents or specifically for some fossil-related research topics, thereby affecting our understanding of paleobiological diversity and evolutionary history. Therefore, in future studies, if grid-based methods are used in the aggregation of spatial data, similar sensitivity tests are recommended.

5. Conclusions

This study utilized fossil data from marine bivalves and brachiopods across three selected areas of interest to preliminarily investigate the potential effects of the MAUP in fossil-related, grid-based spatial data aggregation and to assess its implications for our understanding of paleobiodiversity and macroevolution. Two grid-based aggregation methods commonly used as proxies of species’ spatial ranges were examined, namely the OGC and the MST distance. The findings from this case study reveal heterogeneity in MAUP-related sensitivity among the AOIs. However, overall, the OGC method exhibited greater sensitivity to grid-cell size, leading to more pronounced MAUP effects and a potentially larger impact on paleobiodiversity studies. In contrast, the MST distance method demonstrated less sensitivity to grid cell size, yielding more robust results for spatially aggregated fossil data. Consequently, in future paleobiological and macroevolutionary research that requires grid-based aggregation to analyze species’ spatial distribution, MST distance may be a more reliable proxy for species spatial coverage than the OGC method.

Funding

This research was partially funded by China University of Geosciences, Beijing (grant number 590124048).

Data Availability Statement

The raw data used in this research can be freely accessed in the Paleobiology Database. The processed dataset used for this study and example R scripts can be accessed in the following GitHub repository: https://github.com/yeshancqcq/MAUP-Fossil (accessed on 21 September 2024).

Acknowledgments

The author would like to thank Xiang Ye for helpful discussions. This is Paleobiology Database publication number 501.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Peters, S.E.; McClennen, M. The Paleobiology Database application programming interface. Paleobiology 2016, 42, 1–7.
2. Close, R.A.; Benson, R.B.J.; Alroy, J.; Carrano, M.T.; Cleary, T.J.; Dunne, E.M.; Mannion, P.D.; Uhen, M.D.; Butler, R.J. The apparent exponential radiation of Phanerozoic land vertebrates is an artefact of spatial sampling biases. Proc. R. Soc. B 2020, 287, 20200372.
3. Ye, S.; Peters, S.E. Bedrock geological map predictions for Phanerozoic fossil occurrences. Paleobiology 2023, 49, 394–413.
4. Fotheringham, A.S.; Wong, D.W. The modifiable areal unit problem in multivariate statistical analysis. Environ. Plan. A 1991, 23, 1025–1044.
5. Larsen, J.L. The Modifiable Areal Unit Problem: A Problem or a Source of Spatial Information? Doctoral Dissertation, The Ohio State University, Columbus, OH, USA, 2020.
6. Ye, X.; Rogerson, P. The impacts of the modifiable areal unit problem (MAUP) on omission error. Geogr. Anal. 2022, 54, 32–57.
7. Jelinski, D.E.; Wu, J. The modifiable areal unit problem and implications for landscape ecology. Landsc. Ecol. 1996, 11, 129–140.
8. Zhang, M.; Kukadia, N. Metrics of urban form and the modifiable areal unit problem. Transp. Res. Rec. 2005, 1902, 71–79.
9. Flowerdew, R. How serious is the Modifiable Areal Unit Problem for analysis of English census data? Popul. Trends 2011, 145, 106–118.
10. Pietrzak, M.B. Redefining the modifiable areal unit problem within spatial econometrics, the case of the scale problem. Equilib. Q. J. Econ. Econ. Policy 2014, 9, 111–132.
11. Wang, Y.; Di, Q. Modifiable areal unit problem and environmental factors of COVID-19 outbreak. Sci. Total Environ. 2020, 740, 139984.
12. Manley, D. Scale, aggregation, and the modifiable areal unit problem. In Handbook of Regional Science; Fischer, M.M., Nijkamp, N., Eds.; Springer: Berlin/Heidelberg, Germany, 2014; pp. 1711–1725.
13. Starrfelt, J.; Liow, L.H. How many dinosaur species were there? Fossil bias and true richness estimated using a Poisson sampling model. Philos. Trans. R. Soc. B Biol. Sci. 2016, 371, 20150219.
14. Cleary, T.J.; Benson, R.B.; Evans, S.E.; Barrett, M. Lepidosaurian diversity in the Mesozoic–Palaeogene: The potential roles of sampling biases and environmental drivers. R. Soc. Open Sci. 2018, 5, 171830.
15. Balseiro, D.; Powell, M.G. Relative oversampling of carbonate rocks in the North American marine fossil record. Paleobiology 2023, 49, 733–746.
16. Raja, N.B.; Dunne, E.; Matiwane, A.; Khan, T.M.; Nätscher, P.; Ghilardi, A.M.; Chattopadhyay, D. Colonial history and global economics distort our understanding of deep-time biodiversity. Nat. Ecol. Evol. 2021, 6, 145–154.
17. Benson, R.B.; Butler, R.; Close, R.A.; Saupe, E.; Rabosky, D.L. Biodiversity across space and time in the fossil record. Curr. Biol. 2021, 31, R1225–R1236.
18. Ye, S. A Quantitative Investigation of Large Geoscientific Datasets: How Records of Geochronology and Macroevolution Are Distorted by Paleoclimate, Paleoenvironment, and Sediment Preservation. Doctoral Dissertation, The University of Wisconsin-Madison, Madison, WI, USA, 2022.
19. Dunne, E.M.; Thompson, S.E.; Butler, R.J.; Rosindell, J.; Close, R.A. Mechanistic neutral models show that sampling biases drive the apparent explosion of early tetrapod diversity. Nat. Ecol. Evol. 2023, 7, 1480–1489.
20. Smith, A.B.; Gale, A.S.; Monks, N. Sea-level change and rock-record bias in the Cretaceous, a problem for extinction and biodiversity studies. Paleobiology 2001, 27, 241–253.
21. Peters, S.E.; Foote, M. Determinants of extinction in the fossil record. Nature 2002, 416, 420–424.
22. Heim, N.A.; Peters, S.E. Covariation in macrostratigraphic and macroevolutionary patterns in the marine record of North America. Geol. Soc. Am. Bull. 2010, 123, 620–630.
23. Dunhill, A.M.; Hannisdal, B.; Benton, M.J. Disentangling rock record bias and common-cause from redundancy in the British fossil record. Nat. Commun. 2014, 5, 4818.
24. Dean, C.D.; Mannion, D.; Butler, R.J. Preservational bias controls the fossil record of pterosaurs. Palaeontology 2016, 59, 225–247.
25. Capel, E.; Monnet, C.; Cleal, C.J.; Xue, J.; Servais, T.; Cascales-Miñana, B. The effect of geological biases on our perception of early land plant radiation. Palaeontology 2023, 66, e12644.
26. Ye, S. Investigating the role of contemporary climate on fossil collecting bias. Paleontol. Res. 2024, 28, 407–419.
27. Kiessling, W. Habitat effects and sampling bias on Phanerozoic reef distribution. Facies 2005, 51, 24–32.
28. Antell, G.T.; Kiessling, W.; Aberhan, M.; Saupe, E.E. Marine biodiversity and geographic distributions are independent on large scales. Curr. Biol. 2020, 30, 115–121.
29. Flannery-Sutherland, J.T.; Silvestro, D.; Benton, M.J. Global diversity dynamics in the fossil record are regionally heterogeneous. Nat. Commun. 2022, 13, 2751.
30. Gower, J.C.; Ross, G.J. Minimum spanning trees and single linkage cluster analysis. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1969, 18, 54–64.
31. Dankelmann, P.; Entringer, R. Average distance, minimum degree, and spanning trees. J. Graph Theory 2000, 33, 1–13.
32. Close, R.A.; Benson, R.B.; Upchurch, P.; Butler, R.J. Controlling for the species-area effect supports constrained long-term Mesozoic terrestrial vertebrate diversification. Nat. Commun. 2017, 8, 15381.
33. Gabriely, Y.; Rimon, E. Spanning-tree based coverage of continuous areas by a mobile robot. Ann. Math. Artif. Intell. 2001, 31, 77–98.
34. Gould, S.J.; Calloway, C.B. Clams and brachiopods—Ships that pass in the night. Paleobiology 1980, 6, 383–396.
35. Payne, J.L.; Heim, N.A.; Knope, M.L.; McClain, C.R. Metabolic dominance of bivalves predates brachiopod diversity decline by more than 150 million years. Proc. R. Soc. B Biol. Sci. 2014, 281, 20133122.
36. Carlson, S.J. The evolution of Brachiopoda. Annu. Rev. Earth Planet. Sci. 2016, 44, 409–438.
37. Alroy, J. The shifting balance of diversity among major marine animal groups. Science 2010, 329, 1191–1194.
38. Chiarenza, A.A.; Mannion, P.D.; Lunt, D.J.; Farnsworth, A.; Jones, L.A.; Kelland, S.-J.; Allison, P.A. Ecological niche modelling does not support climatically-driven dinosaur diversity decline before the Cretaceous/Paleogene mass extinction. Nat. Commun. 2019, 10, 1091.
Mannion, D.; Chiarenza, A.A.; Godoy, L.; Cheah, Y.N. Spatiotemporal sampling patterns in the 230 million year fossil record of terrestrial crocodylomorphs and their impact on diversity. Palaeontology 2019, 62, 615–637. [Google Scholar] [CrossRef]
Cantalapiedra, J.L.; Sanisidro, Ó.; Zhang, H.; Alberdi, M.T.; Prado, J.L.; Blanco, F.; Saarinen, J. The rise and fall of proboscidean ecological diversity. Nat. Ecol. Evol. 2021, 5, 1266–1272. [Google Scholar] [CrossRef]
Peredo, C.M.; Uhen, M.D. Exploration of marine mammal paleogeography in the Northern Hemisphere over the Cenozoic using beta diversity. Palaeogeogr. Palaeoclimatol. Palaeoecol. 2016, 449, 227–235. [Google Scholar] [CrossRef]
Cao, W.C.; Zahirovic, S.; Flament, N.; Williams, S.; Golonka, J.; Müller, R.D. Improving global paleogeography since the late Paleozoic using paleobiology. Biogeosciences 2017, 14, 5425–5439. [Google Scholar] [CrossRef]
Wang, X.; Ren, Q.; Hou, M.; Dong, J.; Chen, A.; Ma, C.; Zhong, H.; Zheng, D. Progress in the applications of the Paleobiology Database in paleogeographic reconstruction. Sediment. Geol. Tethyan Geol. 2024, 44, 34–44. [Google Scholar]
Sessa, J.A.; Callapez, M.; Dinis, A.; Hendy, A.J.W. Paleoenvironmental and paleobiogeographical implications of a Middle Pleistocene mollusc assemblage from the marine terraces of Baía Das Pipas, southwest Angola. J. Paleontol. 2013, 87, 1016–2040. [Google Scholar] [CrossRef]
Reddin, C. Climate change and the latitudinal selectivity of ancient marine extinctions. Paleobiology 2019, 45, 70–84. [Google Scholar] [CrossRef]
Chiarenza, A.A.; Mannion, D.; Farnsworth, A.; Carrano, M.T.; Varela, S. Climatic constraints on the biogeographic history of Mesozoic dinosaurs. Curr. Biol. 2022, 32, 570–585. [Google Scholar] [CrossRef]
Ogg, J.G.; Scotese, C.R.; Hou, M.; Chen, A.; Ogg, G.M.; Zhong, H. Global paleogeography through the proterozoic and phanerozoic: Goals and challenges. Acta Geol. Sin.-Engl. Ed. 2019, 93, 59–60. [Google Scholar] [CrossRef]
Scotese, C.R.; van der Pluijm, B.A. Deconstructing tectonics: Ten animated explorations. Earth Space Sci. 2020, 7, e2019EA000989. [Google Scholar] [CrossRef]

Figure 1. Maps of AOI 1 (A), AOI 2 (B), and AOI 3 (C), showing the paleogeographic coordinates of the fossil occurrences within each AOI (red dots). The present-day localities of these fossil occurrences are also plotted (black dots).

Figure 2. (A) Time series of marine bivalve and brachiopod species counts across the Phanerozoic, binned into 5-Ma time steps. (B) Maximum and minimum mean OGC across the four grids (solid lines), with the range between them shown by the shaded areas. (C) Maximum and minimum mean MST distances across the four grids (solid lines), with the range between them shown by the shaded areas. The three AOIs are color-coded: AOI 1 is shown in red, AOI 2 in blue, and AOI 3 in tan.

Figure 3. Scatter plots of mean OGC against species count for AOI 1 (A), AOI 2 (B), and AOI 3 (C), and of mean MST distance against species count for AOI 1 (D), AOI 2 (E), and AOI 3 (F), across grid matrices of different cell sizes, with fitted linear regression lines. Note that the regression lines generally overlap under the mean MST distance method in (D–F).

Figure 4. The ranges of Pearson’s r (A) and Spearman’s rho (B) between species count and mean OGC, as well as between species count and mean MST distances across different grid matrices.


© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
