*Article* **Forensic Tools for Species Identification of Skeletal Remains: Metrics, Statistics, and OsteoID**

**Heather M. Garvin 1,\*, Rachel Dunn <sup>1</sup> , Sabrina B. Sholts <sup>2</sup> , M. Schuyler Litten <sup>2</sup> , Merna Mohamed <sup>3</sup> , Nathan Kuttickat <sup>3</sup> and Noah Skantz <sup>3</sup>**


merna.mohamed@dmu.edu (M.M.); nathan.kuttickat@dmu.edu (N.K.); noah.a.skantz@dmu.edu (N.S.) **\*** Correspondence: heather.garvin-elling@dmu.edu

**Simple Summary:** Forensic anthropologists are commonly asked to determine whether bones are of human origin and, if not, to which species they belong. Current practice usually relies on visual assessments rather than quantitative analyses. This study aimed to test the utility of basic bone metrics in discriminating human from nonhuman elements and assigning faunal species. A database of more than 50,000 skeletal measurements was compiled from humans and 27 nonhuman species. Equations and classification trees were developed that can differentiate human from nonhuman species with upwards of 90% accuracy, even when the bone type is not first identified. Classification trees return accuracy rates greater than 98% for the human sample. These quantitative models provide statistical support to visual assessments and can be used for preliminary assessment of a bone's forensic significance at a scene. The statistical models, however, could not classify species at acceptable rates. For species identification, a freely available web tool (OsteoID) was created from the study data, where users can filter photographs of potential bones/species using a few basic measurements and access 3D scans and additional resources to facilitate identification. OsteoID provides an important resource for forensic anthropologists lacking access to large comparative skeletal collections, as well as other disciplines where comparative osteological training is necessary.

**Abstract:** Although nonhuman remains constitute a significant portion of forensic anthropological casework, the potential use of bone metrics to assess the human origin and to classify species of skeletal remains has not been thoroughly investigated. This study aimed to assess the utility of quantitative methods in distinguishing human from nonhuman remains and present additional resources for species identification. Over 50,000 measurements were compiled from humans and 27 nonhuman (mostly North American) species. Decision trees developed from the long bone data can differentiate human from nonhuman remains with over 90% accuracy (>98% accuracy for the human sample), even if all long bones are pooled. Stepwise discriminant function results were slightly lower (>87.4% overall accuracy). The quantitative models can be used to support visual identifications or preliminarily assess forensic significance at scenes. For species classification, bone-specific discriminant functions returned accuracies between 77.7% and 89.1%, but classification results varied highly across species. From the study data, we developed a web tool, OsteoID, in which users can input measurements and be shown photographs of potential bones/species to aid in visual identification. OsteoID also includes supplementary images (e.g., 3D scans), creating an additional resource for forensic anthropologists and others involved in skeletal species identification and comparative osteology.

**Keywords:** forensic anthropology; medicolegal death investigation; forensic significance; comparative osteology; human osteology; skeletal morphology; nonhuman

**Citation:** Garvin, H.M.; Dunn, R.; Sholts, S.B.; Litten, M.S.; Mohamed, M.; Kuttickat, N.; Skantz, N. Forensic Tools for Species Identification of Skeletal Remains: Metrics, Statistics, and OsteoID. *Biology* **2022**, *11*, 25. https://doi.org/10.3390/biology11010025

Academic Editors: Ann H. Ross, Eugénia Cunha and Andrés Moya

Received: 31 October 2021 Accepted: 22 December 2021 Published: 25 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Forensic anthropologists are commonly approached by law enforcement, coroners, and medical examiners with an unknown skeletal element and faced with a simple question: is this human [1,2]? Well-trained forensic anthropologists know the human skeletal system in meticulous detail, and unless the skeletal element has been highly modified (e.g., extreme fragmentation, burning, etc.), they can usually differentiate human from nonhuman remains without hesitation [3]. Forensic anthropologists visually assess the bone, determining the element type (e.g., humerus, femur, tibia, etc.) and whether it is consistent with human anatomy based on its size (given its developmental state), shape, and bony features [3]. This macroscopic assessment is usually concluded without metric analyses.

If the bone is human, it is of forensic significance and will be subjected to a comprehensive osteological analysis. If the bone is nonhuman, a forensic anthropologist is faced with an inevitable follow-up question: what is it? This question is more than mere curiosity because it provides verifiable evidence to support the forensic anthropologist's nonhuman designation [3]. An incorrect faunal species identification can affect the forensic anthropologist's credibility, even if it is not of forensic importance. Similarly, responding to the inquiry by stating that it is not important or that you do not know does not instill confidence or foster positive relationships with agencies. In some cases, the animal species may provide investigators additional evidence or context regarding the circumstances of death. For example, if the remains of a cat are found intermixed with human remains, it may suggest that a suspect disposed of a house pet along with the decedent in an attempt to conceal the human remains.

Faunal species identification, however, can be challenging for practitioners given the number of bones in a skeleton, variety of potential species, and similar morphology amongst related species [4]. While forensic anthropologists are required to be experts on the human skeleton, zooarchaeological training, while ideal, is not a requirement, and expertise in comparative osteology can vary greatly amongst practitioners. When determining the nonhuman species of skeletal remains, practitioners are fortunate if they have access to comparative osteological collections to assist with identifications. Such collections take time and resources to build or require proximity and unrestricted accessibility to an already-established collection. Various comparative osteology texts are available [5–13], each with their own advantages and limitations; they vary in cost, comprehensiveness, species included, photographic quality, and target audience. Texts are also most useful if the user knows the element type in advance and/or already suspects a certain species. Reliable and easily accessible online resources are limited, and internet searches for images of specific faunal elements can return mixed results.

The primary goal of this project was to develop additional, freely-available resources to support forensic anthropologists and medicolegal personnel in skeletal species identification based on simple measurements. Saulsman et al. [14] report discriminant functions derived from eight traditional long bone metrics that can differentiate human from five Australian nonhuman species with accuracy rates at or above 95%. Their sample sizes were limited to 50 human and 50 nonhuman individuals (ten per species). Given their promising results, this study aimed to test the utility of similar bone metrics in differentiating much larger samples of human and nonhuman specimens and classifying species, with a focus on species commonly encountered in North America. Although a handful of measurements cannot capture specific distinguishing bony features, traditional morphometric analyses can capture overall bone size and shape (i.e., form), which are variables considered subjectively during visual assessments of species.

In addition to the morphometric analyses, this study also aimed to develop a freely available searchable online database that uses basic metrics and visual aids (i.e., photographs and 3D scans) to help forensic anthropologists and medicolegal personnel (amongst others) determine species from skeletal elements. These resources would benefit practitioners without access to extensive comparative collections and would be accessible in the field via the use of a smartphone or other device. Beyond the scope of forensic anthropology, this skeletal species identification tool may be useful to students, archaeologists, wildlife forensic specialists, biologists, veterinarians, and others, including the general public who may wish to learn more about bones they encounter through various activities.

#### **2. Materials and Methods**

The study sample included skeletal data from humans and 27 faunal species frequently found in North America (20 mammals, 5 birds, 2 turtles—see Table 1), which included species that approximate human sizes (e.g., deer, horse, elk, moose, cow, pig, domestic dog, and black and brown bears). The species included are also commonly presented in comparative osteology texts used by forensic anthropologists [5–9] and encountered in forensic anthropological analyses [1]. To facilitate database searching, analogous measurements needed to be obtainable from each specimen included, regardless of species or element type. Thus, long bones were chosen as the main focus for this study (humerus, radius, ulna, radio-ulna, femur, tibia, fibula, and fused metapodials). For birds, the tibiotarsus was included with the tibia data, and the carpometacarpus and tarsometatarsus were included with the fused metapodials. The scapula, sacrum and os coxae were also included given the ability to take maximum lengths and breadths and their diagnostic morphologies. The original measurement list consisted of maximum lengths, proximal and distal maximum breadths (medio-lateral) and depths (antero-posterior), midshaft minimum and maximum diameters, and a few unique measurements for certain elements (e.g., femoral head diameter, acetabular diameter). Von den Driesch [15] was used as a guide when establishing the measurements.

These measurement data were collected from skeletal remains curated at the following institutions: Smithsonian National Museum of Natural History, Washington, DC; American Museum of Natural History, New York City, NY; Mercyhurst University, Erie, PA; Washburn University, Topeka, KS; University of California, Davis, CA; and Des Moines University, Des Moines, IA. Additional data were included from published papers and available datasets [16–34]. In some cases, published data of specimens outside of North America were included in the study to increase sample sizes if the species was the same as that commonly encountered in North America (e.g., domestic dogs and cats). Inclusion in the study required specimens to be of skeletal maturity; specimens in advanced stages of epiphyseal fusion were included to increase faunal sample sizes where necessary. This original dataset consisted of 59,442 measurements from 18,867 bones from 5207 individuals/animals. Species averages, standard deviations, and minimum/maximum ranges were calculated for each measurement. Photographs of exemplar specimens were taken from multiple standard views (e.g., six views for long bones) for incorporation into the web tool.

A subset of the data (47,688 measurements collected from 16,315 long bone elements) was subjected to linear discriminant function analysis (DFA) and decision tree analyses to evaluate potential methods of human versus nonhuman and species classifications (Table 1). This subset included maximum length (MaxL), maximum mediolateral width of the proximal epiphysis (MaxPW), maximum mediolateral width of the distal epiphysis (MaxDW), maximum anteroposterior depth of the distal epiphysis (MaxDD), maximum diameter of the midshaft (MaxMidD), and minimum diameter of the midshaft (MinMidD) collected from humeri, radii, ulnae, femora, and tibiae. Element-specific measurements (e.g., femoral head diameter) were excluded to permit pooled analyses across element types. Maximum proximal depth was excluded due to measurement difficulty in certain elements (e.g., tibia depending on tuberosity location, ulna, and radio-ulna). Stepwise DFA using Wilks' lambda and leave-one-out cross-validation was performed on the human versus pooled nonhuman samples of all long bones (replicating a situation where the element type is unknown), and then separately for each bone. DFA was used to assess human versus nonhuman classification for commonly collected univariate variables (MaxL, MaxPW, and MaxDW) and variables grouped by bone region (e.g., distal measurements and midshaft measurements) for application in cases when the unknown element is incomplete/fragmented or taphonomic modifications preclude some measurements. Finally, stepwise discriminant functions were also run to assess the potential ability to classify the 28 species using both pooled-bone and bone-specific samples. Variables input into the stepwise analyses were chosen to maximize sample sizes and discriminatory power. Box's M was used to assess homogeneity in variance–covariance matrices, and Kolmogorov–Smirnov tests were performed to evaluate data normality.
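The leave-one-out procedure can be illustrated with a minimal sketch for a single variable such as maximum length. It is not the SPSS stepwise procedure used in the study: it assumes equal prior probabilities and a pooled variance, in which case the univariate linear rule reduces to assigning a bone to the group with the nearer mean. The function name `loo_accuracy` and the synthetic lengths are hypothetical and are not study data.

```python
import random
from statistics import mean

def loo_accuracy(human, nonhuman):
    """Leave-one-out accuracy of a one-variable, two-group linear discriminant.

    Assumes equal priors and a pooled variance, so each case is assigned
    to the group whose (training) mean is closer.
    """
    cases = [(x, "human") for x in human] + [(x, "nonhuman") for x in nonhuman]
    correct = {"human": 0, "nonhuman": 0}
    for i, (x, label) in enumerate(cases):
        # Refit the group means without the held-out case.
        train = cases[:i] + cases[i + 1:]
        mu_h = mean(v for v, g in train if g == "human")
        mu_n = mean(v for v, g in train if g == "nonhuman")
        predicted = "human" if abs(x - mu_h) <= abs(x - mu_n) else "nonhuman"
        correct[label] += int(predicted == label)
    return {
        "human": correct["human"] / len(human),
        "nonhuman": correct["nonhuman"] / len(nonhuman),
        "overall": sum(correct.values()) / len(cases),
    }

# Synthetic maximum lengths in mm (illustrative only, NOT study data).
rng = random.Random(0)
human_len = [rng.gauss(450, 25) for _ in range(200)]
nonhuman_len = [rng.gauss(250, 90) for _ in range(200)]
rates = loo_accuracy(human_len, nonhuman_len)
print(rates)
```

Reporting the group-specific rates alongside the overall rate mirrors the way the classification results are summarized later in the paper, where human and nonhuman accuracies often differ.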



<sup>1</sup> For the human and nonhuman comparisons, individual measurements were taken from fused radio-ulna elements and included as radius or ulna. For the development of the web tool, both the individual radius and ulna measurements and combined maximum lengths/widths for the fused radio-ulna were included for search purposes. <sup>2</sup> A few specimens were labeled as "Sheep/Goat" in the collection and thus entered this way for human versus nonhuman analyses but were excluded from species analyses.

Decision trees were developed from the same data set and evaluated for classifying human versus the pooled nonhuman samples and classifying species using both the pooled-bone sample and bone-specific subsamples. The decision trees were created using a CRT (Classification and Regression Trees) growth model with a Gini impurity measure splitting criterion and a maximum tree depth of five levels. CRT uses stepwise variable selection to create a decision tree where each node is split using the variable that best maximizes the purity of the resulting nodes (i.e., homogeneity of the dependent variable) [35,36]. CRT also uses surrogate variables (those that result in a similar outcome pattern) to replace missing data, thereby maximizing sample sizes. The minimum number of cases for nodes was set at 100 for parent nodes and 50 for child nodes. Equal prior probabilities were used across groups. Tree pruning was implemented, set at one standard error in order to avoid overfitting [35,36]. A split-sample validation was applied, with the model generated from a training sample (70% of the data), which was then validated on the test sample (remaining 30% of the data). For the trees classifying human from nonhuman remains, human was set as a target variable and a misclassification cost of ten was assigned to misclassifications of human bone as nonhuman. This reflects the more severe forensic implications in erroneously assigning a human bone as nonhuman as compared to misclassifying a nonhuman bone as human.
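To make the splitting criterion and the cost setting concrete, the following is a minimal sketch of Gini impurity, the impurity decrease from a candidate split, and cost-based labeling of a node. It is an illustration under simplifying assumptions, not the SPSS CRT implementation: the function names and the node counts are hypothetical, and costs are applied here only when labeling a node.

```python
def gini(counts):
    """Gini impurity of a node from class counts, e.g. {"human": 80, "nonhuman": 20}."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_gain(parent, left, right):
    """Decrease in Gini impurity when a parent node is split into two children."""
    n = sum(parent.values())
    weighted = (sum(left.values()) * gini(left)
                + sum(right.values()) * gini(right)) / n
    return gini(parent) - weighted

def node_label(counts, cost_human_as_nonhuman=10.0, cost_nonhuman_as_human=1.0):
    """Label a node with the class that minimizes total misclassification cost."""
    cost_if_nonhuman = counts.get("human", 0) * cost_human_as_nonhuman
    cost_if_human = counts.get("nonhuman", 0) * cost_nonhuman_as_human
    return "human" if cost_if_human <= cost_if_nonhuman else "nonhuman"

# Hypothetical node counts (illustrative only, NOT study data).
parent = {"human": 100, "nonhuman": 100}
left = {"human": 90, "nonhuman": 20}
right = {"human": 10, "nonhuman": 80}
print(gini(parent), split_gain(parent, left, right))
print(node_label(right), node_label(right, 1.0, 1.0))
```

With a cost of ten, the hypothetical right-hand node is labeled human even though most of its cases are nonhuman, whereas equal costs would label it nonhuman. This is the kind of deliberate bias toward human classification that the cost setting is meant to produce.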

The linear discriminant function analyses represent more traditional classification approaches but have statistical assumptions such as multivariate normality and homogeneity of variance–covariance matrices [37–39]. Decision trees do not rely on these statistical assumptions [40–42]. All statistical analyses were performed in SPSS v.28 (IBM Corporation, Armonk, NY, USA). We hypothesized that the multivariate DFA and decision trees would be able to adequately differentiate human from nonhuman remains when single elements were assessed, given that these morphometric parameters are used during visual assessments of remains. The pooled-bone sample is expected to provide less accurate results, given the compounded effects of variation within and between species and element types. The results of the DFA and decision trees were used to make informed decisions about the development of the skeletal species identification web tool, with the possibility of integrating the methods into the tool depending on their performance.

#### **3. Results**

#### *3.1. Descriptive Statistics*

Sample sizes, minimum and maximum values, averages, standard deviations, and the ranges from two standard deviations below to two standard deviations above the mean (covering ~95% of values under a normal distribution) were calculated per measurement and species (38 measurements collected across 28 species). Given the forensic aim to distinguish human from nonhuman remains, as well as the extensive dataset, Table 2 presents only the human summary statistics. This table may act as a general guide to assess whether a bone falls within the human size ranges; note, however, that there is always a small possibility of a human bone falling outside these values, given that samples may not represent the complete global variation of past and present populations. Descriptive statistics for nonhuman measurements by species are provided in the Supplementary Materials (Tables S1–S11).

#### *3.2. Morphometric Human Versus Nonhuman Classification*

When the human long bone measurements are compared to those of the pooled nonhuman long bones, Box's M indicates significant differences in the variance–covariance matrices (*p* < 0.001 for all analyses). This is true for both the pooled-bone and bone-specific samples. Kolmogorov–Smirnov results indicate that the nonhuman variables are not normally distributed, while the human data generally do not differ significantly from normality (*p* > 0.05). These results are unsurprising given the unequal sample sizes and range of nonhuman species being pooled (Table 1). DFA has been suggested to be robust against statistical violations [42]. For this reason and the exploratory nature of the analyses, the DFAs were performed despite the violation of statistical assumptions to provide a comparison with the decision tree results and to inform decisions about the web tool development.

The results of the human versus nonhuman DFA classification are summarized in Table 3, including overall cross-validated accuracy, group-specific cross-validated correct classifications, and sample sizes for each model. Note that DFA requires that all measurements are present for each element in the analysis, resulting in significant decreases in sample sizes for some models due to missing data. In each analysis, the cross-validated results were the same or similar to the original classification results. There are some classification biases, but in most cases, the human correct classification is higher than the nonhuman. Of the univariate analyses, maximum lengths performed the best with overall classification rates above 90% for all elements except for the ulna and a 79.5% classification rate for the pooled-bone analysis. The human classification rates using only maximum length were over 99% for all bones except the ulna (96.8%). The DFAs assessing regional measurements (two midshaft variables or two distal variables) provided results similar to or lower than the univariate maximum length results, with a few exceptions. The ulna midshaft had a 90.0% correct classification, outperforming the length results, and the humerus midshaft accuracy was much lower than the length at 67.1% (vs. 94.1% for maximum length).



<sup>1</sup> Measurement abbreviations: MaxL = maximum length, MaxPW = maximum proximal width (medio-lateral), MaxDD = maximum distal depth (antero-posterior), MaxDW = maximum distal width (medio-lateral), MidMaxD = maximum diameter at midshaft, MidMinD = minimum diameter at midshaft, DiamH = femoral head diameter, DiamA = acetabulum diameter.

As expected, the pooled-bone DFAs did not perform as well as the bone-specific analyses for morphometric human versus nonhuman classification. The pooled-bone univariate analysis of maximum distal width performed the best (87.9%), which may be because ulnae were excluded from this analysis (distal ulna measurements were not collected) thereby removing one confounding element. Maximum length correctly classified 79.5% of the sample composed of 11,129 human bones and 5254 nonhuman bones.


**Table 3.** Linear DFA accuracy results and sample sizes for human (Hum) and nonhuman (Non) classifications summarized by element and variables. Overall accuracy is bolded. Var(s) = variable(s), NHum = human sample size, NNon = nonhuman sample size.

<sup>1</sup> All variables were included in the stepwise DFA and those retained in the function are listed in each column with the results.

The multivariate stepwise DFAs returned correct human versus nonhuman classification rates above 90% for the humerus, femur, and radius and just below 90% for the tibia and ulna (Table 3). Maximum length was utilized in all the stepwise functions and had the highest weight. For the humerus (*n* = 2753) and femur (*n* = 3458), a function including maximum length and maximum distal width returned accuracy rates of 96.7% and 98.1%, respectively. Other functions for the humerus and femur returned higher classification rates (99.5% for the humerus and 99.7% for the femur), but given the variables included in these functions, sample sizes decreased to around 1100. Equations associated with the multivariate discriminant functions are provided in the Supplementary Materials (Table S12).

The decision tree results outperformed the DFA results for human versus nonhuman classification (Table 4) and were derived from larger samples in both the training and test sets. With all bones pooled, decision trees that evaluated all measurements correctly classified 90% or more of the training and test samples, except for the ulna test sample (89.3%). The region-specific pooled-bone analyses had lower accuracy rates (ranging from 76 to 89% correct) but still outperformed the DFA. With the exception of the ulna test sample, all training and test samples had correct human classification rates of 98% or higher.

The ulna test sample correctly classified 94.5% of the human sample. Using four basic measurements, the decision tree presented in Figure 1 results in an overall accuracy of 91% and human classification accuracy of 99.6%; this is for the pooled-bone sample (i.e., without first identifying which bone is present). Although the nonhuman classification rate is lower (75%), this bias is expected given that we assigned higher misclassification costs to the human sample. The terminal nodes of the decision tree (Figure 1) indicate the number/percentage of human and nonhuman elements that fell within that node as well as associated sample sizes. Note that the "total" row depicts the percentage of the original input sample. The terminal nodes vary in their accuracy rates (75.2 to 99.8%), but only one of five terminal nodes had accuracy rates below 90%. This node (node 7) consists of ~17% of the total sample and represents those elements in which the multivariate sizes overlap between human and nonhuman species. For example, a deer metatarsal may approximate a human radius based on the measurements. Decision trees associated with the results in Table 4 are presented in the Supplementary Materials (Figures S1–S9).

**Table 4.** Decision tree results and sample sizes for human (Hum) and nonhuman (Non) classifications summarized by element and variables. Acc = accuracy, N = sample size. See Supplementary Materials (Figures S1–S9) for the decision trees.


#### *3.3. Morphometric Skeletal Species Identification*

Correct species classification rates from the stepwise DFAs are summarized in Table 5. The pooled-bone analysis had an overall 40.4% accuracy rate, which, although better than the a priori classification rate (3.6%), can lead to numerous classification issues. For this model, 20 species had correct classification rates below 50%, with only two species (eastern cotton-tail rabbit and common box turtle) with classification rates above 75% (both above 90%). Bone-specific DFAs performed better, with overall accuracies ranging from 78 to 89%. The humerus DFA had the most accurate classifications with 18 species above 90% and none below 50%. The humerus DFA performed the worst for brown bear (55.6%), domestic dog (53.7%), and pig (50.0%). Domestic dog had classification issues across all DFAs given the high degree of variation in dog sizes and morphologies. Species within the same genus were commonly misclassified (e.g., domestic dogs and coyotes, brown bears and black bears, etc.), given their similarity in morphology and substantial overlap in body size. Human classification rates for the bone-specific DFAs ranged from 76.8% (ulna) to 100.0% (humerus, femur, and radius). All stepwise DFAs retained all variables in the final functions, and maximum length was consistently the most important variable. Ultimately, while the overall species classification rates for the bone-specific DFAs are acceptable, results varied greatly by taxon, suggesting that the DFAs should only be used as a general guide and should not be relied on as final determinants of species identification.
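For context, the a priori rate quoted above matches what equal prior probabilities across the 28 groups (humans plus 27 nonhuman species) would give. Treating the priors as equal is an assumption here, since the text does not state how the rate was derived:

$$
P_{\text{a priori}} = \frac{1}{28} \approx 0.036 = 3.6\%
$$

The pooled-bone accuracy of 40.4% is therefore roughly eleven times this chance rate, even though 20 of the 28 species were still classified correctly less than half the time.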







**Figure 1.** Decision tree developed to classify human (Hum) versus nonhuman (Non) elements from a pooled-bone sample (i.e., all long bones pooled). Working from the top of the tree, the variable listed at each level would be measured, and based on the provided sectioning point, the user would move down the tree to the next level. This process would continue until arriving at a terminal node where classification would be assigned. Terminal nodes are outlined in red. Group classification is highlighted in yellow and bolded at each node. Percentages and counts of bones classified to each group in the training and testing samples are presented, as well as the total percentage of the sample represented in that node. Overall correct classification for the test sample is 91.0% (99.6% for human and 75.0% for nonhuman elements). This decision tree corresponds with the first line in Table 4.
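The top-down procedure described in the caption can be sketched as a simple tree walk. The variable names come from the text, but the tree structure, every sectioning point, and the leaf labels below are placeholders chosen for illustration; they are not the values shown in Figure 1 and should not be used for classification.

```python
# Hypothetical tree: all cut-offs (mm) and leaf labels are placeholders,
# NOT the sectioning points or classifications from Figure 1.
TREE = {
    "var": "MaxL", "cut": 200.0,
    "low": {"label": "nonhuman"},
    "high": {
        "var": "MaxDW", "cut": 40.0,
        "low": {"label": "nonhuman"},
        "high": {"label": "human"},
    },
}

def classify(node, measurements):
    """Walk from the root to a terminal node; return its label and the path taken."""
    path = []
    while "label" not in node:
        value = measurements[node["var"]]
        # At or below the sectioning point goes to the low branch, above it to the high branch.
        branch = "low" if value <= node["cut"] else "high"
        path.append((node["var"], node["cut"], branch))
        node = node[branch]
    return node["label"], path

label, path = classify(TREE, {"MaxL": 310.0, "MaxDW": 62.0})
print(label, path)
```

Returning the path as well as the label lets a user report which measurements and sectioning points led to the terminal node, which is useful when documenting a preliminary assessment.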

As might be expected, the decision tree results attempting to classify species were not successful. While tree overall classification rates were over 70% for all analyses except the ulna, none of the trees produced 28 terminal nodes to classify each species. To classify each species would require too many levels and branches; thus, the trees opted for preserving overall classification rates by focusing on those species with the highest counts.


**Table 5.** Stepwise DFA species classification results. The right side of the table presents the number of species that fell within each accuracy range (i.e., <50%, 50–75%, 75–90%, or >90%). Vars = variables included in final function, Acc = Accuracy, Hum = human.

#### *3.4. Web Tool for Species Identification*

Both the DFA and decision tree results suggest that a simple equation or tree cannot be used to adequately identify skeletal species. When forensic anthropologists visually evaluate skeletal remains, they mentally process the bone dimensions to consider possible species, using the overall bone size and shape to narrow down potential species. Ultimately, however, visual comparisons and specific bony features are used to make final species identifications.

To facilitate this species identification process, we utilized the metric data and images from our study sample to develop an online, freely available species identification tool: OsteoID [43]. The home page asks users to first identify the bone, providing diverse exemplars for each element (humerus, femur, radius, radio-ulna, ulna, tibia, fibula, metapodials, scapula, sacrum, and os coxae), demonstrating the common general morphology of specific elements across most species. There is also an option to "Search All" if the user cannot confidently determine bone type. Once an option is selected, the user is brought to a new page where they can narrow the search by common name, scientific name, or by bone length, proximal width, and distal width. At any point, the user can search additional fields in the side bar.

Maximum length, proximal width and distal width were chosen as the web tool filtering variables for several reasons. First, they were found to be the easiest to measure reliably, even with little or no osteological experience. In addition, the DFA and decision tree analyses revealed maximum length to be the most important variable in species identification, followed commonly by maximum distal width; including distal depth did not exclude many more species. Finally, the midshaft measurements are instrumentally defined (i.e., users need to take the maximum length and divide it by two to determine the correct location to take the midshaft maximum and minimum diameters) and require calipers. These factors make application in the field difficult and limit utility to those with osteological backgrounds.

To determine the searchable range for each species/bone measurement, the minimum, maximum, and two standard deviations above and below the mean were calculated. The smallest value (whether two standard deviations below the mean or the observed minimum) was used as the lower search limit, while the largest value (either two standard deviations above the mean or the observed maximum) was used as the upper search limit. This created a conservative size range, which is important given that the dataset likely does not encompass the full size range of each species. For elements in the database missing one or more measurements, a range of 0–1000 mm was assigned so that it would not be automatically eliminated during searches.
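The range rule above can be sketched as follows. This is an illustration, not the OsteoID source code: the function names, the species keys, and the example values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

DEFAULT_RANGE = (0.0, 1000.0)  # mm; catch-all range for missing measurements

def search_range(values):
    """Return (lower, upper) search limits in mm for one species/bone measurement."""
    if len(values) < 2:
        # Assumption: too few values for a standard deviation is treated as missing.
        return DEFAULT_RANGE
    m, sd = mean(values), stdev(values)
    lower = min(min(values), m - 2 * sd)
    upper = max(max(values), m + 2 * sd)
    return (lower, upper)

def matching_species(ranges_by_species, measured):
    """Keep species whose ranges contain every measurement the user entered."""
    matches = []
    for species, ranges in ranges_by_species.items():
        keep = True
        for name, value in measured.items():
            low, high = ranges.get(name, DEFAULT_RANGE)
            if not low <= value <= high:
                keep = False
                break
        if keep:
            matches.append(species)
    return matches

# Hypothetical maximum lengths in mm (illustrative only, NOT study data).
ranges_by_species = {
    "species_a": {"MaxL": search_range([300.0, 310.0, 325.0, 340.0])},
    "species_b": {"MaxL": search_range([180.0, 195.0, 200.0, 210.0])},
    "species_c": {},  # no MaxL recorded, so it is never filtered out on MaxL
}
result = matching_species(ranges_by_species, {"MaxL": 320.0})
print(result)
```

Taking the wider of the observed range and the two-standard-deviation range means a filter can only err toward showing extra candidates, which suits a tool where the final identification is made visually.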

As possible bones/species are narrowed, thumbnails show multi-views of the bones by species as well as a list of the possible measurement ranges. Clicking on the thumbnails opens a larger image in a new window. By opening in a new window, multiple possible matches can be opened and placed side-by-side if needed. Most figures have six views of the exemplar element (anterior, posterior, medial, lateral, proximal, and distal) and include the maximum length range on the image and a scale bar; when possible, a penny was added for more intuitive sizing. Genus, species, collection, bone, and side information are also provided. Some images have been annotated to point out distinctive features. The user ultimately makes their final species classification based on visual comparisons. This web tool is also compatible with smartphones and thus is accessible in the field.

Informational tabs on the home screen describe the web tool and its development, provide instructions on utilizing the web tool (including measurement images), and answer frequently asked questions. Users are reminded that filtering the bones/species by measurements only works for skeletally mature specimens and are instructed on how to identify skeletal maturity. In numerous places, users are reminded that if a bone has any possibility of being human, they need to contact the local law enforcement agency immediately.

A final tab refers the user to additional resources [43]. These include references to other texts or websites as well as a link to a Dropbox folder where they can find additional project resources. In this folder, users can find the images included in the web tool, as well as images of other elements such as carpals and tarsals, which were not included in the main web tool given that measurements were not collected from these elements. Three-dimensional surface scans of many of the elements are also provided, which can be downloaded by users to view for comparison or 3D print. These 3D prints may be used to build or supplement comparative osteology collections. We are continuously expanding these Supplementary Materials and uploading them to additional digital repositories (e.g., [44]). The project data can also be accessed in this folder, as well as on Dryad [45].

#### **4. Discussion**
