Can Taxonomists Think? Reversing the AI Equation

Valdecasas, Antonio G.

doi:10.3390/taxonomy4040037

Open Access Essay

Museo Nacional de Ciencias Naturales, CSIC, c/José Gutiérrez Abascal, 2, 28006 Madrid, Spain

Taxonomy 2024, 4(4), 713-722; https://doi.org/10.3390/taxonomy4040037

Submission received: 19 July 2024 / Revised: 23 September 2024 / Accepted: 1 October 2024 / Published: 2 October 2024

Abstract

Confusion between means and ends, specifically between technological achievements and their users, has been evident in taxonomy’s history since the end of the last century. Following a current of thought implicit in Anglo-Saxon culture, this trend aligns with the idea of inevitability. It is inevitable, so it is thought, that what a human organism can do, a machine will be able to do at some point in time. This will ultimately lead to dispensing with the human element in tasks that humans do not wish to do themselves. Despite certain misunderstandings about what has become known as the Turing Test, the general idea is to determine whether a machine can analyze data as meaningfully as a human does and make decisions based on that analysis. In the case of taxonomy, the initial aim of using machines was to efficiently replace a researcher for identification purposes. The situation later evolved to include the discovery of new entities in addition to identification. In this essay, I provide a brief overview of some milestones along this trajectory and its current state and discuss the influence of artificial intelligence (AI) in taxonomy.

Keywords:

taxonomist; species; methodology; artificial intelligence

There is only one systematics
Goloboff, 2022

1. The Taxonomic Context

In a rarely cited essay, evolutionary biologist Richard Lewontin [1] describes taxonomic space, a concept he attributed to G. E. Hutchinson, as a multidimensional space that includes not only present and past species but also those yet to emerge and those that could never exist. For an evolutionary biologist, analyzing phenotypic trajectories in this space offers an interesting point of reflection on what can and cannot happen in evolution, and on the constraints that rule out certain trajectories among all possible ones.

From an empirical standpoint, the estimated number of living species ranges from around 9 million [2] to several trillion [3] when bacteria, fungi, and protists are considered [4]. To this, we must add the extinct fauna [5], which some estimate at about 2 billion species [6]. In stark contrast to these estimates, only around 2 million species are currently thought to have been described [7], a very limited achievement for nearly 300 years of research.

Within this universe of species, we should also consider the abundance space of each, which is partially reflected in the abundance of specimens in any taxonomic sampling. From the classic analyses of Fisher, Corbet, and Williams [8] and Preston [9], we know that in any faunistic sampling, the number of species represented by a given number of specimens decreases as that number increases, such that species represented by a single specimen are more numerous than those represented by two, and so on [10]. This pattern has given rise to an extensive literature that is especially relevant to ecologists. For taxonomists, it simply conditions the nature of the evidence that can be recovered from organisms scarcely represented in a sample. Moreover, this assumes that complete organisms are available; as in paleontology, it is common to find only parts of specimens. It is within this complex field of abundance and limitations that taxonomists have been working since Linnaeus laid the foundations for an orderly record of terrestrial biodiversity.
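
The pattern described by Fisher, Corbet, and Williams can be made concrete with Fisher’s log-series, in which the expected number of species represented by exactly n specimens is S(n) = α xⁿ / n, with a diversity parameter α > 0 and 0 < x < 1. The sketch below uses arbitrary, hypothetical values of α and x, not values fitted to any real sample.

```python
import math

def log_series_counts(alpha, x, max_n):
    """Expected number of species represented by exactly n specimens, n = 1..max_n,
    under Fisher's log-series S(n) = alpha * x**n / n."""
    return [alpha * x**n / n for n in range(1, max_n + 1)]

def log_series_totals(alpha, x):
    """Expected total number of species and of specimens for the full log-series."""
    species = -alpha * math.log(1 - x)
    specimens = alpha * x / (1 - x)
    return species, specimens

# Hypothetical parameters, chosen only for illustration.
counts = log_series_counts(alpha=10.0, x=0.99, max_n=5)
for n, s in enumerate(counts, start=1):
    print(f"species with {n} specimen(s): {s:.2f}")
```

Singletons are always the most numerous class in this model, because each term equals the previous one multiplied by x(n − 1)/n, which is smaller than 1.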

From a methodological standpoint, taxonomic work has revolved, from the outset, around the basic concept of sameness [11,12], that is, identity [13]. When applied to identification, one of the taxonomist’s two main activities, this concept can be equated to the execution of matching algorithms. For example, identifying specimens is comparable to running the algorithms that make matches in dating apps, especially when the dimensions used for matching are described in detail for each specimen, whether morphological (e.g., appearance, height, and eye color in the case of humans, or diagnostic morphological characters in other species) or behavioral (e.g., cultural interests, or song or feeding patterns).
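
To make the matching analogy concrete, here is a minimal sketch, assuming that each specimen and each described species is coded as a set of discrete character states. The score is the simple matching coefficient (the fraction of shared characters with identical states); the species names, characters, and the 0.8 acceptance threshold are hypothetical.

```python
def simple_matching(specimen, reference):
    """Fraction of characters scored in both records that have identical states."""
    shared = specimen.keys() & reference.keys()
    if not shared:
        return 0.0
    agree = sum(specimen[c] == reference[c] for c in shared)
    return agree / len(shared)

def identify(specimen, references, threshold=0.8):
    """Return (best_name, score), or (None, score) if no reference matches well enough."""
    best_name, best_score = None, 0.0
    for name, reference in references.items():
        score = simple_matching(specimen, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        # No acceptable match: the negative outcome that may point to an undescribed taxon.
        return None, best_score
    return best_name, best_score

# Hypothetical character data for two described species.
references = {
    "Species A": {"bristle_rows": 2, "wing_spots": "absent", "eye_color": "red"},
    "Species B": {"bristle_rows": 3, "wing_spots": "present", "eye_color": "red"},
}
print(identify({"bristle_rows": 3, "wing_spots": "present", "eye_color": "red"}, references))
```

A real identification system would weight characters by their diagnostic value and handle polymorphic or missing states; the unweighted coefficient is used here only because it is the simplest instance of matching.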

The taxonomist’s other main activity, the discovery of new species, results from a negative outcome of the previous task and involves the analytical description of morphological, cytogenetic, behavioral, and/or molecular characteristics to support the inference of a new class of organism (species). In a classic guide to North American Drosophila published in 1921, A.H. Sturtevant [14], author of the first chromosomal map, developed what could be called the ‘model of a taxonomic guide’ for biologists of various kinds. In it, he analyzed the behavior, genetics, physiology, and other aspects (nowadays referred to as ‘integrative’) of this group of Diptera, before presenting an identification key and a description of each species studied. Notably, some of the characters that Sturtevant studied in detail, such as the arrangement and presence or absence of bristles on different parts of the body (knowledge essential for certain genetic studies), are often undervalued by non-taxonomists, who regard them as arbitrary and subjective choices of insignificant characters for recognizing or naming new species.

Taxonomists do other things, but suffice it to say that naming establishes a relationship of kinship [15]. In this sense, I use taxonomy and systematics interchangeably to refer to the discipline that discovers biodiversity and organizes it by descent.

The entity considered above the level of the organism, traditionally referred to as the species, is widely accepted as a functional unit of intra-descent from both creationist and purely scientific standpoints. Since the time of Darwin and Wallace [16], it has also been regarded as the unit of relationship among species.

Given the enormous number of organisms yet to be discovered and characterized, various lines of assistance in the recognition and discovery of species have been developed over the course of taxonomy’s history, especially since the advent of computers. Here, I refer to these lines collectively as the methodological trajectory. The other aspect, the conceptual trajectory, pertains to the theoretical developments that frame the discovery of new organisms and their kinships, in which the species concept and phylogenetic reconstruction are key matters.

Despite certain misunderstandings about what has become known as the Turing Test (see [17] for a detailed discussion), the general idea is to probe whether a machine can analyze data as a human can and, based on that analysis, make the same decision as a human (or possibly a better one). This implies a role for meaning [18]; this concept will not be dealt with in detail here, although the critical assessment of original data will be. In the case of taxonomy, the initial aim was to use machines to make identifications efficiently and accurately, bypassing the need for an expert researcher. Later, the aim evolved to include the discovery of new entities. In this essay, I provide a brief overview of milestones in taxonomy’s methodological trajectory over the last 60 years, its current state, and the effect of AI. Although I focus on zoological organisms, a similar scheme should be applicable to other kingdoms. This overview does not aim to be an exhaustive history. Rather, it provides some examples of a conceptual trajectory and of procedures made possible by the advent of computers, with special reference to open data sources. A more detailed discussion of the taxonomic cycle (not to be confused with the ‘taxon cycle’ [19]) has recently been provided by Colin Favret [20].

2. The Taxonomic Tools

2.1. Compiling and Organizing Information

For taxonomic studies, the first step is to gather all known information on the group under study by compiling species lists and bibliographies. An early example of regional species lists was the Rubin Code System for Scandinavia, a system now forgotten but highly novel in its time; examples include its Marine Benthic Algae [21] and Land Molluscs in the North [22] lists. Since then, species lists for different regions and habitats have been generated by a number of institutions, such as the National Oceanographic Data Center (NODC), whose publications feature a taxonomic code. Following the Linnaean hierarchy and associating a code with each taxon have allowed various types of biological information to be retrieved from computerized systems (e.g., [23]).
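
Why a hierarchical code helps retrieval can be sketched as follows, assuming a scheme in which each level of the hierarchy appends two digits to its parent’s code, so that every taxon’s code begins with the codes of all its ancestors. This layout is an illustrative assumption, not the actual NODC specification, and the codes below are invented.

```python
# Hypothetical codes, two digits per hierarchical level (not real NODC codes).
TAXA = {
    "01": "Arthropoda",
    "0101": "Arachnida",
    "010101": "Acari",
    "0102": "Insecta",
    "010201": "Diptera",
    "01020101": "Drosophilidae",
}

def descendants(code, taxa=TAXA):
    """All taxa nested under `code`: their codes start with the parent's code."""
    return sorted(name for c, name in taxa.items() if c.startswith(code) and c != code)

def lineage(code, taxa=TAXA):
    """Ancestors of `code`, recovered by trimming two digits per level."""
    return [taxa[code[:i]] for i in range(2, len(code), 2)]

print(descendants("0102"))
print(lineage("01020101"))
```

Because the hierarchy is encoded in the code itself, retrieving every record under a given taxon reduces to a prefix search, which simple computerized systems can perform efficiently.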

Literature searches to generate bibliographies have been facilitated by the ongoing publication of the Zoological Record (ZR), particularly since it became accessible online, and by websites such as Google Scholar. A single search in the ZR can yield hundreds of references that can be exported to a reference manager (Mendeley and Zotero are two examples of free reference managers that can import ZR records in RIS format). Nowadays, species lists of specific clades, together with information on their distribution and references, can also be found on websites dedicated to those clades. For example, the European Water Mite Research website (watermite.org, last accessed on 30 September 2024) includes not only the species list of aquatic mites in Europe and related regions but also the complete bibliography of this group.
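
RIS is a plain-text tagged format: each line carries a two-character tag, two spaces, a hyphen, and a space, followed by the value, with TY opening a record and ER closing it. The minimal parser below is a sketch that handles only this basic layout (no continuation lines or vendor-specific quirks), and the sample record is invented; reference managers import RIS directly, so a script like this would only be useful for custom checks on exported records.

```python
import re

# Tag (two characters), two spaces, hyphen, optional space, value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

def parse_ris(text):
    """Parse RIS text into a list of records, each a dict mapping tag -> list of values."""
    records, current = [], None
    for line in text.splitlines():
        match = RIS_LINE.match(line)
        if not match:
            continue  # skip blank or malformed lines in this simple sketch
        tag, value = match.group(1), match.group(2).strip()
        if tag == "TY":
            current = {"TY": [value]}
        elif tag == "ER":
            if current is not None:
                records.append(current)
            current = None
        elif current is not None:
            current.setdefault(tag, []).append(value)
    return records

sample = """TY  - JOUR
AU  - Doe, J.
AU  - Roe, R.
TI  - A hypothetical water mite record
PY  - 2020
ER  -
"""
print(parse_ris(sample))
```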

A second step is to extract information from taxonomic publications and structure it to facilitate the identification (or not) of an organism under study. In the methodological trajectory, the objective has been to replicate on computers what was previously done manually. Since visual information is very important in taxonomic work, some initiatives in the pre-computer era involved creating iconographic databases, such as the Fritsch Collection of Freshwater Algae [24]. Although digitization of this collection began in 2012, while it was housed at the Freshwater Biological Association (FBA), the plan was never completed, and the collection was transferred to the Natural History Museum in London. AlgaeBase (www.algaebase.org, last accessed on 30 September 2024) [25] is an expanded and modern version of the Fritsch Collection. At the FBA, Terry Gledhill compiled a pre-computer-era iconographic database of aquatic mites worldwide, covering more than 5000 species [26]. FishBase (www.fishbase.org, last accessed on 30 September 2024) is another example of a current and widely used online database with a specific taxonomic focus.

A database with a broader and more complex goal than those mentioned above is the Plazi project (www.plazi.org, last accessed on 30 September 2024). According to its mission statement, “Plazi is a non-profit organization founded in 2008 to promote the free accessibility of scientific data, in particular taxonomic treatments and images”. Plazi provides textual and iconographic taxonomic information for many clades.

New collection management systems that connect various web-based databases are also in development. A good example is Arctos (A Collaborative Collection Management Solution; www.arctosdb.org, last accessed on 30 September 2024), which connects queries with iSpecies, Wikipedia, Animal Diversity Web, and NCBI, among other databases.

In terms of structuring information, identification keys, particularly dichotomous ones, are a common tool for taxonomists. Previously available only in printed texts, identification keys were quickly incorporated into automated systems, such as DELTA (www.delta-intkey.com, last accessed on 30 September 2024), once personal computers became widespread. The DELTA system, developed by Mike Dallwitz [27], evolved over time so that, from structured data input, it could generate dichotomous keys and natural-language descriptions and transform the data into matrices that could serve as input for numerical or cladistic analyses.
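
In data-structure terms, a dichotomous key is a binary decision tree: each couplet poses one question about a character, and each of its two leads points either to another couplet or to a taxon name. A minimal sketch with invented couplets and taxa; it illustrates the structure only and is not DELTA’s data format.

```python
# Hypothetical key: couplet number -> (question, lead if yes, lead if no).
# A lead is either the number of the next couplet or a taxon name.
KEY = {
    1: ("Body divided into distinct regions?", 2, "Taxon C"),
    2: ("Four pairs of legs?", "Taxon A", "Taxon B"),
}

def run_key(key, answers, start=1):
    """Walk the key from `start`, answering each couplet from `answers` (couplet -> bool).

    Returns the taxon reached and the list of couplets visited.
    """
    lead, path = start, []
    while not isinstance(lead, str):  # a string lead is a taxon name
        _question, if_yes, if_no = key[lead]
        path.append(lead)
        lead = if_yes if answers[lead] else if_no
    return lead, path

print(run_key(KEY, {1: True, 2: True}))
```

Storing the key as data rather than as printed text is what allows software to traverse it, reorder couplets, or regenerate it from a character matrix.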

The ever-increasing use of molecular biology approaches in taxonomic work has expanded the field’s horizon. Molecular data are a new source of information that can, in principle, be more objective, more quantifiable, and more quickly processed than traditional morphological descriptions. I qualify this statement because the chromatograms and final products of molecular processing are not always archived, making some deposited sequences unverifiable. In taxonomy, depositing vouchers (usually the original specimens on which a species is described) in a public institution is a common requirement. In the case of molecular data, chromatograms play the role of vouchers: they can be re-evaluated to certify whether a sequence has been read and reported correctly [28]. Molecular databases, such as GenBank (www.ncbi.nlm.nih.gov/genbank, last accessed on 30 September 2024) and BOLD Systems (www.boldsystems.org, last accessed on 30 September 2024), are commonly used in taxonomy today.
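
One concrete aspect of re-evaluating a chromatogram is checking the quality of each base call. Base callers commonly report Phred scores, Q = −10 log₁₀ P, where P is the estimated probability that the call is wrong. The sketch below assumes that the quality values have already been extracted from a trace file and flags positions below a chosen threshold (Q20, a 1% error probability, is a common but arbitrary choice); the sequence and scores are invented.

```python
def error_probability(q):
    """Convert a Phred quality score into the estimated probability of a miscall."""
    return 10 ** (-q / 10)

def flag_low_quality(sequence, qualities, min_q=20):
    """Return (position, base, quality) for calls below min_q; positions are 1-based."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    return [
        (i, base, q)
        for i, (base, q) in enumerate(zip(sequence, qualities), start=1)
        if q < min_q
    ]

seq = "ACGTNACG"
quals = [40, 38, 35, 12, 2, 30, 19, 40]
print(flag_low_quality(seq, quals))
# Summing the error probabilities gives the expected number of miscalls in the read.
print(sum(error_probability(q) for q in quals))
```

Flagged positions are those a specialist would inspect against the original trace before accepting a deposited sequence.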

Other important tools related to taxonomy, although perhaps not directly, include museum collection management systems and databases of taxonomists. In the first case, significant technological and methodological developments over the last 40 years have improved the infrastructure around taxonomic work. In the second, before the era of Google, databases of taxonomists [29,30] helped colleagues find each other and share information. I consider these extrinsic tools and will not discuss them further here.

2.2. The Overarching Umbrella: The Species Concept

Since Plato and Aristotle (and presumably earlier), humans have grouped organisms into kinds, or species [31]. However, the concept of species varies greatly among specialists, and this diversity of concepts does not seem likely to decrease substantially in the near future. Since 1 January 1758, the date considered the starting point of zoological nomenclature, the literature on the concept has continued to grow. According to a recent count, around 32 species concepts have been described [32], each new proposal carrying the illusory expectation of displacing the previous ones. The most recent proposal, at least at the time of writing, involves using AI to resolve the species concept in a way that humans have not been able to [33]. This is not the place to delve into the strengths and weaknesses of this new proposal. Only time will tell whether this supposedly definitive solution can encompass the variety of processes that have produced the enormous biodiversity we know while also providing a useful structure for the common tasks performed by researchers, as the Linnaean classification system has done in keeping with the relationship of descent.

2.3. Tools for the Analysis of Data

In the 1960s, two schools of thought on the analysis of taxonomic data emerged: phenetics and cladistics. Phenetics, or numerical taxonomy, which aims to build robust classifications based on overall similarity between taxa, first took root following the publication of Principles of Numerical Taxonomy in 1963 [34] and its updated and expanded edition ten years later [35]. The methodology presented in these two texts could be easily implemented using James Rohlf’s software package NTSYS [36]. Phenetics has evolved into the field of morphometrics, specifically geometric morphometrics. Two recent articles by the Italian researcher Andrea Cardini [37,38] serve as an introduction to a geometric morphometric methodology tailored to taxonomists. In this approach, the morphological analysis is carried out on a configuration of landmarks (homologous locations), and the main requirement is that the number of specimens exceed the number of landmarks used in the study. This is one of the main limitations faced by taxonomists, as noted above regarding species abundance.
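
Numerical taxonomy typically summarized overall similarity as a matrix of pairwise distances and then clustered it. UPGMA (the unweighted pair-group method with arithmetic mean) is one classic clustering method of that school; the pure-Python sketch below illustrates the idea and is not NTSYS’s implementation. The taxa and distances are invented.

```python
def upgma(labels, matrix):
    """Cluster taxa by UPGMA.

    `matrix` is a symmetric list of lists of pairwise distances.
    Returns (tree, merges): the tree as nested tuples, and a list of
    (cluster, height) pairs, where height is half the joining distance.
    """
    n = len(labels)
    trees = {i: labels[i] for i in range(n)}  # active cluster id -> subtree
    sizes = {i: 1 for i in range(n)}          # active cluster id -> number of taxa
    dist = {frozenset((i, j)): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id, merges = n, []
    while len(trees) > 1:
        pair = min(dist, key=dist.get)        # closest pair of active clusters
        a, b = sorted(pair)
        new, next_id = next_id, next_id + 1
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        trees[new] = (trees.pop(a), trees.pop(b))
        sizes[new] = size_a + size_b
        merges.append((trees[new], dist[pair] / 2))
        # Distance to the new cluster: size-weighted mean of the old distances.
        for k in trees:
            if k != new:
                dist[frozenset((new, k))] = (
                    size_a * dist[frozenset((a, k))] + size_b * dist[frozenset((b, k))]
                ) / (size_a + size_b)
        # Drop distances involving the two clusters that were merged.
        dist = {p: d for p, d in dist.items() if a not in p and b not in p}
    return next(iter(trees.values())), merges

labels = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree, merges = upgma(labels, matrix)
print(tree)  # ('D', ('C', ('A', 'B')))
```

The result is a phenogram grouping taxa by overall similarity; it makes no claim about shared ancestry, which is precisely the point on which cladists objected to phenetics.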

Cladistics, which bases classifications on hypothesized shared ancestry, emerged following the publication of the English translation of Willi Hennig’s Phylogenetic Systematics [39]; in this case, the Hennig86 computer program developed by James Farris [40] was used to implement Hennig’s principles. The most current incarnation of a parsimony-based program is TNT (Tree Analysis using New Technology), developed by the Argentine researcher Pablo Goloboff and his colleagues James Farris and Kevin Nixon. Convergence between the two disparate schools of thought can be found in recent publications by Goloboff [41,42], in which, in addition to traditional morphological analyses, the principles of cladistics are applied to morphometric data in the search for parsimonious phylogenies.
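
Parsimony programs score a tree by the minimum number of character-state changes it requires. For one character on a given rooted binary tree, this minimum can be computed with Fitch’s algorithm, sketched below; programs such as Hennig86 and TNT also search among tree topologies, which this sketch does not do. The taxa and character matrix are invented.

```python
def fitch(tree, states):
    """Fitch's small-parsimony pass for one character.

    `tree` is a leaf name or a 2-tuple of subtrees; `states` maps leaf -> state.
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # Disjoint child sets: one extra change is required at this node.
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree, matrix):
    """Sum of Fitch steps over all characters; `matrix` maps leaf -> string of states."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {leaf: row[i] for leaf, row in matrix.items()})[1]
        for i in range(n_chars)
    )

matrix = {"A": "001", "B": "001", "C": "110", "D": "110"}
print(tree_length((("A", "B"), ("C", "D")), matrix))  # 3
print(tree_length((("A", "C"), ("B", "D")), matrix))  # 6
```

The first topology requires fewer changes and would therefore be preferred under the parsimony criterion.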

Another aspect of the analysis of taxonomic data today is image processing. Advances in imaging, driven mainly by its immense applications in industrial processes, have changed the field of taxonomy. However, their application is largely constrained to certain organismal groups (see next section) [43,44].

The increased use of molecular data in taxonomy, as discussed above, has also brought about an unprecedented flourishing of analytical and representational methods in the field. The primary methodological development has been the application of coalescent theory to multiple species [45], in which the genealogy of the genes in a sample is modeled by tracing their ancestral relationships back until their most recent common ancestor (MRCA) is reached [46].
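
The backward-in-time logic of the coalescent can be sketched for the simplest case: a single randomly mating population under Kingman’s coalescent, with time measured in coalescent units. While k lineages remain, the waiting time to the next coalescence is exponential with rate k(k − 1)/2, and two lineages chosen at random merge. The multispecies coalescent embeds this process within the branches of a species tree, which this sketch does not model; the sample labels and random seed are arbitrary.

```python
import random
from fractions import Fraction

def simulate_genealogy(samples, rng):
    """Trace gene copies back to their MRCA; return (nested-tuple genealogy, T_MRCA)."""
    lineages = list(samples)
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)  # waiting time while k lineages remain
        i, j = sorted(rng.sample(range(k), 2))
        merged = (lineages[i], lineages[j])
        # Remove the higher index first so that the lower index stays valid.
        lineages.pop(j)
        lineages.pop(i)
        lineages.append(merged)
    return lineages[0], t

def expected_tmrca(n):
    """Exact E[T_MRCA] = sum over k = 2..n of 2/(k(k-1)), which equals 2(1 - 1/n)."""
    return sum(Fraction(2, k * (k - 1)) for k in range(2, n + 1))

rng = random.Random(1)
tree, t = simulate_genealogy(["a", "b", "c", "d"], rng)
print(tree, round(t, 3))
print(expected_tmrca(4))  # 3/2
```

Averaged over many simulated genealogies, the simulated time to the MRCA should approach the exact expectation.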

3. Some Taxonomic Caveats and Limitations

On the road to freeing humans from the task of identifying organisms, substantial progress has clearly been made over the last 60 years, particularly in the automation of activities and, above all, in information accessibility. Some of the imagined possibilities, however, have met with such difficulties that, at present, only a certain amount of fantasy can overcome them.

3.1. Taxa Abundance

It is not true that multiple surveys will make rare species abundant in each sampling campaign [47]. Within a clade, the number of species decreases as the number of specimens per species increases [48], so most species are represented by few specimens. This imposes limitations on the methodology that can be applied to them. When working with rare species, one assumes that the few available specimens are, to some extent, representative of their kind; otherwise, one would have to give up working with them. Similar assumptions are made across all branches of science, and with further knowledge, they can be confirmed or refuted. The situation is similar when only fragments of an organism are available. In both cases, the solution is to collect more empirical data from the sample using various methods.

3.2. Taxa Size

The study of biodiversity is largely skewed toward the largest and most conspicuous animals of each clade, which also tend to be those studied earliest. For organisms smaller than 1 mm, we still lack machines capable of dissecting and orienting the different parts of the organisms so that they can be properly identified. Establishing morphological correlates that can obviate dissection is one of the challenges to be met in the near future. Advanced microscopy techniques, such as confocal microscopy, and general-purpose quantification methods, such as elliptic Fourier analysis, are potential approaches that could be further developed for the study of small and microscopic animals.
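
Elliptic Fourier analysis describes a closed outline, such as the contour of a sclerite, by a set of harmonic coefficients (aₙ, bₙ, cₙ, dₙ) computed directly from the chain of contour points, following the Kuhl and Giardina formulation. A minimal, unnormalized sketch, assuming an ordered outline without repeated consecutive points; real analyses usually also normalize the coefficients for size, rotation, and starting point.

```python
import math

def elliptic_fourier(points, n_harmonics):
    """Return [(a_n, b_n, c_n, d_n), ...] for harmonics 1..n_harmonics of a closed outline."""
    closed = list(points) + [points[0]]  # close the contour
    dx = [closed[i + 1][0] - closed[i][0] for i in range(len(points))]
    dy = [closed[i + 1][1] - closed[i][1] for i in range(len(points))]
    dt = [math.hypot(u, v) for u, v in zip(dx, dy)]  # segment lengths
    t, total = [], 0.0  # cumulative arc length at the end of each segment
    for step in dt:
        total += step
        t.append(total)
    T = total  # perimeter
    coeffs = []
    for n in range(1, n_harmonics + 1):
        const = T / (2 * n**2 * math.pi**2)
        a = b = c = d = 0.0
        t_prev = 0.0
        for u, v, step, t_end in zip(dx, dy, dt, t):
            phi = 2 * n * math.pi * t_end / T
            phi_prev = 2 * n * math.pi * t_prev / T
            dcos = math.cos(phi) - math.cos(phi_prev)
            dsin = math.sin(phi) - math.sin(phi_prev)
            a += u / step * dcos
            b += u / step * dsin
            c += v / step * dcos
            d += v / step * dsin
            t_prev = t_end
        coeffs.append((const * a, const * b, const * c, const * d))
    return coeffs

# A circle of radius 2 sampled at 200 points should give a1 ≈ d1 ≈ 2 and b1 ≈ c1 ≈ 0.
circle = [(2 * math.cos(2 * math.pi * k / 200), 2 * math.sin(2 * math.pi * k / 200)) for k in range(200)]
print(elliptic_fourier(circle, 2))
```

Because the method needs only an outline, it could in principle be applied to images of small structures without measuring predefined distances by hand.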

3.3. Taxa Accessibility

Similarly, the most studied organisms in a given clade are those found in the habitats most accessible (geographically or physically) to humans. Rare or difficult-to-access habitats have received less attention. An exemplary case is the discovery of the rich interstitial river fauna by Stanko Karaman in the 1930s [49,50]. Despite repeated attempts, there is still no procedure for separating organisms from the sediment in these samples that is equivalent to those used for soil samples.

Yet, if all these difficulties are indeed to be solved in the future, then it is time to preserve not only organism vouchers but also the river and marine sediment samples that many museums have not had the time or personnel to sort. As with samples from Malaise traps, perhaps they can be sorted by machines in the not-too-distant future [47].

An additional aspect not addressed in this essay is the funding of taxonomic activities, a key issue discussed by Hine in her book Systematics as Cyberscience [51], and one on which the very existence of taxonomists obviously depends.

4. Taxonomy and Artificial Intelligence

The title of this essay refers to the classic question of whether a machine can think. If we reverse the question and assume that a machine can sequentially perform all the tasks previously carried out by a taxonomist, a new question arises: will anything be left for specialists to reflect on and contribute regarding the taxonomy of their group?

4.1. On Artificial Intelligence

Perhaps the best summary of the branch of computing known as AI is the definition with which Stuart Russell and Peter Norvig open their essential treatise on the subject [52].

In a sort of Cartesian plane, where “human vs. rational” defines one axis or dimension and “thought vs. behavior” defines the other, Russell and Norvig identify four combinations or quadrants: acting humanly, thinking humanly, acting rationally, and thinking rationally. Each combination encompasses different aspects of computing and processing. For example, natural language processing, knowledge representation, automated reasoning, and machine learning fall under acting humanly (the Turing Test approach). If physical interaction with objects is considered, computer vision and robotics should also be included. The other combinations delineate other areas of computing.

To get an idea of the developments currently categorized under the umbrella of AI, one can refer to the AI Index (aiindex.org), which Stanford University has produced since 2017 [53]. In its own words, “The AI Index report tracks, collates, distills, and visualizes data related to artificial intelligence (AI). Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI”. The 2024 edition includes, for the first time, a chapter dedicated to AI in Science and Medicine.

For our purpose in this essay, we can summarize the relationship between AI and biological taxonomy by adapting the dictum, “AI has focused on the study and construction of agents that do the right thing” [52]. In our case, the “right thing” refers to the goals or objectives that we, as taxonomists, provide to the agent (computer and/or robot) [20].

4.2. The Potential Effect of AI in Taxonomy

Many of the activities mentioned in this essay could be greatly facilitated by large language models and, more generally, by artificial intelligence (AI) systems. Two important problems must be overcome for more advanced AI systems to be useful in taxonomy:

(a): A large part of the academic literature related to taxonomy, for example, the bibliographic collection of the ZR, is not freely accessible. The Biodiversity Heritage Library (BHL) only has access to the literature up to 1922, with one further year of literature released from copyright for each year that passes. Even today, much of what is published is accessible only by subscription. These are significant handicaps for accessibility.
(b): A more fundamental problem is that AI can aggregate information and structure it in its output but lacks the capacity for specific analysis unless it is instructed on the type of analysis needed to answer certain types of questions with certain types of data. This is akin to asking AI to behave like program users who lack a deep understanding of what they are doing and how, and who therefore limit themselves to following example scripts as if these were a surefire way to obtain any type of answer, assuming that the answer will be reasonable or correct.

Neither of these two limitations seems insurmountable in the near future. Moreover, what were once considered errors have become the seeds of later successes once they were identified and corrected (see, for example, the case of the misidentified tench: https://www.aiweirdness.com/when-data-is-messy-20-07-03/, last accessed on 30 September 2024).

Below, I suggest three tasks, among many, that could soon be solved by an AI dedicated to taxonomy and that would greatly facilitate taxonomic work.

4.3. Drawing by Example

One often tedious task is transferring the features of an organism that are considered diagnostic into a drawing. In this process, a certain realism is sought within a simplification of the actual image. Indeed, drawings persist in taxonomic work, in an era dominated by digital photography, precisely because they can simplify an image to highlight the characters that diagnose a particular taxon (a new species, for example).

The idea of “drawing by example” is the opposite of what many current AI systems offer, in which a sketch is provided and a realistic image is produced in the desired style. In the case of taxonomy, the system would be fed typical drawings of the diagnostic structures of a certain type of organism and then asked to produce an equivalent drawing from a realistic image of that structure in a new specimen whose diagnostic characters we want to extract.

4.4. Morphometric Measurements

Building on the previous process, and to overcome the time-consuming task of recording morphometric measurements of different structures, a taxonomy-dedicated AI could obtain standard morphometric data from numerous specimens in a fraction of the time it would take a taxonomist, and convert those measurements into formats suitable for analyses that relate them to functional or structural hypotheses about the organism in question.
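
The last step described here, turning located landmarks into standard measurements in an analysis-ready table, can be sketched as follows. The landmark names, measurement definitions, and coordinates are hypothetical; in the envisioned system, the landmark coordinates would come from an AI model rather than being entered by hand.

```python
import csv
import io
import math

# Hypothetical measurement definitions: name -> (landmark 1, landmark 2).
MEASUREMENTS = {
    "body_length": ("anterior_tip", "posterior_tip"),
    "body_width": ("left_margin", "right_margin"),
}

def measure(landmarks):
    """Euclidean distances between landmark pairs, in the units of the coordinates."""
    return {name: math.dist(landmarks[p], landmarks[q]) for name, (p, q) in MEASUREMENTS.items()}

def to_csv(specimens):
    """One row per specimen, one column per measurement."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["specimen", *MEASUREMENTS])
    writer.writeheader()
    for specimen_id, landmarks in specimens.items():
        values = {k: f"{v:.3f}" for k, v in measure(landmarks).items()}
        writer.writerow({"specimen": specimen_id, **values})
    return out.getvalue()

# Hypothetical landmark coordinates (e.g., in micrometres) for one specimen.
specimens = {
    "specimen_1": {
        "anterior_tip": (0, 0), "posterior_tip": (0, 850),
        "left_margin": (-300, 400), "right_margin": (300, 400),
    },
}
print(to_csv(specimens))
```

A plain CSV table was chosen because it can be read by most statistical and morphometric software.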

4.5. Cross-Vocabulary and the Production of Morphological and Anatomical Ontologies for Different Clades

Since the time of Linnaeus, taxonomic literature has been written in many different languages. The vast potential of AI-powered semantic translation systems opens up a new direction for taxonomic publications. Specialists whose native language is not English will be able to write and publish their work in their own language with the confidence that, at the click of a button, interested readers can access the text in the language of their choice. The reverse is also true: reading from any language will be just as seamless. In this way, taxonomists will no longer have to sacrifice the richness of vocabulary and expression in their native language to convey scientific information about the taxon under study. Current translation systems, which in some languages already achieve up to 98% semantic accuracy, are expected to improve further and cover more languages in the near future [53].

Furthermore, the accumulated knowledge of a clade will be conceptually organized into ontologies (‘the study of what there is’ [54]) of morphological and anatomical features, which are still at an early stage of development for most clades [55,56].
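
A minimal sketch of how a cross-vocabulary anatomical ontology might be represented: each term has a stable identifier, labels in several languages, and is_a and part_of links to other terms, so that a label in any language resolves to the same concept and to its broader context. The identifiers, labels, and relations below are invented for illustration and are not taken from any published ontology.

```python
# Hypothetical mini-ontology: term id -> multilingual labels and relations.
TERMS = {
    "T:0001": {"labels": {"en": "appendage", "es": "apéndice"}, "is_a": [], "part_of": []},
    "T:0002": {"labels": {"en": "leg", "es": "pata"}, "is_a": ["T:0001"], "part_of": []},
    "T:0003": {"labels": {"en": "tarsus", "es": "tarso"}, "is_a": [], "part_of": ["T:0002"]},
}

# Any label, in any language, resolves to the same term id.
LABEL_INDEX = {
    label.lower(): term_id
    for term_id, term in TERMS.items()
    for label in term["labels"].values()
}

def resolve(label):
    """Return the term id for a label in any language, or None if it is unknown."""
    return LABEL_INDEX.get(label.lower())

def ancestors(term_id, relation="is_a"):
    """Terms reachable by repeatedly following `relation` links (transitive closure)."""
    found, stack = [], list(TERMS[term_id][relation])
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(TERMS[current][relation])
    return found

term = resolve("Tarso")
print(term, [TERMS[t]["labels"]["en"] for t in ancestors(term, "part_of")])
```

Keeping language-specific labels separate from language-neutral identifiers is what would let a description written in one language be linked to the same concepts as descriptions written in another.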

5. Conclusions

One gets the feeling that, in the effort to facilitate taxonomists’ work, there has been an attempt, in many cases, to remove taxonomists themselves from the picture. This is not surprising given that the survival of basic research depends on public funding, which is always scarce and highly competitive.

Regarding taxonomic work itself, the greatest impact on its activity has come from conceptual advances (the world of ideas), which have not always come easily. One need only recall the stubbornness of an icon of twentieth-century evolutionary biology, Ernst Mayr, who continued to defend an organization of life based on paraphyletic taxa in a discussion about the number of kingdoms in the living world 32 years after the translation of Hennig’s Phylogenetic Systematics was published (though he likely read its first German edition in the 1950s) [57,58].

The world of taxonomists has been, and still is, a world of ideas about life on Earth that aims to provide coherence and a rational understanding of the natural world in the form of a hierarchy consistent with the theory of descent. Let us not forget that Darwin dedicated five years of his life to classifying living and fossil cirripedes and that Wallace was a seasoned taxonomist. From its beginnings, taxonomic activity has been neither whimsical nor subjective, as evidenced by the evolutionary pattern implicit in the classifications of Linnaeus and his contemporaries (who were not evolutionists) [59]. This pattern is absent from artificially constructed organisms [60] generated under supposed evolutionary constraints to test classification methodologies.

However, the path to new ideas should be guided by empirical data (although the opposite is also frequently true). It would be interesting if initiatives like the Senckenberg Ocean Species Alliance [61], which aims to accelerate marine invertebrate taxonomy, could be expanded to other environments. In this regard, AI has the potential to significantly ease the workload of taxonomists by freeing them from more routine tasks, allowing them to focus on what George Estabrook emphasized: “Taxonomists will share the blame if they do not …participate in the practice of scientific method and use the results of their craft to test and argue the differential credibility of hypotheses to explain pattern, process, adaptation, mechanism, geographic distribution, history, etc.” [62]. Borrowing the title that Evelyn Fox Keller used for her biography of Barbara McClintock, A Feeling for the Organism [63] (but see also [64]), one could say that taxonomists have a “feeling for the clade” and, in many cases, intuition. At present, AI lacks both, and that is why we still need taxonomists.

Funding

This publication was supported by project PID2020-116115GB-I00 from the Ministry of Science and Innovation of Spain.

Acknowledgments

Melinda Modrell improved the English of this essay.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Lewontin, R.C. Four complications in understanding the evolutionary process. Santa Fe Inst. Bull. 2003, 18, 17–23.
2. Mora, C.; Tittensor, D.P.; Adl, S.; Simpson, A.G.; Worm, B. How many species are there on Earth and in the ocean? PLoS Biol. 2011, 9, e1001127.
3. Locey, K.J.; Lennon, J.T. Scaling laws predict global microbial diversity. Proc. Natl. Acad. Sci. USA 2016, 113, 5970–5975.
4. Wiens, J.J. How many species are there on Earth? Progress and problems. PLoS Biol. 2023, 21, e3002388.
5. Jablonski, D. Extinction: Past and present. Nature 2004, 427, 589.
6. Raup, D.M. Extinction. Bad Genes or Bad Luck? W. W. Norton: New York, NY, USA, 1991; p. xvii + 210.
7. Chapman, A.D. Numbers of Living Species in Australia and the World, 2nd ed.; Report for the Australian Biological Resources Study; Australian Government, Department of the Environment, Water, Heritage, and the Arts: Canberra, Australia, 2009.
8. Fisher, R.A.; Corbet, A.S.; Williams, C.B. The relation between the number of species and the number of individuals in a random sample of an animal population. J. Anim. Ecol. 1943, 12, 42–58.
9. Preston, F.W. The commonness, and rarity, of species. Ecology 1948, 29, 254–283.
10. Williams, C.B. Patterns in the Balance of Nature and Related Problems in Quantitative Ecology; Academic Press: London, UK, 1964; p. 324.
11. Van den Berg, R.; Vogel, M.; Josić, K.; Ma, W.J. Optimal inference of sameness. Proc. Natl. Acad. Sci. USA 2012, 109, 3178–3183.
12. Ma, W.J.; Kording, K.P.; Goldreich, D. Bayesian Models of Perception and Action: An Introduction; MIT Press: Cambridge, MA, USA, 2023; p. 408.
13. Noonan, H.; Curtis, B. Identity. In The Stanford Encyclopedia of Philosophy, Fall 2022 ed.; Zalta, E.N., Nodelman, U., Eds.; Stanford University: Stanford, CA, USA, 2022; Available online: https://plato.stanford.edu/archives/fall2022/entries/identity/ (accessed on 22 September 2024).
14. Sturtevant, A.H. The North American Species of Drosophila; Carnegie Institution of Washington: Washington, DC, USA, 1921; p. 150.
15. Valdecasas, A.G.; Pelaez, M.L.; Wheeler, Q.D. What’s in a (biological) name? The wrath of Lord Rutherford. Cladistics 2014, 30, 215–223.
16. Darwin, C.R.; Wallace, A.R. On the tendency of species to form varieties; and on the perpetuation of varieties and species by natural means of selection. J. Proc. Linn. Soc. Lond. Zool. 1858, 3, 45–62.
17. Oppy, G.; Dowe, D. The Turing Test. In The Stanford Encyclopedia of Philosophy, Winter 2021 ed.; Zalta, E.N., Ed.; Stanford University: Stanford, CA, USA, 2021; Available online: https://plato.stanford.edu/archives/win2021/entries/turing-test/ (accessed on 22 September 2024).
18. Valdecasas, A.G. Alternative to Turing’s test on AI. J. Brief Ideas 2023.
19. Wilson, E.O. The nature of the taxon cycle in the Melanesian ant fauna. Am. Nat. 1961, 95, 169–193.
20. Favret, C. The 5 ‘D’s of Taxonomy: A User’s Guide. Q. Rev. Biol. 2024, 99, 131–156.
21. Jennebborg, L.-H. Marine Benthic Algae. Code Cent; Swedish Museum of Natural History: Stockholm, Sweden, 1986; pp. P4 0-0–P4 7-2.
22. Waldén, H.W. Landmollusker I Norden. Kodcentralen; Naturhistoriska Riksmuseet: Stockholm, Sweden, 1985; pp. LM 0-0–LM 6-1.
23. NODCTC. Volume 2: Alphabetical (Scientific Name Order) Listing; U.S. Department of Commerce: Washington, DC, USA, 1984; p. 385.
24. Catwalk Ceramics. Available online: https://www.catwalkceramics.co.uk/download/flyer-2019fritsch.pdf (accessed on 30 September 2024).
25. Guiry, M.D.; Guiry, G.M.; Morrison, L.; Rindi, F.; Miranda, S.V.; Mathieson, A.C.; Parker, B.C.; Langen, A.; John, D.M.; Bárbara, I.; et al. AlgaeBase: An on-line resource for Algae. Cryptogam. Algol. 2014, 35, 105–115.
26. Gledhill, T.; Valdecasas, A.G.; Becerra, J.M. A template for the future: Digitizing and databasing a taxonomic illustration collection. Exp. Appl. Acarol. 2007, 41, 109–113.
27. Dallwitz, M.J. DELTA and Intkey. In Advances in Computer Methods for Systematic Biology: Artificial Intelligence, Databases, Computer Vision; The Johns Hopkins University Press: Baltimore, MD, USA, 1993; pp. 287–296.
28. Peláez, M.L.; Horreo, J.L.; García-Jiménez, R.; Valdecasas, A.G. An evaluation of errors in the mitochondrial COI sequences of Hydrachnidia (Acari, Parasitengona) in public databases. Exp. Appl. Acarol. 2022, 86, 371–384.
29. Bello, E.; Becerra, J.M.; Valdecasas, A.G. Counting on taxonomy. Nature 1992, 357, 531.
30. Valdecasas, A.G.; Bello, E.; Becerra, J.M. DIRTAX. Directorio de taxónomos españoles. Graellsia Monogr. 1994, 1, 1–233.
31. Wilkins, J.S. Defining Species: A Source Book from Antiquity to Today; Peter Lang: New York, NY, USA, 2009.
32. Wilkins, J.S. Understanding Species; Cambridge University Press: Cambridge, UK, 2023; p. 160.
33. Karbstein, K.; Kösters, L.; Hodač, L.; Hofmann, M.; Hörandl, E.; Tomasello, S.; Wagner, N.D.; Emerson, B.C.; Albach, D.C.; Scheu, S.; et al. Species delimitation 4.0: Integrative taxonomy meets artificial intelligence. Trends Ecol. Evol. 2024, 39, 771–784.
34. Sokal, R.R.; Sneath, P.H.A. Principles of Numerical Taxonomy; W. H. Freeman and Company: San Francisco, CA, USA, 1963; p. 359.
35. Sneath, P.H.; Sokal, R.R. Numerical Taxonomy: The Principles and Practice of Numerical Classification; W. H. Freeman and Company: San Francisco, CA, USA, 1973; p. 573.
36. Rohlf, F.J. NTSYS-pc: Microcomputer programs for numerical taxonomy and multivariate analysis. Am. Stat. 1987, 41, 330.
37. Cardini, A. A practical, step-by-step, guide to taxonomic comparisons using Procrustes geometric morphometrics and user-friendly software (part A): Introduction and preliminary analyses. Eur. J. Taxon. 2024, 934, 1–92.
38. Cardini, A. A practical, step-by-step, guide to taxonomic comparisons using Procrustes geometric morphometrics and user-friendly software (part B): Group comparisons. Eur. J. Taxon. 2024, 934, 93–186.
39. Hennig, W. Phylogenetic Systematics; University of Illinois Press: Urbana, IL, USA, 1966; p. 263.
40. Farris, J. Hennig86; Program and documentation, distributed by the author, New York. 1988.
41. Goloboff, P. Phylogenetic Analysis of Morphological Data, Vol. 1: From Specimens to Optimal Phylogenetic Trees; CRC Press: Boca Raton, FL, USA, 2022; p. 277.
42. Goloboff, P. Phylogenetic Analysis of Morphological Data, Vol. 2: Refining Phylogenetic Analyses; CRC Press: Boca Raton, FL, USA, 2022; p. 291.
43. MacLeod, N.; Benfield, M.; Culverhouse, P. Time to automate identification. Nature 2010, 467, 154–155.
44. Valdecasas, A.G.; Wheeler, Q. Taxonomy: Add a human touch too. Nature 2010, 467, 788.
45. Kubatko, L. The Multispecies Coalescent. In Handbook of Statistical Genomics; Balding, D., Moltke, I., Marioni, J., Eds.; Wiley: New York, NY, USA, 2019.
46. DeSalle, R.; Tessler, M.; Rosenfeld, J. Phylogenomics: A Primer; CRC Press: Boca Raton, FL, USA, 2020.
47. Meier, R.; Hartop, E.; Pylatiuk, C.; Srivathsan, A. Towards holistic insect monitoring: Species discovery, description, identification and traits for all insects. Philos. Trans. R. Soc. B 2024, 379, 20230120.
48. Raphael, M.G.; Molina, N. (Eds.) Conservation of Rare or Little-Known Species: Biological, Social, and Economic Considerations; Island Press: Washington, DC, USA, 2013; p. 392.
49. Karaman, S. Die Fauna der unterirdischen Gewässer Jugoslaviens. Int. Ver. Theor. Angew. Limnol. Verh. 1935, 7, 46–73.
50. Käser, D. A new habitat of subsurface waters: The hyporheic biotope (translation of Orghidan’s 1959 paper). Fundam. Appl. Limnol. 2010, 176, 291–302.
51. Hine, C. Systematics as Cyberscience: Computers, Change, and Continuity in Science; MIT Press: Cambridge, MA, USA, 2008; p. 307.
52. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 4th ed.; Pearson Education Limited: Harlow, UK, 2022; p. 1166.
53. Maslej, N.; Fattorini, L.; Perrault, R.; Parli, V.; Reuel, A.; Brynjolfsson, E.; Etchemendy, J.; Ligett, K.; Lyons, T.; Manyika, J.; et al. The AI Index 2024 Annual Report; AI Index Steering Committee, Institute for Human-Centered AI, Stanford University: Stanford, CA, USA, 2024.
54. Schulz, S.; Stenzhorn, H.; Boeker, M. The ontology of biological taxa. Bioinformatics 2008, 24, i313–i321.
55. Girón, J.C.; Tarasov, S.; González Montaña, L.A.; Matentzoglu, N.; Smith, A.D.; Koch, M.; Boudinot, B.E.; Bouchard, P.; Burks, R.; Vogt, L.; et al. Formalizing Invertebrate Morphological Data: A Descriptive Model for Cuticle-Based Skeletal-Muscular Systems, an Ontology for Insect Anatomy, and their Potential Applications in Biodiversity Research and Informatics. Syst. Biol. 2023, 72, 1084–1100.
56. González Montaña, L.A.; Rueda-Ramírez, D.; Serna Cardona, F.J.; Gaigl, A. An anatomical ontology for the Class Collembola (Arthropoda: Hexapoda). Braz. Arch. Biol. Technol. 2023, 66, e23220682.
57. Mayr, E. Two empires or three? Proc. Natl. Acad. Sci. USA 1998, 95, 9720–9723.
58. Woese, C.R. Default taxonomy: Ernst Mayr’s view of the microbial world. Proc. Natl. Acad. Sci. USA 1998, 95, 11043–11046.
59. Holman, E.W. Evolutionary and psychological effects in pre-evolutionary classifications. J. Classif. 1985, 2, 29–39.
60. Holman, E.W. A taxonomic difference between the Caminalcules and real organisms. Syst. Zool. 1986, 35, 259–261.
61. SOSA; Brandt, A.; Chen, C.; Engel, L.; Esquete, P.; Horton, T.; Jażdżewska, A.M.; Johannsen, N.; Kaiser, S.; Kihara, T.C.; et al. Ocean Species Discoveries 1–12: A primer for accelerating marine invertebrate taxonomy. Biodivers. Data J. 2024, 12, e128431.
62. Estabrook, G. Book Reviews. J. Classif. 1986, 3, 167–168.
63. Keller, E.F. A Feeling for the Organism: The Life and Work of Barbara McClintock; W. H. Freeman: New York, NY, USA, 1983.
64. Comfort, N. The Tangled Field: Barbara McClintock’s Search for the Patterns of Genetic Control; Harvard University Press: Cambridge, MA, USA, 2001.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
