### *4.1. Use of Archival Specimens*

One of the most important results of our study is the finding that over 75% of the historical and nearly 94% of the freshly collected lichen samples yielded sequences with the Illumina platform. Using Sanger sequencing, far lower success rates were achieved for both historical (19%) and fresh (58%) collections. The reasons for the comparatively low success rate using Sanger sequencing on the newly collected material are unknown; in previous attempts to sequence fresh material from these genera, we typically achieved a success rate of 70–90% using Sanger technology. Initial preservation of the material also poses challenges for recent collections, especially if specimens cannot be properly dried when in the field for more than one day and carefully curated shortly afterwards. Nonetheless, the highly successful Illumina sequencing of these added fresh specimens indicates that DNA degradation had not progressed too far: even where these exact samples presented more difficulties with Sanger, Illumina sequencing worked satisfactorily. Regarding historical collections, success rates similar to or even higher than those in our study have been obtained in other recent studies using HTS to obtain DNA sequences from historical lichen collections [19–21], indicating that archival specimens available in herbaria and fungaria around the world may potentially yield usable sequences if the proper DNA extraction and sequencing approaches are taken. This includes many rare taxa, potentially extinct taxa that are no longer found in nature, and other taxa of unique value for which it is difficult to gather fresh material. The fact that so few published sequences currently exist for historical lichen collections is, thus, likely a consequence of the only recently growing awareness of the potential of advanced molecular sequencing methods to unlock these resources [12,13].

It should not be surprising that Illumina HTS yielded significantly better results than Sanger sequencing for archival specimens. DNA fragmentation in ancient herbarium samples is a well-documented phenomenon [1,78], including in lichen-forming fungi [16,18–20,79,80]. Considering this, and other postmortem damage known to take place in collected specimens [16,80], the success rate for the samples studied here is remarkably high, which may, in part, be explained by the generally higher sequencing success for Basidiomycota among fungal collections [24,25].

The addition of 318 new sequences from historical collections of *Cora* and *Corella*, including those of *Cora timucua* [23], makes this group one of the best represented among lichens with regard to historical sequences. With 92% of the 2988 GBIF occurrences known for this group (as of 31 January 2022) representing preserved samples (i.e., with a voucher deposited somewhere, not based on observations only), it is clear that historical collections are an invaluable resource that can be used in an integrative framework for describing and detecting new species and inferring relationships among them.

Our results demonstrate that the age of a specimen has some effect on the sequencing success rate, especially with regard to Sanger sequencing. Substrate type may also affect the success rate. However, it appears that other factors may play an important role in the degradation of DNA, such as the techniques employed at the time of collection to dry and preserve the specimens. Since these methods are usually not indicated on the specimen label, it is impossible to separate their potential impact from the role of age or substrate type as a determinant of sequencing success. Given the relatively good success rate we achieved, the fact that age is not a main determinant of success is notable. One important factor to consider is poikilohydry, which plays a major role in diurnal metabolism of lichens and directly relates to mechanisms protecting the DNA [81,82]. Perhaps lichens that undergo pronounced and/or prolonged dry periods maintain effective DNA protection mechanisms, whereas in lichens growing under frequently or permanently humid conditions, the DNA may be less protected from desiccation, a hypothesis already considered by Kistenich et al. [19]. One may, for instance, expect that species growing under more extreme water stress conditions, such as in southern South America or in the high Andes above 4000 m, would show better sequencing success even in archival specimens. However, without systematic comparison across different lichen taxa and a variety of habitats, this remains speculation, and how this would translate into sequencing success rates in *Cora* is unclear.

An important challenge to sequencing old samples is contamination, stemming from three potential sources: (1) fungi of the microbiome already present in the sample when collected; (2) fungal contaminants emerging due to specimen handling and preservation (e.g., molds); and (3) laboratory contaminants, which are particularly an issue with the highly sensitive HTS approaches. Although the first two potential sources of contamination in archival specimens are beyond the control of the investigator, laboratory contamination can be avoided or reduced to a minimum by applying recommended best practices, such as: (1) avoiding plate extractions and using individual tubes instead; (2) using extreme care when handling specimens and extracts (e.g., wearing gloves, sterilizing all equipment, especially forceps, etc.); (3) extracting DNA under sterile conditions, such as those found in ancient DNA laboratories; and (4) avoiding working simultaneously with fresh and historical materials (e.g., in the same sequencing run), since more recently collected samples will tend to dominate a run, even with careful normalization of PCR input as performed here. In any case, potential contamination can be assessed a posteriori by analyzing the taxonomic composition of fungal reads in a given sample.

In addition to the target mycobionts, we detected multiple other fungal taxa in our HTS samples. Although some of these, such as *Aspergillus* or *Penicillium*, may represent post-sampling contaminants, many others are frequent, opportunistic, or stable residents of the lichen mycobiome [34,83], including in *Cora* [84,85]. A great deal of evidence indicates the presence of obligately lichenicolous and endolichenic fungi and/or cortical yeasts in lichens [86–96]. With respect to known fungal groups previously found in lichens, in our material we detected ASVs belonging to members of the orders Cystofilobasidiales, Filobasidiales, Tremellales, and Trichosporonales, all within the class Tremellomycetes, in 24% of the fresh samples and 22% of the historical samples (and in none of the negative controls) with varying quantity of reads. More than 21 distinct genera were detected within this class, with the most commonly observed genus being *Hannaella*, a basidiomycetous yeast genus found widely on leaf surfaces of various plants [97]. Even though the presence of fungi may influence humidity and ionic regimen on a thallus surface and subsequently transcriptomic response, without further data and knowledge of the development of these communities of species over time and their role in the symbioses, it is impossible to assign much significance to it at this point.

Presumptive laboratory contaminants were also observed in some samples. Generally, these were present in very low frequencies (e.g., fewer than 100 reads while the target mycobiont had 30,000 reads) and were not consistent with the inferred ecogeography of the corresponding species, which allowed their recognition and removal. If a sample only showed rare reads (fewer than 100) and no prevalent ASVs were present, the sample was considered unsuccessfully sequenced and was not included in the downstream analyses.
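The read-count part of this filtering rule can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `classify_sample`, the dictionary layout, and the ASV labels are hypothetical, and only the threshold of 100 reads is taken from the text. The ecogeographic consistency check described above is not modeled here.

```python
from typing import Dict

# Threshold from the text: ASVs with fewer than 100 reads are treated as rare.
MIN_READS = 100


def classify_sample(asv_reads: Dict[str, int], min_reads: int = MIN_READS) -> Dict[str, object]:
    """Split the ASVs of one sample into prevalent and rare ones by read count.

    A sample counts as successfully sequenced only if at least one
    prevalent ASV remains (hypothetical helper, for illustration only).
    """
    prevalent = {asv: n for asv, n in asv_reads.items() if n >= min_reads}
    rare = {asv: n for asv, n in asv_reads.items() if n < min_reads}
    return {"successful": bool(prevalent), "prevalent": prevalent, "rare": rare}


# Toy samples with invented read counts.
sample_ok = {"ASV_target": 30000, "ASV_lowfreq": 42}
sample_failed = {"ASV_a": 12, "ASV_b": 57}

print(classify_sample(sample_ok)["successful"])      # True
print(classify_sample(sample_failed)["successful"])  # False
```

In this sketch, rare ASVs are only set aside, not deleted, so that they can still be inspected taxonomically before being discarded.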

### *4.2. Assessment of ITS as a Barcoding Marker and Intragenomic Variation within ITS*

Given the reported issues with the use of ITS as a fungal barcoding marker [34–37], and to address the possible argument that the observed phylogenetic diversity in *Cora* and *Corella* may in part be artifactual, we paid special attention to intragenomic variation in the ITS barcoding marker as evidenced by variation in the ASVs and ambiguities in Sanger sequences among our studied samples. Our expectation that potential ambiguities in Sanger sequences, mostly representing double peaks in the sequencing chromatograms, would match dominant and consistent SNPs in the corresponding Illumina ASVs was supported by the data, which allowed us to quantify this phenomenon reliably and in detail.

Since Illumina sequencing did not allow for amplification of the full ITS region, our comparisons were limited to samples for which we successfully sequenced the ITS1 region using both Sanger and Illumina for the same sample. For the 290 ASVs detected in these 75 samples, we detected no variation between the Sanger and the Illumina sequences in 30%, one singleton in 61%, two singletons in 5%, three singletons in 2%, and four to seven singletons in 1% or less. Except for two samples for which only Illumina sequences were available, this variation had no effect on the phylogenetic placement of the target reads or the delimitation of phylogenetically defined lineages, supporting our earlier findings based on 454 pyrosequencing data that intragenomic ITS variation in *Cora* is low and does not lead to artifactual lineages [41]. The observed exceptions relate to two issues: either the target sequence was too short to cover lineage-diagnostic variation, then typically clustering at the base of the target clade or nearby; or the variation could be interpreted as potential hybridization and introgression. Given that the ASVs detected were unique within the run, and since multiple ASVs were available matching distinct alleles, we discard the option of contamination in these specific cases. Although this needs to be tested with genomic approaches, it would not be expected to lead to artifactual taxa, at least not in terms of species counts, as a hybrid component of the ITS would correspond to another, closely related species. We also considered mixed thalli (i.e., chimeric thalli between closely related species) as a potential source for this pattern, but with the methodological approach used here, this cannot be resolved.
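The kind of tabulation reported above (share of ASVs with zero, one, two, or more differences from the Sanger sequence) can be sketched as follows. This is an illustration under simplifying assumptions, not the procedure used in this study: sequences are assumed to be pre-aligned to equal length, IUPAC ambiguity codes are counted as plain mismatches, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_differences(sanger: str, asv: str) -> int:
    """Count positions at which two pre-aligned sequences differ."""
    if len(sanger) != len(asv):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for s, a in zip(sanger.upper(), asv.upper()) if s != a)


def difference_profile(pairs: Iterable[Tuple[str, str]]) -> Dict[int, float]:
    """Return the percentage of (Sanger, ASV) pairs per number of differences."""
    counts = Counter(count_differences(s, a) for s, a in pairs)
    total = sum(counts.values())
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


# Invented 10-bp sequences, for illustration only.
toy_pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),  # identical
    ("ACGTACGTAC", "ACGTACGTAT"),  # one difference
    ("ACGTACGTAC", "ACGAACGTAC"),  # one difference
    ("ACGTACGTAC", "TCGTACGTAT"),  # two differences
]
print(difference_profile(toy_pairs))  # {0: 25.0, 1: 50.0, 2: 25.0}
```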

Following earlier work with 454 pyrosequencing [32,41] and the increasing use of ITS1 for the Earth Microbiome Project and in other lichen studies [98], we adopted the ITS1 region as the default portion of the ITS barcoding marker for the Illumina sequencing employed here. Nonetheless, our analysis of full-length sequences indicates that at least in some clades, ITS2 showed better resolution for accurately detecting species using ITS-based BLAST identifications. This may be due to the more variable subterminal portion of the ITS2, which makes reliable alignments more challenging, but appears to be highly discriminating, even between closely related species. Therefore, future metabarcoding using short reads should also attempt to sequence the ITS2 region [99–101], or focus on longer amplicon sequencing (PacBio, MinION, etc.) or shotgun sequencing, which might provide ways of overcoming sequence length limitations, although each of these techniques comes with its own disadvantages. However, in most cases, either ITS1 or ITS2 already provides enough resolution for species boundaries, especially within our integrative framework, making Illumina sequencing an ideal method when assessing hundreds of samples [101].

In fungal barcoding approaches, it has been argued that single-marker approaches, such as with ITS, may lead to inaccurate results or even cause taxonomic inflation if the data are not properly analyzed and interpreted [34,102]. In the case of *Cora* and *Corella*, ITS appears to work remarkably well, even in portions of the backbone, as evidenced by the high level of congruence between our single-marker ITS-based phylogeny and six-marker ASTRAL coalescent tree. At the level of terminal clades, a threshold ITS-based identity value of 99.4% appears to reliably discriminate between species, although some variation is observed which may depend on how recently a species-level clade evolved. In a few recently emerging species complexes, no absolute threshold value could be established and the ITS-based BLAST results were also partially diffuse, whereas in other cases, our initially delimited species-level clades may represent more than one lineage. Overall, these effects largely balance each other in terms of species counts, but may lead to inaccuracies in delimiting species in certain clades. Beyond single-marker ITS approaches, three other strategies could be used to test species delimitation in these cases: (1) multi-marker coalescent approaches [103]; (2) phylogenomic target capture approaches [104–106]; and (3) population genetics using microsatellites or RADseq [107–109].
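How an identity threshold of 99.4% separates sequence pairs can be shown with a small sketch. It assumes two pre-aligned sequences of equal length and computes a simple global percent identity, which is not the same as the identity reported by BLAST on local alignments; the function names and the toy 500-bp sequences are hypothetical, and only the 99.4% value comes from the text.

```python
# Threshold from the text; everything else in this sketch is illustrative.
SPECIES_THRESHOLD = 99.4


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns, skipping columns that are gaps in both."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def within_threshold(a: str, b: str, threshold: float = SPECIES_THRESHOLD) -> bool:
    """True if the pair reaches the identity threshold (same species hypothesis)."""
    return percent_identity(a, b) >= threshold


def substitute(seq: str, positions: list) -> str:
    """Return a copy of seq with a different base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


reference = "ACGT" * 125                              # invented 500-bp sequence
two_diffs = substitute(reference, [0, 100])           # 498/500 identical
four_diffs = substitute(reference, [0, 100, 200, 300])  # 496/500 identical

print(within_threshold(reference, two_diffs))   # True
print(within_threshold(reference, four_diffs))  # False
```

With a 500-bp alignment, the threshold thus corresponds to at most three differing positions, which illustrates why short target sequences may fail to cover lineage-diagnostic variation.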

With regard to multi-marker approaches, our data show that ITS performed better for delimiting species than the protein-coding markers *RPB2* or *EF3*, and also better than the classical markers nuLSU, mtSSU, and mtLSU; in addition, the ITS marker is much easier to generate. Consequently, multi-marker approaches or alternative barcodes do not seem to constitute a promising next step in resolving problematic species complexes or refining the DNA barcoding approach in this group of basidiolichens. Phylogenomic approaches (e.g., target capture) have also shown limitations in resolving recently evolving species [106], and, therefore, we consider the RADseq approach as potentially useful to further assess difficult species complexes in *Dictyonematinae* in addition to our ongoing metagenomic analyses. For a general barcoding approach, however, including metabarcoding with HTS approaches, we recommend the continued use of the ITS marker, due to its high amplification success and the broad molecular framework it provides to establish species hypotheses in this group of lichen-forming Basidiomycota.

### *4.3. Accurate Assessment of Phylogenetic Diversity in Cora and Corella*

The large amount of ITS data now available allowed us to assess phylogenetic diversity in this group of basidiolichens using various quantitative and semi-quantitative approaches. As an initial approach to establish species hypotheses, we used the same ad hoc delimitation employed in our previous studies [32,33], namely a combination of visual inspection of stem branch lengths and support versus within-clade branch length variation versus geographic origin of the samples. In the present case, this led to the distinction of 265 ad hoc species hypotheses for the entire dataset (including all ASVs) and 175 for the subset of near-complete ITS sequences. Distance-based quantitative approaches (DNADIST-based analysis, ABGD, ASAP) all resulted in numbers within the range of 128–231 for the subset tested and 194–350 for the entire dataset (extrapolated). In contrast, the tree-based method bPTP yielded much higher estimates (709–889 species for the entire dataset). GMYC inferred values more similar to those of distance-based methods, with 189 estimated species in an interval of 145–237. To what extent these estimates might be real remains unclear. If the example of *Usnea antarctica* versus *U. aurantiacoatra* is taken as reference, near-identical ITS patterns may indeed hide more than one species [107,108], and lack of ITS-based resolution is also known from other fungi [35–37]. It is, therefore, possible that clades currently delimited as a single species with our ad hoc approach or using distance-based methods represent more than one species, although a three-fold increase seems unlikely based on our current knowledge. Consequently, we consider our ad hoc approach reliable at this point, as it is closer to the middle range of distance-based estimates and far below the bPTP delimitation approach, which, in turn, showed highly contrasting results to all other methods. 
An integrative approach was also the solution Boluda and colleagues [110] proposed to disentangle the incongruencies of the use of chemistry, morphology, molecular data (including multiple species delimitation methods) or phylogeny alone, for species boundaries in the *Bryoria* sect. *Implexae* complex.

The inclusion of a large number of historical collections extended the geographic range of sequenced samples, but still left many areas with potential occurrence of *Cora* (and *Corella*) unsampled. Thus, compared to the present number of 265 species, our original prediction of 450 species [32] still provides a valid framework and it seems likely that this number will eventually be reached. Undersampled regions notably include the central and southern portion of the Andes (Peru, Bolivia, Chile, Argentina), but also large parts of Central America and western Mexico, as well as the Guyana Highlands.

Overall, the present number of formally described (102), phylogenetically distinguished (by our ad hoc approach, 265), and predicted (>450 [32]) species in *Cora* does not differ from the range of accepted species in the 25 largest ascomycete lichen genera, which lies between 170 and 820 [111]. As such, the diversity now recognized in *Cora* aligns well with other megadiverse lichenized genera, showing that certain basidiolichen groups may harbor a species diversity similar to the most speciose ascolichen groups, an idea that would have been dismissed by most lichenologists even just a decade ago. Indeed, the observed diversity in these basidiolichens is striking not because there are so many species but because it has not been recognized before, much less at this magnitude. Prior to Parmasto's monograph [30], six species had been formally described in this group (currently named *Cora bovei*, *C. ciferrii*, *C. glabrata*, *C. gyrolophia*, *C. pavonia*, and *C. reticulifera*) and three more in the genus *Corella* (*C. brasiliensis*, *C. tomentosa*, and *C. zahlbruckneri*); all nine had been synonymized by Parmasto under one taxon (*Dictyonema pavonium*). This historical number is remarkably low compared to other genera of similar size (e.g., *Sticta*), with hundreds of names established in the early literature. The main reason for the comparatively low number of historical epithets in *Cora* is that important field characters, such as color and substrate, are lost in herbarium specimens if not recorded at the time of collection, and herbarium specimens were the primary source of access for researchers in the 19th century but also for modern monographers. This led to a lack of perception of size as an important character, as smaller herbarium specimens were sometimes interpreted as immature.
Even field experience did not reveal the true nature of this group of basidiolichens, as the differences in ecology and morphology between specimens were interpreted as environmentally induced variation [112], a concept popular in the second half of the 20th century [113].

### *4.4. Level of Cryptic Speciation and Potential Taxonomic Inflation*

The existence of hidden or unrecognized species within presumably well-known taxa is not an isolated phenomenon in fungi. In many such taxa, including the fly agaric (*Amanita muscaria* s.lat. [114]), the chanterelle (*Cantharellus cibarius* s.lat. [115–117]), the Lingzhi mushroom (*Ganoderma lucidum* s.lat. [118]), the true morels (*Morchella esculenta* s.lat. [119]), and the yellow specklebelly lichen (*Pseudocyphellaria crocata* s.lat. [120]), species delimitation studies using ITS and other markers have revealed a large number of previously unrecognized lineages.

Although some of these pose difficulties delimiting species phenotypically, other cases, such as *Pseudocyphellaria crocata* s.lat., often reveal taxonomically useful characters that had not been considered to be diagnostic before. In the genus *Cora*, given previous failures to properly recognize species diversity and the low number of characters useful for taxonomy, one would expect a number of over 250 species hypothesized from molecular data to go along with a high level of evolutionary crypticity, resulting in many species indistinguishable through their phenotype and potentially in taxonomic inflation. Although the number of taxonomically useful characters in *Cora* is indeed limited, lacking for instance the diversity of spore types, vegetative propagules, or secondary compounds found in megadiverse ascolichen genera, the comparatively low number of 11 main phenotype characters led to no fewer than 82 distinct phenotypes among the 87 analyzed species, rejecting the notion of largely cryptic speciation or taxonomic inflation in this genus. Instead, even with a low number of perceived options to reliably distinguish species, we demonstrate that the combination of these characters yields sufficient information to differentiate most species detected by molecular methods. Cases of identical phenotypes were in part found in distantly related lineages only (i.e., pseudophylocryptic), thus representing homoplasies rather than genuine cryptic speciation, whereas closely related species were mostly phenotypically distinct. Indeed, among 87 lineages, we identified only one case where two closely related species could not be distinguished by phenotype or chorology (euphylocryptic). Other cases differed either in one character state (kapophylocryptic) or in distribution (allophylocryptic). This supports the phenotype as useful for species-level taxonomy but renders phenotypic characters of limited value when inferring phylogenetic relationships within this genus, with the exception of a few characters that correlate with larger clades.
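Counting distinct phenotypes as combinations of character states, as done above for 11 characters and 87 species, can be sketched in a few lines. The example below is purely illustrative: the species labels, the three characters, and their states are invented, and the function name `group_by_phenotype` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A phenotype is the full combination of (character, state) pairs of a species.
Phenotype = Tuple[Tuple[str, str], ...]


def group_by_phenotype(species_traits: Dict[str, Dict[str, str]]) -> Dict[Phenotype, List[str]]:
    """Group species that share exactly the same combination of character states."""
    groups: Dict[Phenotype, List[str]] = defaultdict(list)
    for species, traits in species_traits.items():
        phenotype = tuple(sorted(traits.items()))
        groups[phenotype].append(species)
    return dict(groups)


# Invented toy data: four species scored for three characters.
toy = {
    "species_A": {"substrate": "epiphytic", "color": "grey", "size": "large"},
    "species_B": {"substrate": "terrestrial", "color": "grey", "size": "large"},
    "species_C": {"substrate": "epiphytic", "color": "green", "size": "small"},
    "species_D": {"substrate": "epiphytic", "color": "grey", "size": "large"},
}

groups = group_by_phenotype(toy)
shared = [sorted(names) for names in groups.values() if len(names) > 1]

print(len(groups))  # 3
print(shared)       # [['species_A', 'species_D']]
```

Whether species sharing a phenotype are closely or distantly related cannot be read from such a table alone; that step requires mapping the groups onto the phylogeny.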

Our results thus suggest that phenotype variation, species delimitation and the level of homoplasy in the basidiolichen genus *Cora* are comparable to large genera of lichenized Ascomycota, in which a limited set of phenotype characters leads to free or partially constrained combinations of character states in individual species. For example, in the crustose genus *Lecanora*, with 550 species [111], species are usually recognized by a combination of thallus morphology, apothecial disc color, epihymenial and excipular crystals, and chemistry [121–130], whereas in *Usnea*, a combination of growth form, branching pattern, thallus sectional structure, branch outgrowths and appendices, and secondary chemistry and pigments define species [131–136]. Other examples can be found in foliose Parmeliaceae (reviewed in [137]), such as *Bulbothrix* [138], or the crustose genera *Caloplaca* [139,140], *Graphis* and *Allographa* [141,142]. Thus, in both Asco- and Basidiomycota, phenotypical characters may not correspond to molecular phylogenies at all clade levels [31], but they are useful in diagnosing closely related species within clades.

If the remarkable species diversity in *Cora* is largely not cryptic, the question must again be raised: why has it not been recognized before? As mentioned above, reasons can be looked for in the loss of important features in herbarium collections, similar to mushroom taxonomy, but also in the overinterpretation of variation as ecologically induced and not taxonomic. *Cora* is, therefore, not really a case of "hidden" diversity, but one of previously unrecognized or "overlooked" diversity.

The notion that phenotypically similar species of *Cora* are generally only distantly related could be explained by similar selective pressures in ecologically equivalent habitats, but in part also by free variation of a limited set of characters that may not represent functional traits. Once the phylogenetic diversity of *Cora* has been fully assessed phenotypically, this will be an exciting avenue for future studies. Fortunately, given the techniques to assess phenotype characters in herbarium collections [33], archival specimens for which sequence data are now available can be incorporated in such studies, providing a much broader geographical and ecological framework.
