*Article* **In Silico Assessment of Probe-Capturing Strategies and Effectiveness in the Spider Sub-Lineage Araneoidea (Order: Araneae)**

**Yi-Yen Li 1,2, Jer-Min Tsai 3, Cheng-Yu Wu 1, Yi-Fan Chiu 1, Han-Yun Li 1, Natapot Warrit 4, Yu-Cen Wan 1, Yen-Po Lin 2,5, Ren-Chung Cheng 2,\* and Yong-Chao Su 1,\***


Tel.: +886-4-228-40416 (ext. 707) (R.-C.C.); +886-7-312-1101 (ext. 6983) (Y.-C.S.)

**Abstract:** Reduced-representation sequencing (RRS) has made it possible to identify hundreds to thousands of genetic markers for phylogenomic analysis for the testing of phylogenetic hypotheses in non-model taxa. The use of customized probes to capture genetic markers (i.e., ultraconserved element (UCE) approach) has further boosted the efficiency of collecting genetic markers. Three UCE probe sets pertaining to spiders (Araneae) have been published, including one for the suborder Mesothelae (an early diverged spider group), one for Araneae, and one for Arachnida. In the current study, we developed a probe set specifically for the superfamily Araneoidea in spiders. We then combined the three probe sets for Araneoidea, Araneae, and Arachnid into a fourth probe set. In testing the effectiveness of the 4 probe sets, we used the captured loci of the 15 spider genomes in silico (6 from Araneoidea). The combined probe set outperformed all other probe sets in terms of the number of captured loci. The Araneoidea probe set outperformed the Araneae and Arachnid probe sets in most of the included Araneoidea species. The reconstruction of phylogenomic trees using the loci captured from the four probe sets and the data matrices generated from 50% and 75% occupancies indicated that the node linked to the *Stegodyphus* + RTA (retrolateral tibial apophysis) clade has unstable nodal supports in the bootstrap values, gCFs, and sCFs. Our results strongly indicate that developing ad hoc probe sets for sub-lineages is important in the cases where the origins of a lineage are ancient (e.g., spiders ~380 MYA).

**Keywords:** target sequencing; reduced representation sequencing (RRS); spider phylogenomics; deep phylogeny

#### **1. Introduction**

High-throughput sequencing is widely used for the generation of genomic data in phylogenomic research [1–4]. Reduced-representation sequencing (RRS) methods [5] have made it possible to collect hundreds to thousands of genetic markers at a fraction of the cost of whole-genome sequencing [6]. The ultraconserved elements approach (UCE approach), a form of target DNA sequencing, is becoming particularly prevalent [7–9]. The UCE approach using customized probes makes it possible for researchers to capture thousands of genetic markers from non-model taxa, thereby making it possible to test hypotheses about phylogeny from shallow (e.g., <5 MYA) to deep (e.g., >200 MYA) divergence times [10]. Despite the importance of the UCE approach in phylogenomics, the design of ad hoc probe

**Citation:** Li, Y.-Y.; Tsai, J.-M.; Wu, C.-Y.; Chiu, Y.-F.; Li, H.-Y.; Warrit, N.; Wan, Y.-C.; Lin, Y.-P.; Cheng, R.-C.; Su, Y.-C. In Silico Assessment of Probe-Capturing Strategies and Effectiveness in the Spider Sub-Lineage Araneoidea (Order: Araneae). *Diversity* **2022**, *14*, 184. https://doi.org/10.3390/d14030184

Academic Editor: Luc Legal

Received: 18 December 2021 Accepted: 28 February 2022 Published: 3 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

sets remains a technical gap such that many researchers are forced to use probe sets designed for similar taxa or for different taxonomic levels. In the current study, we compared the effectiveness of an ad hoc probe set for spiders in the superfamily Araneoidea to the existing probe sets that are known to be applicable to higher taxonomic levels in arachnids [9,11].

UCEs are non-variable genomic fragments that occur across species in a given taxonomic group [12]. These genomic fragments, which are often in >200 bps conserved regions [13], have been detected in a variety of taxa [14]. The functions of these UCEs are unknown [15], and the types of UCEs vary among taxonomic groups [16]. Hedin et al. [17] showed that the spider UCEs mostly correspond to exons. In a pioneering work, Faircloth et al. [18] captured 854 UCE loci to reconstruct the phylogenomic tree of birds. Subsequent research assessed the utility of the UCE approach in applying phylogenomic hypotheses to taxa dated from 5 MYA to 200 MYA [7,19,20]. The UCE approach has also been extended to the reconstruction of species trees and coalescent methods [21,22]. Recent advances in the UCE approach have strengthened phylogenetic hypothesis testing and phylogenetic tree reconstruction, including for arthropods [9,10,23].

The UCE approach was first applied to arachnids by Starrett et al. [23], and to Araneae by Kulkarni et al. [11]. Note that the order Araneae includes 49,877 species [24] in three subclades, suborder Mesothelae, infraorder Mygalomorphae, and infraorder Araneomorphae, with evolutionary time extending back to more than 300 MYA [25,26]. The application of the probe sets designed for the higher levels (for order and class) could be problematic because the probes may not be fully targeted, thus reducing the number of captured loci when testing the hypotheses of the phylogeny within suborders and lower taxonomic levels. Xu et al. [27] tested a customized probe set in the suborder Mesothelae. Hedin et al. [23] reconstructed the UCE phylogenomic tree in the Mygalomorphae species. In the current study, we developed an ad hoc UCE probe set for the superfamily Araneoidea, which contains 17 families, 25% spider diversity, and a variety of web architectures [28].

We developed the Araneoidea probe set in accordance with the pipeline outlined by Faircloth [9]. We then compared the effectiveness of our Araneoidea probe set with those for arachnid and Araneae. Finally, we combined these three probe sets as the fourth probe set. Note that we did not assess the Mesothelae probe set because it is clearly applicable at that suborder level [27]. We evaluated the effectiveness of the four probe sets in two schemes. (1) We performed in silico testing on the number of captured UCE loci in 15 genomes of Araneoidea and other spider species. (2) We compared the phylogenomic trees reconstructed using the concatenation and gene–tree–species–tree approaches with various data matrices to compare the tree topologies and node supports.

#### **2. Materials and Methods**

#### *2.1. Data Sources of UCE Loci*

As data sources for our in silico testing, we employed two published probe sets for ultraconserved elements [9,11], including 14 published genomes (Table 1) and 1 de novo assembled genome (*Argyrodes miniaceus*).

#### *2.2. Genome Assembly*

We assembled the genomes de novo using the procedure below. We used TRIMMO-MATIC [29] for raw read trimming and adaptor removal. KMERGENIE [30] was then used to estimate the optimal k-mer length for genomic assembly. Finally, ABYSS 2.0 [31] was used to assemble the genome for *Argyrodes miniaceus* using the following settings: k = 55, B = 30 G. ABYSS–FAC was used to evaluate the quality of the genome assembly.

**Table 1.** Genomes fetched from GenBank. Information for all 14 genomes used in this research, which were fetched from GenBank. This table displays species name, assembly accession, assembly level, assembly submission date, N50 of contigs, coverage rate, and references for each genome. Adding *Argyrodes miniaceus*, 15 genomes are included in this study.


*2.3. Design of UCE Probe Set for Araneidae*

The design of UCE probes was based on the PHYLUCE pipeline [9,44]. The genomes of *Argyrodes miniaceus*, *Latrodectus hesperus*, *Loxosceles reclusa*, *Trichonephila clavipes*, *Parasteatoda tepidariorum*, and *Stegodyphus mimosarum* were used in UCE probe design as follows: (1) ART v2016.06.05 [45] was used to simulate genomic fragments of *Argyrodes miniaceus*, *Latrodectus hesperus*, *Loxosceles reclusa*, *Trichonephila clavipes*, and *Parasteatoda tepidariorum* into 100-bps reads. (2) Simulated short reads were aligned to *Stegodyphus mimosarum* (i.e., the base genome [32]) using STAMPY (substitution rate = 0.05 and insert size = 400) [46]. Misaligned fragments were removed using SAMTOOLS [47], and the aligned fragments were combined in the browser-extensible data (BED) format using BEDTOOLS [48]. (3) Duplicated genomic fragments were removed using PHYLUCE script (phyluce\_probe\_strip\_masked\_loci\_from\_set) to detect and remove fragments that were mapped but designated too short (<80 bps) or within masked regions of the *Stegodyphus mimosarum* genome (more than 25%). (4) SQLITE v 3.34.0 [49] was used to construct a database of candidate UCE sites for *Argyrodes miniaceus*, *Latrodectus hesperus*, *Loxosceles reclusa*, *Trichonephila clavipes*, and *Parasteatoda tepidariorum* to determine the shared conserved regions. PHYLUCE was then used to remove duplicated candidate probes, and LASTZ [50] was used to align the candidate probes with a given genome to enable the extraction of UCE sites for a given species. (5) Finally, we relaxed the similarity to 50% and reconstructed the database in SQLITE to create the final UCE probes, whereupon we repeated the duplicate-probe removal process in PHYLUCE. The final probe length was 120 bps with tiling, with 60 bps overlapping (thus covering 180 bps) per target locus.

#### *2.4. In Silico Simulation of Probe Sets Aimed at Capturing Affinity*

Simulations were conducted using four probe sets for Arachnids (Arachnid probe set [9]), Araneae (Araneae probe set [11]), Araneoidea (Araneoidea probe set), and a combination of these three (combined probe set). In generating the combined probe set, we compiled the probe sets for Arachnid, Araneae, and Araneoidea and removed potential duplicated probes using LASTZ Python script (phyluce\_probe\_remove\_duplicate\_hits\_from\_ probes\_using\_lastz [44]). Each probe set was tested on 15 genomes using standardized testing procedures. (1) The probes were aligned with the targeted genome using LASTZ [50]. (2) The probes were then aligned and mapped to the targeted genome using PHYLUCE [44] to extract the 500-bps regions on both sides of the UCE probe sites. Note that our objective was to simulate the fragment length when conducting in-solution capture. (3) The probes were aligned with each extracted sequence using LASTZ by running phyluce\_assembly\_match\_contigs\_to\_probes to identify which loci sequences belonged. The duplicates were again sorted out. Following the capture and filtering of fragments from each probe set and each targeted genome, the number of captured loci and proportion of identified loci (defined as the capture rate) in all simulated contigs were calculated per genome per probe set. The captured loci per probe set were then used to reconstruct the phylogenomic tree per data matrix from each probe set.

#### *2.5. Reconstruction of Phylogenomic Tree Using the Captured Data Matrix for Each Probe Set*

We generated the data matrices for different occupancy (the smallest percentage of data per locus in a matrix). We counted the number of in silico captured loci in each genome under occupancies from 10% to 100%, with 10% as the increment. We then decided the occupancies to use in the final tree-reconstruction analyses. The script phyluce\_align\_get\_only\_loci\_with\_min\_taxa was used to output the locus matrices of the probe set using occupancies 50% and 75%, respectively. This allowed the omission of up to 50% and up to 25% missing taxa per locus, which resulted in two matrices per probe set. In total, we used eight locus matrices in reconstructing the phylogenomic trees of available spider species.

The MAFFT [51] script, phyluce\_align\_seqcap\_align, was used to align each UCE locus, whereupon the ends of the aligned fragments were trimmed using GBLOCKS (default arguments of PHYLUCE: −b1 = 0.5, −b2 = 0.85, −b3 = 8, and −b4 = 10) [52] via the script phyluce\_align\_get\_gblocks\_trimmed\_alignments\_from\_untrimmed. We then concatenated the aligned loci using phyluce\_align\_concatenate\_alignments to produce a matrix for each probe-set per occupancy and then output the matrices in PHYLIP format (partition scheme) and in NEXUS format. After detecting the models with the best fit in each locus using MODELFINDER [53], IQTREE-2.0.3 [54] was used to reconstruct the phylogenomic trees via concatenation involving 1000 bootstrap operations. We also used the gene–tree–species–tree approach in IQTREE-2.0.3 to infer the gene trees and calculate the gene concordance factors (gCFs) and site concordance factors (sCFs) for the nodes associated with species tree [55]. In accordance with the methods outlined by Wheeler et al. [56], *Acanthoscurria geniculata* (Theraphosidae) was used as an outgroup. Finally, FIGTREE [57] was used to visualize phylogenomic trees.

#### **3. Results**

#### *3.1. De Novo Genome Assembly*

Using Illumina Hi-seq short-read sequences, we assembled a genome, *Argyrodes miniaceus*. From 7,051,281 contigs in *Argyrodes miniaceus*, we obtained a total assembled length of 35.51 × 106 bps with N50 = 618 bps and a maximum assembled contig of 5834 bps (for genome assembly statistics, see Table S1). This de novo assembled draft genome was then intended to be used to detect UCE probes.

#### *3.2. Probe Detection*

Using *Stegodyphus mimosarum* as the base genome in accordance with the methods outlined by Faircloth et al. [9], we detected 12,679 probes related to 1374 UCE loci using *Argyrodes miniaceus*, *Latrodectus hesperus*, *Loxosceles reclusa*, *Trichonephila clavipes*, and *Parasteatoda tepidariorum*.

#### *3.3. In Silico Testing of Capture Efficiency*

We used four probe sets for the in silico capture of targeted loci from 15 genomes. We detected a total of 7357 loci using the newly designed Araneoidea probe set (Figure 1 and Tables S2–S5). From the Arachnid probe set, we detected a total of 4579 loci. From the Araneae probe set, we detected a total of 9103 loci. Accordingly, even though we mostly used the genomes in Araneoidea in this study, we collected fewer loci compared with

the Araneae probe set, which mostly included the well-assembled genomes fetched from GenBank (Table 2). From the combined probe set, we detected 14,271 loci.

**Figure 1.** Results of in silico tests. Capture rate of loci (**A**) and captured loci number (**B**) in each probe set. The Araneoidea species are shaded in orange.

The performance of each probe set in capturing the loci in each genome varied as a function of the taxonomic group. In eight of the genomes, the Araneae probe set outperformed the Araneoidea probe set in the capture of loci (Figure 1). The Arachnid probe set outperformed the Araneoidea probe set in only one species, *Acanthoscurria geniculata* (Mygalomorphae). Nonetheless, the Araneoidea probe set outperformed two published probe sets, mostly in the Araneoidea species (e.g., *Trichonephila clavipes*, *Latrodectus hesperus*, *Argiope bruennichi*, *Anelosimus studiosus*, *Oedothorax gibbosus*, and *Araneus ventricosus*). The combined probe set outperformed all probe sets in most species except *Araneus ventricosus*, *Argiope bruennichi*, and *Dysdera silvatica*, which presented very few loci (3–30 loci) in each probe set (Figure 1).

The recovery rates of the probe sets were assessed using the combined probe set as targeted contigs to determine the number of captures and capture rates. The re-capture number and re-capture rates were lowest in the Arachnid probe set, followed by the Araneoidea probe set and the Araneae probe set (Figure 2).

**Table 2.** List of the genomes for probe-set design and the numbers of captured loci. All probe sets used in this research are listed below, ordered by published year. This table displays the targeted taxa of probe-set design, species names of the genomes used to identify UCE loci, species names of the genomes used to design probes, number of UCE loci, number of probes, and the published year of the probe set.


*3.4. Capture Rates and Number of Loci in Various Occupancies*

We present the number of captured loci in each genome under occupancies from 10% to 100%, in increments of 10%. The numbers of captured loci were higher in the combined probe set and Araneae probe set under occupancies of 10% to 30%. The numbers of captured loci did not vary considerably under occupancies of >50%. We observed similar trends in the retention ratio, with the highest retention in the Araneoidea probe set, and a merging of results at occupancies of >50% (Figure 3). Thus, in accordance with the UCE-phylogenomic results published earlier, we used occupancies of 50% and 75% in reconstructing the phylogenomic trees [11,25]. Note, however, that this strategy reduced the in silico capture number to less than 350 loci in each genome (see Figure 4; for other trees, see Figures S1–S7).

**Figure 2.** Comparison of all three probe sets with the combined probe set. Capture rate of loci (**A**) and captured loci number (**B**) when each probe set aligned with simulated contigs using the sequences of the combined probe set.

#### *3.5. Tree Reconstruction Using Simulated Captured Loci*

We reconstructed the phylogenomic trees using two occupancies (50% and 75%) for the loci captured from the four probe sets, thereby resulting in eight data matrices. The resulting topologies were similar to previous findings (e.g., Kulkarni et al. [58]), and the supports (i.e., bootstrap, gCF, and sCF) of each node were similar between the results of these two datasets (shown in Figure 4B,C).

**Figure 3.** Loci numbers of data sets. (**A**,**B**) Number of loci changed by filtered increment of 10% (from 10% to 100%) occupancies (for data, see Table S6). (**C**–**F**) Number of loci used in tree reconstruction of each data set (Table S7).

**Figure 4.** Phylogenetic tree of the data set that had the most captured loci (combined probe set filtered with 50% occupancy). (**A**) Tree topology: color of triangles represents three types of support values bootstrap, gCF, and sCF—and dashed lines indicate the loci numbers used in tree reconstruction. (**B**,**C**) Support values for different nodes on reconstructed trees using combined probe set of 50% (**B**) and 75% (**C**) occupancies. Other trees, see Figures S1–S7.

#### **4. Discussion**

This study suggests that even when dealing with a monophyletic group (e.g., Araneae), an ancient evolutionary origin (e.g., ~380 MYA), the use of a specific probe set to test phylogenetic hypotheses within a sub-lineage could benefit via more lineage-specific loci, and potentially, more captured loci. A specific probe set is meant to enable the capture of a larger number of specific loci to facilitate phylogenomic analysis when combined with probes designed for higher taxonomic levels. The number of loci revealed by the Araneoidea probe set (7357 loci) was lower than that of the Araneae probe set (9103 loci). However, the loci captured in the Araneoidea species outperformed the probe sets designed for higher taxonomic levels (the Arachnid and Araneae probe sets, Figures 1 and 2). Incremental testing of occupancy from 10% to 100% revealed that the probe set designed specifically for Araneoidea presented a more gradual loss of retention than the other probe sets, including the combined probe set (Figure 3A,B). The higher retention rate made possible by the specific probe sets produced a larger number of orthologous loci that only occurred in the targeted clade. In tree reconstruction, the tree topologies were consistent across the eight data matrices in the basal nodes and the node related to Araneoidea (Figure 4). Note, however, that the nodal supports (node 10, Figure 4) in the *Stegodyphus* + RTA (retrolateral tibial apophysis) clade were unstable, thereby supporting our claim that a specially designed probe set is necessary for a sub-lineage (e.g., the RTA clade).

Our in silico results showed that the numbers of captured loci using the combined probe set generally outperformed other probe sets. Among the specifically designed taxon probe sets, the Araneoidea probe set captured a larger number of loci in five of the six genomes used to develop the probe set. However, both probe sets performed poorly in the RTA clade (Figures 1 and 2). We detected 490.4 ± 464.7 loci (range = 28 to 1083) in the Araneoidea probe set, and 951.4 ± 974.4 loci (range = 28 to 2493) in the combined probe set. Note that the newly assembled draft genome for *Argyrodes miniaceus* returned a relatively low number of loci (182 and 142 in the combined and Araneoideae probe sets, respectively). The other Araneoidea genomes included in this study, which assembled in lower qualities (Table 1), generated <270 loci, thereby demonstrating that the number of captured loci was biased toward the well-assembled genomes used in the design of the probes. The Araneae and Arachnid probe sets generated low numbers of loci in Araneoidea genomes (lower than the Araneoidea probe set, except *Parasteatoda tepidariorum* and *Argyrodes miniaceus*). These showed a taxon-specific trend that the probe sets designed for higher taxonomic levels tended to capture fewer, and nearly insufficient, loci for phylogenomic analyses. Together with the results obtained using the four probe sets, we found that the quality and completeness of the genomes could have a deterministic effect on the number of captured loci. Moreover, the taxonomic group played a role in the number of captured loci, i.e., if there were no representative genomes in a clade, a low number of captured loci would be observed (see the RTA clade in Figure 1).

We did not observe large variations in the tree topologies reconstructed using different data matrices (i.e., with loci captured from different probe sets). However, the nodal supports dropped in both traditional bootstrap statistics and in the concordance factors (gCF and sCF) when there were no representative genomes used for probe design (i.e., *Stegodyphus* + RTA clade, in our case) (Figure 4). Within Araneoidea, nodes 3 and 5 did not perform well in gCF and sCF; however, the results still met an acceptable level of >33, thereby indicating a possible downside of using existing probe sets to resolve these nodes. Bootstrap values tended to generate optimistically high support, as observed in other phylogenomic studies [55]. In the current study, we used 50% and 75% occupancies to generate data matrices for phylogenomic analysis, with the mean number of loci varying from 35.5 to 203.6 per genome in 50% occupancy matrices, and a mean of 18.4 to 80.1 per genome in 75% occupancy matrices. The number of captured loci in silico was significantly lower than would be expected in real-world, in-solution captured data. The mean number of captured loci was 589.3 in the Arachnid probe set [23] and 553.71 in the Araneae probe set [11]. Our in silico results, in rough estimation, only captured up to 1/3 of the loci compared with

the in-solution captured results. We inferred that for in silico testing, we constrained the sequence identity to 80% for the capture of loci. From a practical perspective, in-solution capture could likely have required a lower degree of similarity to capture DNA fragments. As we aimed to relatively compare the numbers of the captured loci from different probe sets under the same in silico condition, we therefore expected to capture a larger number of loci when using our probe set for in-solution capture under laboratory conditions in the Araneoidea species.

#### **5. Conclusions**

This study designed specific probe sets using six genomes to facilitate the testing of phylogenetic hypotheses pertaining to Araneoidea. When using in silico capture, the data matrices generated using the combined and Araneoidea probes resolved most of the nodes in the sub-clades in Araneoidea, resulting in several hundred loci (relatively more loci compared with other non-targeted taxa). We expected that when conducting insolution capture in a wet lab, it should be possible, using the estimated 1/3 in silico/insolution ratio, to capture more than one thousand loci per genome. In our preliminary test using Argyrodinae as a targeted taxon, we captured 897.5 ± 62.9 loci per genome, which is about 4× the number captured in our in silico *Argyrodes miniaceus* results (182 or 142 loci). However, there are disadvantages to using this newly designed probe set, e.g., (1) fewer applicability to other taxa such as the RTA clade, and (2) potentially higher costs when synthesizing this customized probe and the combined probe sets. Moreover, our combined approach showed that it broadens the application of the probe sets given there are representative genomes collected from a sub-lineage. However, these approaches should be tested in a wet lab to validate the applicability of the Araneoidea and combined probe sets.

**Supplementary Materials:** The following supporting information can be downloaded at https://www. mdpi.com/article/10.3390/d14030184/s1: all generated phylogenetic trees in this article, assembled genome statistics, and the raw data in Figures 1–4; Figure S1: Phylogenetic tree of the data set of the combined probe set filtered by 75% occupancy; Figure S2: Phylogenetic tree of the data set of the Arachnid probe set filtered by 50% occupancy; Figure S3: Phylogenetic tree of the data set of the Arachnid probe set filtered by 75% occupancy; Figure S4: Phylogenetic tree of the data set of the Araneae probe set filtered by 50% occupancy; Figure S5: Phylogenetic tree of the data set of the Araneae probe set filtered by 75% occupancy; Figure S6: Phylogenetic tree of the data set of the Araneoidea probe set filtered by 50% occupancy; Figure S7: Phylogenetic tree of the data set of the Araneoidea probe set filtered by 75% occupancy; Table S1: Statistics of genome and raw reads of Argyrodes miniaceus; Table S2: Results of probe set designed for Arachnid probe set in silico test on each genome; Table S3: Results of probe set designed for Araneae probe set in silico test on each genome; Table S4: Results of probe set designed for Araneoidea probe set in silico test on each genome; Table S5: Results of probe set designed for combined probe set in silico test on each genome; Table S6: Retention number of loci in data sets after filtering by different standards; Table S7: Loci number of each genome after filtering by occupancy.

**Author Contributions:** Conceptualization, Y.-C.S. and R.-C.C.; methodology, Y.-Y.L. and Y.-C.S.; software and hardware maintenance, J.-M.T. and Y.-C.W.; validation, C.-Y.W., Y.-F.C. and H.-Y.L.; formal analysis and data curation, Y.-Y.L., Y.-C.S. and C.-Y.W.; writing—original draft preparation, Y.-Y.L., Y.-F.C. and H.-Y.L.; writing—review and editing, Y.-C.S., R.-C.C. and N.W.; visualization, C.-Y.W. and Y.-Y.L.; funding acquisition, Y.-P.L.; supervision, project administration, and funding acquisition, Y.-C.S., R.-C.C. and N.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** The funding sources are the Ministry of Science and Technology, Taiwan (110-2621-B-037 -001-MY3 and 107-2621-B-037-001-MY2 to Y.-C.S.; 108-2621-B-005-001 and 109-2621-B-005-003-MY2 to R.-C.C.), and the National Science and Technology Development Agency, Thailand (NSTDA: JRA-CO-2563-11148-TH to N.W.). Part of this research is from Y.-Y.L.'s thesis.

**Institutional Review Board Statement:** Our in silico study does not require Institutional Review Board approval as it does not involve human or vertebrate materials.

**Data Availability Statement:** The de novo assembled genome and the probe sets are available at GitHub: https://github.com/yiyenl/designing-probe-set-for-Araneoidea/blob/main/combine\_ probe.pl. Other visualized results are in the Supplementary Materials.

**Acknowledgments:** We thank all Ecology and Evolutionary Lab members at Kaohsiung Medical University, Taiwan.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article Solenysa***, a Cretaceous Relict Spider Group in East Asia**

**Jiahui Tian 1,2, Yongjia Zhan 1, Chengmin Shi 3, Hirotsugu Ono <sup>4</sup> and Lihong Tu 1,\***


**Abstract:** A time scale of phylogenetic relationships contributes to a better understanding of the evolutionary history of organisms. Herein, we investigate the temporal divergence pattern that gave rise to the poor species diversity of the spider genus *Solenysa* in contrast with the other six major clades within linyphiids. We reconstructed a dated phylogeny of linyphiids based on multi-locus sequence data. We found that *Solenysa* diverged from other linyphiids early in the Cretaceous (79.29 mya), while its further diversification has been delayed until the middle Oligocene (28.62 mya). Its diversification trend is different from all of the other major lineages of linyphiids but is closely related with the Cenozoic ecosystem transition caused by global climate changes. Our results suggest that *Solenysa* is a Cretaceous relict group, which survived the mass extinction around the K-T boundary. Its low species diversity, extremely asymmetric with its sister group, is largely an evolutionary legacy of such a relict history, a long-time lag in its early evolutionary history that delayed its diversification. The limited distribution of *Solenysa* species might be related to their extreme dependence on highly humid environments.

**Keywords:** molecular phylogeny; divergence time; relict group; Linyphiidae

**1. Introduction**

Molecular data has become an indispensable tool for the reconstruction of phylogenetic relationships among species and provides important insights on the evolutionary histories of many animal groups. It is common in systematics that a high-level molecular phylogeny may significantly conflict with the established taxonomic system based on morphological characters [1–4]. Understanding the evolutionary past that shaped the species diversity of lineages always attracts the interest of biologists [5–8]. Spiders are generalist predators, forming a successful terrestrial animal group, and their high species diversity is distributed unevenly across lineages [9], even extremely asymmetrically between sister groups [4,10]. Several hypotheses have been proposed to interpret the driving forces that promote spider diversification, such as co-diversification with insects [11–13], key innovations in silk structure and web architecture [10], repeated evolution of the respiratory system from book lungs to tracheae [14], and foraging changes from using capturing web to cursorial habits [15]. While these studies usually focus on the driving forces for fast diversifications that lead to a speciose clade, little attention having been exerted to the factors that might result in groups with poor species diversity. Herein, we investigated the evolutionary history of *Solenysa* spiders, one of the seven main clades within linyphiids, with poor species diversity compared to other clades [4].

Great conflicts exist between the molecular phylogeny of linyphiids and the classical taxonomic system. Linyphiidae is an ancient spider group and its earliest fossil record dates back to the early Cretaceous, about 125–135 mya [16]. As with many other spiders, linyphiids have experienced adaptive radiation, accompanying the fast radiation of insects in the Cretaceous. Currently, with more than 4700 recognized species, Linyphiidae represents

**Citation:** Tian, J.; Zhan, Y.; Shi, C.; Ono, H.; Tu, L. *Solenysa*, a Cretaceous Relict Spider Group in East Asia. *Diversity* **2022**, *14*, 120. https:// doi.org/10.3390/d14020120

Academic Editors: Michael Wink and Martín J. Ramírez

Received: 21 November 2021 Accepted: 5 February 2022 Published: 8 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

the second largest group of the order Araneae [17]. Generally, linyphiids are conservative in somatic features but have complex genitalia with species-specific characters, which are used as criteria for species recognition. The classical taxonomic system of Linyphiidae consists of seven subfamilies [18]. However, this was not supported by molecular phylogenetic analyses [1,4,7]. Four of them, Linyphiinae, Erigoninae, Micronetinae, and Ipainae, were not monophyletic groups; the representatives of Mynogleninae and Dubiaraneinae fell into Linyphiinae; the Stemonyphantinae taxa were often clustered with pimoids, the sister group of linyphiids [1,4,19–21]. The subfamily Stemonyphantinae was newly revised by adding two ex-pimoid genera and another linyphiid genus in it [8]. However, the seven-clade topology of molecular phylogenies are robustly supported, and all of these seven major clades (clades A–F and S in [4]) are supported by some putative synapomorphic characters.

Spider diversity is distributed unevenly among the seven major clades within linyphiids, especially between sister groups [4]. The relationships among the seven major lineages are puzzling. The cladogenetic events of the seven-clade topology were correlated with successive transformations on the state of the epigynal plate that was defined by the location of the copulatory openings and tracings of epigynal tracts [4]. Generally, a set of epigynal characters forms an epigynal type that means a certain interaction pattern between male and female genitalia during copulation [22]. The series of state transformations of the epigynal plate coupled with those lineage divergence events make the seven-clade topology meaningful [4]. Nevertheless, the driving forces that shape the unbalanced diversification across lineages within linyphiids remain unresolved. Among them, the extremely asymmetric species diversity between *Solenysa* (15 spp.) and its sister group (2885 spp., [18]) provides us a model system to study the evolution of such an asymmetry.

Clade S in [4] is composed of a single genus, *Solenysa*, and is a unique lineage in linyphiids. All *Solenysa* species display a distinctive somatic appearance and special genital morphology. These make them easily distinguished from all other linyphiids [23,24]. However, the placement of *Solenysa* within linyphiids has long been controversial. Saaristo [25] placed *Solenysa* together with some micronetine genera into a new subfamily, Ipainae, largely based on the females having a movable epigynum. However, this treatment was disproved by the phylogenetic analyses either based on molecular data or morphological data [1,4,24]; the molecular phylogeny supported a sister relationship between *Solenysa* and clade B, which is a hodgepodge composed of all erigonines, some micronetines, and linyphiines. Although such a sister-relationship was supported by some putative synapomorphies, in comparison to its speciose and widespread sister-clade, the *Solenysa* clade appeared unusual in term of species diversity and limited distributions and is generally only known from type localities and small adjacent areas (Figure 1; [23,24,26–32]). The underlying evolutionary process that gave rise to such a biased diversity pattern between these sister clades remains to be explored.

**Figure 1.** Distribution of *Solenysa* species. Circles in color represent species of different clades in Figure 2, green, clade L; blue, clade W; purple, clade J, red, clade P. Dash line circles indicate locations of some supposed Pleistocene ice age refugia in East Asia.

**Figure 2.** ML tree of linyphiid phylogeny with proportional branch lengths. Branches in color show the seven major clades within Linyphiidae. Taxa names in color indicate their current subfamilial placements. Thickened blue branches in clade A indicate taxa having a movable epigynum. Thickened green branches in clade B indicate taxa having a desmitracheate system. Thickened branches in clade S indicate all *Solenysa* taxa having both movable epigynum and an intermediate type tracheate system. Bars on branches indicate corresponding node supports of the main clades: the anterior show the maximum likelihood bootstrap (BS) and the posterior Bayesian posterior probability (PP), respectively. Bars in black indicate BS > 80%; PP > 0.95; bars in grey BS < 80%, PP < 0.95. Trees with all the node supports are included as Supplementary Material.

Studies on the defining features (synapomorphies) of a speciose group, especially in comparison to its sister group, may help us in search of potential drivers that promote fast diversification [9]. According to Wang et al. [4], the tracheal system in clade B repeatedly evolved from the haplotracheate to desmitracheate, in which the median pair tracheae extensively branched and extended into the prosoma [33]. Especially the distal erigonines clade, all species have a desmitracheate system forming the largest group in clade B [34], contributing more than half of the species diversity of Linyphiidae. While the tracheal system in *Solenysa* remains as an intermediate type, with the median pair unbranched in the opisthosoma but extended into the prosoma where they branch and extend into the legs (Tu, personal observation; [24]). Anterior extending tracheae would provide oxygen directly to the brain and legs [35,36], and extensively branched tracheae may help in reducing water loss [9,37]. These imply that the selective advantages of the desmitracheate system might trigger the fast diversification in clade B, while the diversification in the *Solenysa* clade remained slower, which resulted in the biased species diversity between them. Nevertheless, whether there are any other reasons is unclear.

In the present study, we aimed to explore the reason that caused the biased diversification between *Solenysa* and its sister group. Our results from phylogenetic analyses, molecular dating, and lineage-wise diversification tracing through time suggested that *Solenysa* represents a Cretaceous relict spider group in East Asia, and historical climate changes have played a pivotal role in shaping its evolutionary history.

#### **2. Materials and Methods**

#### *2.1. Taxon Sampling*

In the present study, we collected and sequenced four additional *Solenysa* species, *S. lanyuensis*, *S. yangmingshana*, *S. macrodonta*, and *S. ogatai* from their type localities on Taiwan Island and Japanese islands. The relevant sequence data for nine *Solenysa* species, *S. longqiensis*, *S. retractilis*, *S. tianmushana*, *S. wulingensis*, *S. protrudens*, *S. mellotteei*, *S. trunciformis*, *S. reflexilis*, and *S. partibilis*, have been generated in our earlier study [4]. Another two known species, *S. spiralis* from Sichuan, China, and *S. geumoensis* from the Korean Peninsula, were not sampled due to a lack of fresh materials for sequencing. Thus, 13 of the 15 known *Solenysa* species (86.7%) were included in the present study. To explore the phylogenetic position of *Solenysa* species within Linyphiidae, other linyphiid taxa of Wang et al. [4], except for the four unstable long-branch taxa and those repeated taxa, were compiled in our dataset. Pimoidae is the sister group of Linyphiidae and often used as outgroups for rooting. Given the recent revision of Pimoidae having the formerly pimoid genera *Putaoa* and *Weintrauboa* transferred to Linyphiidae, and redelimited Pimoidae as including only *Pimoa* and *Nanoa* [8], we also added representatives of *Pimoa* and *Putaoa* into the dataset. The final data set consisted of 127 taxa, including 13 *Solenysa* taxa, 113 other linyphiids, and one pimoid.

Collected information on the four newly sequenced *Solenysa* specimens is listed in Supporting Material Table S1. Specimens used for the molecular study were fixed in 95% ethanol and kept at −20 ◦C before DNA extraction. All newly collected specimens are deposited at the College of Life Sciences, Capital Normal University (CNU).

#### *2.2. Laboratory Protocols for Molecular Data*

Five loci used in [4], including two mitochondrial genes, cytochrome c oxidase subunit I (COI), and 16S rRNA (16S), and three nuclear genes, 18S rRNA (18S), 28S rRNA (28S), and histone H3 (H3), were sequenced for newly collected *Solenysa* specimens. Laboratory protocols and sequence curation follow those described in Wang et al. [4]. The primers and their annealing temperatures used for PCR amplification in the present study are provided in Table 1.


**Table 1.** Primer sequences, their sources, and reaction conditions used for PCR in this study.

#### *2.3. Phylogenetic Analyses*

Sequences for the five genes were aligned using MAFFT 7.490 [38] and were concatenated using Mesquite version 3.31 [39] in the order of 16S, 18S, 28S, COI, and H3. Maximum likelihood (ML) phylogenetic analysis was carried out using IQ-TREE v2.1.3 [40] with best-fit DNA substitution models selected using ModelFinder [41]. ML trees were inferred from (1) the concatenated super-matrix with a single overall the best-fit model and (2) the partitioned matrix defined by loci using PartitionFinder [42] with each partition applying its best-fit model. The GTR + F + R5 was selected as the best-fit model for the concatenated super-matrix. The models selected for partitions of the best scheme were: TIM2 + F + I + G4 for partition 16S, SYM + R4 for partition 18S + 28S, GTR + F + I + G4 for both partition COI and partition H3. Node supports were assessed through the ultrafast bootstrap method [43] with 1000 replicates, incurring the -bnni option to reduce the risk of overestimating branch supports. We also performed Bayesian phylogenetic inference based on the best partition scheme using MrBayes v3.2.6 [44]. We used the GTR model for the 16S data in place of the TIM2 model, as the latter is not implemented in MrBayes. The SYM model for the 18S + 28S data was converted from the GTR model by fixing the stationary state frequencies to be equal. MrBayes analyses were initiated with random starting trees employing four Markov chains (one cold and three hot). The Markov chains ran for 2 × 106 generations with trees and parameters being sampled every 100 generations. The "temperature" parameter was set to 0.2. The chains were converged and reached a stationary state after the iteration with the average standard deviation of split frequencies being smaller than 0.0074, and all values of potential scale reduction factor for all parameters being very close to 1.00. The majority-rule consensus tree was generated using the sample from the cold chain after the first 25% of the sample was discarded as burn-in. The topology of phylogeny inferred from the present study was statistically tested for robustness against alternative topologies with the approximately unbiased (AU) test [45].

#### *2.4. Estimation of Divergence Times*

The divergence times were estimated with a relaxed molecular clock approach implemented in BEAST2 version 2.6.6 [46]. The rate change was explicitly modeled using uncorrelated lognormal distribution across trees and a birth–death model was used for modeling

speciation. The best-fit DNA substitution model was selected using bModelTest module [47] in BEAUti2. Two independent MCMC searches were run for 8 × 107 generations with trees and parameters being sampled every 1000 generations. The convergence of the MCMC chains was checked with Tracer version 1.7.1 [48]. The first 10% samples were discarded as burn-in.

Two calibration points were used: (1) the oldest linyphiid fossil from the Lower Cretaceous Lebanese amber and (2) the divergence between two endemic Hawaiian species *Orsonwelles polites* and *Orsonwelles malus*. The oldest linyphiid fossil was originally described as an undetermined linyphiid [16]. Several studies have used it as a calibration point based on different assumptions: as a stem linyphiid, a crown linyphiid, a crown clade containing all linyphiids except *Stemonyphantes* [7,8,19,20,49,50], or even as a stem araneoid [51]. Herein, following Arnedo and Hormiga [7], we assigned the age of the fossil to the most recent common ancestor (MRCA) of Linyphiidae and applied an exponential prior with a mean of 10.0 and offset = 125.0 for this calibration, which gave a 95% confident interval of 125–155 Myr. According to Hormiga et al. [52], the Hawaiian spiders *Orsonwelles malus* is endemic to Kaui Island (formed 5.1 million years ago), and *Orsonwelles polites* is endemic to the adjacent O'ahu Island (formed 2.6–3.7 million years ago). We assigned a normal prior for the divergence time between these two species with a mean of 3.0 Myr and standard deviation of 0.5 Myr, following Arnedo and Hormiga [7].

#### *2.5. Lineages Tracing through Time*

We use lineages through time (LTT) plot to gain insight into the history of diversification for clades S (*Solenysa*), A, and B. Samples of dated genealogies for each of these three clades were inferred using BEAST2. The times of the MRCA for the relevant clades came from the dated phylogeny. The LTT plots were generated using Tracer version 1.7.1 [48].

#### **3. Results**

#### *3.1. DNA Sequence Data*

A total of 590 sequences were obtained. Sequences for all five genes were acquired for 88 taxa (68.22%; 88/129), and at least four genes were acquired for the majority (96.06%; 122/127). Fragments from 16S, 18S, 28S, COI, and H3 were sequenced for the taxa sampled here are 91.34% (116/127), 97.64% (124/127), 81.89% (104/127), 95.28% (121/127), and 98.43% (125/127), respectively. After alignment, the concatenated matrix includes a total of 2678 sites. All newly acquired sequences have been deposited in GenBank. The accession numbers of all samples are listed in Table 2.

**Table 2.** Taxon list and sequence information. An asterisk (\*) following species names indicate that they are newly sequenced for the present study.


**Table 2.** *Cont.*



**Table 2.** *Cont.*

#### *3.2. Phylogeny of Solenysa*

The ML trees inferred from the concatenated super-matrix and from the best partition scheme and the Bayesian tree are highly congruent as far as the major clades we concern in the present study. All analyses recovered seven major clades equivalent to those found in previous studies [4]. Given the ML estimate obtained from the partitioned analysis was significantly better than from the concatenated analysis, (log-likelihoods, −45494.13 vs. −46688.08), we reported the result from the partitioned analysis in Figure 2 (see Supplementary Material Figures S1 and S2 for node supports in ML tree and Bayesian tree). All linyphiid taxa formed a monophyletic lineage sister to the pimoid clade and were grouped into seven strongly supported (bootstrap support/Bayesian posterior = 100/1.00), named as clades A, B, C, D, E, F, and S (*sensu* Wang et al. [4]). The monophyly of *Solenysa* (clade S, 100/1.00) and its sister relationship with the species-rich clade B (100/1.00) were robustly supported (100/1.00). Relationships among the seven major lineages remain the same as those of the analyses of Wang et al. [4]. One major conflict between the phylogeny we inferred here and the phylogeny reported by [8] involved relationship between *Putaoa hauaping* and the *Stemonyphantes* species (clade E). In our results, *Putaoa hauaping* and the *Stemonyphantes* species failed to form a monophyletic clade. However, our phylogeny was strongly supported by the AU test (*p* = 0.99).

The thirteen *Solenysa* taxa were further divided into four clades (Figure 2): the *longqiensis* clade (clade L) including *S*. *longqiensis* and *S*. *yangmingshana* (68/0.81); the *wulingensis* clade (clade W) including *S*. *wulingensis*, *S*. *retractilis*, and *S*. *tianmushana* (99/1.00); the *mellotteei* clade (clade J) including six Japanese species, *S*. *mellotteei*, *S*. *reflexilis*, *S*. *trunciformis*, *S*. *macrodonta*, *S*. *partibili*, and *S*. *ogatai* (100/1.00); and the *protrudens* clade (clade P), including *S*. *protrudens* and *S*. *lanyuensis* (100/1.00). Compositions of the four clades were largely congruent with the four clades recognized in phylogenetic analysis based on morphological data, each of which was supported by several synapomorphies [12]. However, the relationships among these clades in our results were: the *longqiensis* clade was the most basal lineage sister to all other *Solenysa* clades (100/1.00); the *mellotteei* clade was sister to the *protrudens* clade (100/1.00), and the clade (*mellotteei* + *protrudens*) was sister to the *wulingensis* clade (95/1.00).

Mapping the characters of the epigynum with an extensible base and the desmitracheate system onto the phylogenetic framework show that the movable epigynum independently evolved multiple times in clades A and S. The *Solenysa* clade was distantly related with the taxon of Ipainae (*Wubanoides* sp.) and those micronetines also having a movable epigynum. The desmitracheate system independently evolved in clade B multiple times.

Distributions of the four *Solenysa* clades have different patterns (Figure 1). All species of Clade J are limited to the Japanese Archipelago; the species of clades L and P are known from the southeast coast of China and Taiwan Island; those of clade W scatter in southern China, as well as the Korean Peninsula (not sampled here). All *Solenysa* species have a disjunct distribution, and all the materials studied here were collected from leaf litter with high ambient humidity.

#### *3.3. Divergence Times Estimation*

Divergence times of the relevant nodes within linyphiids estimated using a relaxed molecular clock method in BEAST are provided in Table 3. The dated phylogeny is shown in Figure 3, with branches of the other six major clades collapsed and the complete chronogram is presented in Supplementary Material Figure S3.

**Table 3.** Divergence times of relevant nodes within linyphiids estimated using a relaxed molecular clock method in BEAST.


TMRCA of the oldest linyphiids fossil: Exponential prior (Mean = 10, Offset = 125). TMRCA of *Orsonwelles polites* and *Orsonwelles malus*: Normal prior (Mean = 3.0 mya, SD = 0.5). Time in million years ago (mya). Nodes labeled as in Figure 3. HPD, highest posterior density; TMRCA, time of most recent common ancestor.

The chronogram of linyphiids suggests that all the seven major lineages survived the K-T boundary. The MRCA of all linyphiids (node 1) can be traced back to the early Cretaceous (128.80 mya, 95% HPD, 125.14–136.36 mya). Clade S and clade B (node 6) diverged around 79.29 (67.69–90.67) mya in the Cretaceous. Further diversifications in the linyphiid lineages largely took place after the K-T boundary, except for the *Solenysa* clade. The earliest split of *Solenysa* species (node 7) can only be traced back to 28.62 (20.09–37.98) mya in the middle Oligocene. In the following 10 Myr until the early Miocene, all of the four *Solenysa* clades (nodes 8–10) have appeared. The speciation of extant species (nodes 11–17) took place largely in the middle Miocene and the Pliocene, ranging between 12.34 (6.29–18.98) mya to 2.56 (0.99–4.33) mya, before the onset of the Quaternary climatic oscillations (2.6 mya–present [53,54]). Two exceptions involved the divergence of Japanese species, between *S*. *partibilis* and *S. ogatai* (node 18), which occurred much recently, around 0.83 (0.02–1.94) mya and between *S*. *macrodonta* and *S. reflexilis* (node 16), around 2.52 (1.12–4.07) mya during the Quaternary glaciations. In contrast, diversification in clades A and B crossed the K-T boundary and continued during the whole Cenozoic.

#### *3.4. Temporal Patterns of Linyphiids Diversification*

Analysis of lineage accumulation over time using LTT plots suggested that species of clade A and clade B began to accumulate long before the K-T boundary, while the lineage number in clade S (*Solenysa*) did not change until the Oligocene, long after the K-T boundary (Figure 4). Both clade S and clade B experienced obvious increases in the

accumulation of lineages during the Oligocene. Besides, clade S experienced a unique fast lineage-accumulating phase during the last 10 Myr.

**Figure 3.** Chronogram of linyphiids. Branches and taxon names in color show the four clades within *Solenysa*. Branches of other six major clades collapsed. In the complete chronogram shown in microimage, two red dots indicate calibration points. Numbers in parentheses after clade names refer to the sampling numbers of the clades and species numbers represented, respectively. Numbers above branches label the divergence nodes of the seven major clades within linyphiids and internal nodes within *Solenysa*. Values below branches show divergence time of nodes, and node bars show confidence intervals. Three grey belts refer to K-T boundary, early and middle Oligocene, and middle Miocene and Pliocene, respectively.

**Figure 4.** Temporal patterns of linyphiid lineage diversifications. a, stepped line in color indicate species number of lineages vary through time for *Solenysa* clade (brown), clade A (purple), and clade B (green). The shadowed areas in the same color show 95% confidence intervals.

#### **4. Discussion**

For a long time, a temporal framework was lacking for linyphiids, the second-largest group of spiders (but see [7,8]). Through phylogenetic reconstruction and molecular dating based on sequence data of five genes for all major groups of linyphiids, we brought a time scale to this important spider group and gained some illuminating insights into the evolutionary history that gave rise to the poor species diversity of *Solenysa* spiders contrasted with its sister group. We further use LTT plots to demonstrate the history of *Solenysa* diversification.

*Solenysa* spiders originated in the first radiation of linyphiids and missed the second burst of speciation in this group. The chronogram shows that the MRCA of linyphiids can be traced back to the early Cretaceous (Figure 3); it might even be traced to the Jurassic [20]. All the seven major clades, including the *Solenysa* clade, emerged during the Cretaceous. Being generalist predators in natural ecosystems, linyphiid spiders weave horizonal sheet webs to catch prey [1]. Their web-building level varies a lot across the seven major clades (Figure 2): most taxa of clade C + D, traditionally grouped in Linyphiinae, build aerial webs at various levels of vegetation, especially often at the crown level, while the taxa of clade ME mainly formed by species of Micronetinae and Erigoninae build their webs much closer to the ground. Generally, those micronetines with dorsal spots on their abdomen usually build aerial webs at the leaf-litter surface in forests, and those of clade B, especially those erigonines having an abdomen in grey to dark black without dorsal spots, build substrate-webs close to or even on the ground. *Solenysa* spiders, as members of clade ME, inhabit the litter layer in forests and have a grayish abdomen without dorsal spots; they build their webs close to the ground (see the figure in [31]). These suggest that linyphiids had their first diversification, as in many other spiders, accompanied by the co-adaptative radiation of insects and angiosperms in the Cretaceous [1,11,12,55], and diversification among the seven clades were accompanied by the divergence of their web-building levels. Furthermore, the subsequent diversification within these major clades mainly flourished in the Cenozoic in general. This indicates that the species diversity of all seven clades was significantly affected by the mass extinction of the K-T boundary (65 mya, [56]); their diversifications in the Cenozoic was, in fact, the second radiation in linyphiid evolution. Unlike other major clades, diversification in the *Solenysa* clade has a long-time lag of more than 50 Myr in their early history. The single lineage crossed the K-T boundary, with the earliest split occurring 28.62 mya in the middle Oligocene, much later than its sister group clade B, and their sister group clade A (Figure 4), although we expanded the sampling of *Solenysa* taxa in the present study.

The low species diversity of *Solenysa* spiders might result from both the lower diversification rate than that of their sister group clade B and the long-time lag in their evolutionary history. Among all the major clades of linyphiids, only in clade B did the desmitracheate system independently evolve multiple times (Figure 2). Although its selective advantages, either the high efficiency of anteriorly extending tracheae in providing oxygen directly to the brain and legs [35,36] or the assistance of extensively branched tracheae on water retention [9,37], have never been physiologically tested in web-weaver spiders (but see the morphological test in [57]), such a tracheal system repeatedly evolved in clade B, as well as in several litter-dweller spider groups [14,58,59], which implies its selective advantage for these spiders. Furthermore, the great species diversity of clade B, especially of the distal erigonines clade, suggest that the desmitracheate system might be a key innovation that triggered the fast diversification in clade B. Accordingly, without driving by the selection advantages of the desmitracheate system, it is not a surprise that the diversification rate of *Solenysa* clade is not as fast as that of clade B. Nevertheless, this is not the only reason attributed to the extreme asymmetry in the species diversity between the *Solenysa* clade and clade B. The long time lag across the K-T boundary to the middle Oligocene makes the *Solenysa* clade have no origination of any further extant groups for more than half of the common historical time shared with clades A and B (Figure 4).

Being highly dependent on high-humidity environments may be a major limit impact on *Solenysa* spiders. Unlike book lungs, tracheal respiration in spiders does not depend on hemolymph [36,60], and intensive branched tracheae are helpful in saving water [9,37]. Nevertheless, the intermediate-type tracheal system in *Solenysa* has the unbranched median pair tracheae extending into the prosoma [24]. Furthermore, their living habits are usually in areas with high ambient humidity. Although building sheet-webs at litter level, *Solenysa* spiders failed to leave forests for more open and less humid ecosystems, such as grasslands, or even as pioneers to colonize the ecological bare grounds as those erigonines of their sister group. Accordingly, we infer that their extreme dependence on high environmental humidity is the main constraint for their distribution.

Our results show the temporal patterns of linyphiid lineage diversification were closely related to global climate changes (Figures 3 and 4). Although the Paleocene Earth was commonly considered ice-free and the global climate was warm and humid [61,62], evidence has shown that the climate in Asia was dry during most of the Paleogene. There was an arid belt that existed from the western-most part to the eastern coasts, and arid and semi-arid conditions dominated in large areas of China [63]. This led to the formation of temperate grasslands and savannah ecosystems on most land at the expense of forest decline [61,62,64]. Such ecosystem transformations might have acted as a driving force that promoted the rapid diversification of micronetines (clade A) and erigonines (clade B) but might be a major constraint on the survival of *Solenysa* species. Nevertheless, such a dry belt retreated northwestward from the Eocene to the Oligocene [63]. Until the late Oligocene (28–24 mya), the southeast part of China, including the southeast coasts and Taiwan Island, became humid. This might have triggered the diversification of extant *Solenysa* spiders (28.62 mya, Figure 3). In the following Miocene (24–5.3 mya) and the Pliocene (5.3–2.6 mya), the humid belt further expanded northwards, the whole eastern part of China transformed into humid conditions. During this time, clade S experienced a unique fast lineage-accumulating phase. Therefore, such climate changes might have released the constraint from arid conditions and promoted the diversification of *Solenysa* in the Neogene.

*Solenysa* spiders display typical characters of a relict group, with low species diversity and narrow distribution [65–67]. The chronogram of linyphiids shows that most *Solenysa* species have emerged before the onset of the Quaternary glaciations (2.6 mya, [53,54]). However, all of the 15 known species have a disjunct distribution, most of them being only known from type localities that fall near the supposed Pleistocene glacial refugia in East Asia (Figure 1; [68–72]). This suggests that the refugia have played an important role in maintaining these *Solenysa* species during the glacial period. The dramatically cold climate during the glacial period would generally incur contractions of the distribution range [65,66,73–75]. Such an interpretation may partially explain the current disjunct distribution patterns of *Solenysa* species. However, it also implies that *Solenysa* species have failed to expand their distributional ranges as the climate became warmer in the post-glacial periods. Generally, linyphiids, as well as most other spiders, are capable of dispersal by balloon [60]. Nevertheless, long-distance dispersal by ballooning means spiders staying a long time in the air without a water supply. The survival of *Solenysa* spiders depends extremely on highly humid environments that not only constrain their distribution but also limit their capacity for long-time ballooning for dispersal, especially when suitable habits are segmentized. Accordingly, the current distribution pattern of *Solenysa* spiders might be shaped by both the locations of those refugia they survived during the Quaternary glaciations and their weak dispersal capacity.

#### **5. Conclusions**

Our results suggest that *Solenysa* is a Cretaceous relict group, having survived the mass extinctions at the K-T boundary and the ecosystem transition caused by the global climate changes in the Cenozoic. Its diversification was shaped by the climatic oscillations in the Cenozoic. The low species diversity of the *Solenysa* clade, in contrast to its sister group, is largely due to the long time lag in its early evolutionary history. Given *Solenysa* represents a strongly supported major clade in linyphiid multi-locus phylogeny and is supported by several synapomorphies, it warrants a subfamilial status in Linyphiidae.

**Supplementary Materials:** The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/d14020120/s1, Table S1: Collecting information of the *Solenysa* spider sequenced in the present study; Figure S1: Bayesian tree with all nodes supports; Figure S2: ML tree with all nodes supports; Figure S3: The complete chronogram.

**Author Contributions:** Conceptualisation, J.T., C.S., and L.T.; methodology, J.T., Y.Z., C.S., and L.T.; acquisition, J.T., H.O., L.T.; formal analysis, J.T., Y.Z., C.S., and L.T.; investigation, all authors; writing original draft preparation, J.T., C.S., and L.T.; writing—review and editing, all authors; funding acquisition, L.T., and C.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** Please add: This research was funded by the National Natural Sciences Foundation of China (grant Nos. 31572244, 31872188 and 31772435).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The new genetic sequences are available on GenBank (ac-cession codes: OL693166 to OL693169 (COI), OL691622 to OL691623 (16S), OL691624 to OL691627 (18S), OL691628 to OL691631 (28S), OL702837 to OL702840 (H3)) and also as a supplement to this paper.

**Acknowledgments:** We thank J. Fu and the anonymous reviewers for their critical comments on the early versions of this manuscript. We also thank F. Wang, F. Zhang, S. Tian., Z. Zhao, and R. Li for their kind assistance in field work, and A. Andoh and A. Tanikawa for kindly providing *Solenysa* material collected from Japan.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

