Next Article in Journal
X-Linked Emery–Dreifuss Muscular Dystrophy: Study Of X-Chromosome Inactivation and Its Relation with Clinical Phenotypes in Female Carriers
Previous Article in Journal
The Molecular Determination of Hybridity and Homozygosity Estimates in Breeding Populations of Lettuce (Lactuca sativa L.)
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

A Novel Software and Method for the Efficient Development of Polymorphic SSR Loci Based on Transcriptome Data

Northwest A&F University, State Key Laboratory of Crop Stress Biology for Arid Areas, Key Laboratory of Integrated Pest Management on Crops in Northwestern Loess Plateau, Ministry of Agriculture and Rural Affairs, Yangling 712100, China
*
Author to whom correspondence should be addressed.
Genes 2019, 10(11), 917; https://doi.org/10.3390/genes10110917
Submission received: 17 October 2019 / Revised: 4 November 2019 / Accepted: 5 November 2019 / Published: 11 November 2019
(This article belongs to the Section Technologies and Resources for Genetics)

Abstract

:
Traditional methods for developing polymorphic microsatellite loci without reference sequences are time-consuming and labor-intensive, and the polymorphisms of simple sequence repeat (SSR) loci developed from expressed sequence tag (EST) databases are generally poor. To address this issue, in this study, we developed a new software (PSSRdt) and established an effective method for directly obtaining polymorphism details of SSR loci by analyzing diverse transcriptome data. The new method includes three steps, raw data processing, PSSRdt application, and loci extraction and verification. To test the practicality of the method, we successfully obtained 1940 potential polymorphic SSRs from the transcript dataset combined with 44 pea aphid transcriptomes. Fifty-two SSR loci obtained by the new method were selected for validating the polymorphic characteristics by genotyping in pea aphid individuals. The results showed that over 92% of SSR loci were polymorphic and 73.1% of loci were highly polymorphic. Our new software and method provide an innovative approach to microsatellite development based on RNA-seq data, and open a new path for the rapid mining of numerous loci with polymorphism to add to the body of research on microsatellites.

1. Introduction

The increasing progress of next generation sequencing (NGS) has promoted the explosion of transcriptome data, providing large-scale essential data for the application of molecular markers [1,2,3]. Microsatellites, also called simple sequence repeats (SSRs) or short tandem repeats (STRs), are among the most popular markers, and have been widely used in the analysis of population genetics due to such advantages as wide distribution, high polymorphism, and satisfactory repeatability [4,5,6,7,8]. The applications of SSR markers can obtain abundant and reliable experimental data based on the hypermutation of SSRs. However, traditional methods of polymorphic SSR isolation and characterization without reference sequences are expensive, time-consuming and labor-intensive [9,10]. Many researchers have attempted to optimize and streamline microsatellite experiments [11,12,13,14,15]. For instance, the application of multiplex PCR can significantly reduce the time and cost of SSR genotyping [11], and the success rate of tests can be increased by referring to the propositions concerning the primer design of SSR loci [12]. In addition, to reduce the risk of failure, SSR loci over a certain repeat time were presumed to be polymorphic for follow-up research, resulting in large loci with fewer repeats being overlooked [16].
Even though SSR loci obtained from whole genome data are generally polymorphic [17,18], the genomes of many species have not been sequenced. Compared to genomic-SSR analysis, the development of SSRs based on RNA-seq data has also become a mature and commonly employed method due to its low cost [19,20,21,22,23]. However, not all of the microsatellites tend to mutate, many SSR loci may not show polymorphisms among individuals, especially the loci in the gene encoding region excavated from transcriptome data, as a result, SSR mining using this traditional method is not always efficient [19,23,24,25,26,27]. Additionally, although some studies obtained many polymorphic SSR loci from transcriptome data, few of these loci showed high polymorphism [19,28,29,30]. For instances, during the SSR development in Sander lucioperca based on its RNA-seq data, Han et al. (2016) found that 18% of SSR loci were polymorphic and only one locus (1%) was highly polymorphic [28]. Li et al. (2017) found 15 polymorphic SSR loci and five highly polymorphic loci in the 55 test loci using the Casuarina equisetifolia transcriptome [29]. The indel analysis of whole genome re-sequencing can provide large scale data to obtain microsatellite mutation information [31,32,33,34]. However, due to the high costs, genome re-sequencing is not always available or unpractical. Additionally, because the software for indel analysis, such as Samtools or GATK [35], is not specifically designed for SSR analysis, the detection of SSRs using this type of program might have a higher error rate and the whole process requires some additional fresh script. Similar situations are encountered in transcriptome dataset analysis. Some indel analysis software can be used for RNA-seq data [35,36,37], none of which has been optimized for SSR detection alone.
In the present study, a new method for simplifying polymorphic SSR site screening using RNA-seq data is proposed. In brief, the concise method primarily consists of three stages: raw data processing, software development and implementation, and the extraction and checking of sequences. A new Perl-based open-source software developed in this study is the key component for the task, which efficiently processes the dataset of multiple transcriptomes. Providing a clear experimental range for polymorphic site mining, the new software can obtain numerous unverified polymorphic SSR loci. Additionally, the software can reflect the polymorphism details of most SSR loci, meaning that the SSR loci with high polymorphism can be further selected and many SSRs with few repeats will not be neglected. However, the method is generally unable to be used for species with transcriptomes less than five. We verify the practicability of the method in pea aphid (Acyrthosiphon pisum) individuals by genotyping.

2. Materials and Methods

2.1. Data Collection

The RNA-seq of pea aphids were used in this study. A total of 44 pea aphid transcriptomes were downloaded from National Center for Biotechnology Information (NCBI) SRA database (https://www.ncbi.nlm.nih.gov/sra/). The RNA-seq data was submitted by nine different institutions including the University of Arizona [38], University of Nebraska-Lincoln, Baylor College of Medicine [39], Centro Nacional de Análisis Genómico, Cornell University, Yale University [40], French National Institute for Agricultural Research, Gene Expression Omnibus [41], and National Institute for Basic Biology (Table 1).

2.2. Development of a New Software

To establish a novel method for screening polymorphic SSR loci using multiple transcriptomes, we first developed one software (Polymorphic SSR digging tool, PSSRdt, publicly available at https://github.com/PSSRdt/program). This program is based on Perl and available in the input files in FASTA format, which is compatible with multiple systems including Linux and Windows. The usage of PSSRdt is consistent with the regular Perl scripts and is also listed in the software manual. The program primarily depends on the data primitiveness of transcripts assembled using the de novo assembly method. PSSRdt first seeks out SSRs from the input file. The criteria of SSR detection refer to microsatellite searching program-MISA (available online at http://pgrc.ipk-gatersleben.de/misa/misa.html) (Table S1). Unlike some specialized software for SSR identification such as MISA or SSRLocator [42], this program does not need to distinguish whether SSRs are perfect (with single simple repeats) or imperfect [43]. SSR motifs screened with their flanking sequences are then recorded as the hash ‘keys’ and will be assessed as one SSR locus for the moment, the repeat number of which will be logged as their hash ‘values’. The various SSR loci excavated from the input file combined by multiple transcriptomes can be classified and their repeat details will be quickly saved and listed. Thus, if only a few RNA-seq data are available, such as one to five, the digitals in ‘values’ will be not enough for analysis. In addition, the length of each flanking sequence is determined by the judgment of the users according to thespecific data assembled quality (usually over eight to improve the accuracy). PSSRdt needs to call the scripts in Bioperl (available online at https://bioperl.org/). Users should first ensure the setup success of Bioperl modules, where the common installation methods of the modules are listed in the software manual.

2.3. Screening and Verifying of SSR Obtained by PSSRdt

After running PSSRdt, users will obtain two output files containing the details of total detected SSRs and unverified polymorphic ones respectively (Figure 1 Step 2). Both files consist of three column contents. Each row represents the details of all detected SSRs in the assumed same SSR locus, which includes, in turn, the SSR motif with its flanking sequences of that locus, the sum of all SSRs in the locus and the repeat number of each SSR in that locus. The highly variable numbers of repeat motifs shown in the third column indicate the site with relatively high polymorphism. Therefore, users could select the loci with many different repeat numbers to improve the efficiency of subsequent tests. Based on SSRs with the flanking sequences shown in ‘FindStr.result’, users can quickly verify the correctness of polymorphism information and obtain the complete sequences of these loci using a string search function in any text editor. Meanwhile, users could check SSR loci carefully and extract the applicable sites in the same way. The most important items the users need to check are listed in Step 3 of Figure 1, including estimating whether the length of flanking sequences of SSR motifs meets criteria for primer design (the flanking sequences of Sample 1 could fill the primer design, while Sample 2 could not) and the consistency of all sequences in assumed same loci (the flanking sequences in Sample 4 are consistent with corresponding sequences in Sample 3, which are identified as the same SSR locus. Sample 5 and 3 are not the same locus).

2.4. Overall Flow of the Novel Method

A novel method was constructed to efficiently excavate potential polymorphic microsatellite loci using multiple transcriptomes, which was divided into three steps (Figure 1). First, in Step 1, different transcriptomes of an organism are collected and downloaded. After quality control and data filtering of raw data, de novo assembly software is applied to the high-quality clean reads. All transcripts assembled in FASTA format are then merged into one dataset, which will be calculated by PSSRdt in Step 2. Two output files will be immediately generated, including the details of all excavated SSR loci (called ‘FindStr.result.detail’) and the potential polymorphic sites (called ‘FindStr.result’). Users can select some potential polymorphic SSR loci from the ‘FindStr.result’ file. In Step 3, SSRs with complete flanking sequences can be simply and visually extracted using the details of these SSR loci by the text editors and the verification of sequence information accuracy can be performed simultaneously.

2.5. Experimental Validation

2.5.1. RNA-Seq Data Assembly

To intuitively present and test the method, the experiments were conducted with pea aphids as samples. The raw reads of pea aphid transcriptomes were filtered to generate clean reads by removing adapter sequences, low-quality reads (quality scores < 30), reads with unknown bases ‘N’ and < 30 bp reads. All raw reads were assembled into transcripts using Trinity [44], the short reads assembling program. We integrated all 44 transcriptomes assembled into one dataset after de novo assembly. The ‘cat’ command on Linux was used to realize this step. The dataset was then computed by PSSRdt (parameter: 10) as an input file. Sublime Text 3 was performed to check the SSR details generated by PSSRdt and extract 100–300 bp sequences containing SSRs (each flanking sequence ≥ 50 bp).

2.5.2. Samples and Primers

Pea aphid individuals were used for validation of the SSR polymorphism. These individuals are randomly taken from five populations originated from different locations and were fed by dactylethrae with broad bean plants in phytotron. The cultivation conditions were set to temperature 24 ± 1 °C, relative humidity 60% ± 10%, photoperiod L:D = 16 h:8 h.
We tested 52 SSR loci with high polymorphisms in the ‘FindStr.result’ file for validation by genotyping. SSR primers were designed by PRIMER3 [45] and the options for primer design refer to the following: (1) primer lengths ranging from 18 to 27 bp; (2) product sizes are 100–300 bp; (3) melting temperature (Tm) is 57 °C to 63 °C and the difference of Tm between forward and reverse primers < 2 °C; (4) GC content 40–60%, with an optimal value of 50%. If no primers were found with these options, the parameter range was adjusted: Tm 45–63 °C; the difference between forward and reverse primers < 5 °C; GC content 30–70%. The study referred to the microsatellite PCR fragment fluorescent labeling method [46]. Three PCR primers are needed to amplify each microsatellite locus by this method. The first primer was the 5′-end of the forward primers (F) adding M13 forward primer (5’-TGTAAAACGACGGCCAGT-3’); the second primer was the reverse primer (R) without any modification; and the third primer was a M13 forward primer labeled by 6-carboxy-fluorescine (FAM) at its 5’-end.

2.5.3. PCR Amplification and Statistical Analysis

We extracted genomic DNA from individual A. pisum sample using the Easy Pure Genomic DNA kit (TransGen Biotech Co., Ltd., Beijing, China). Five individuals were randomly selected from each of the five populations for DNA extraction. Because of the small body size, the DNA extracted from a single pea aphid sample can only be used for the verification of 15 pairs of primers. Therefore, after the DNA was used up, we took another five individuals from each of the population for the next 15 primers, amounting to 25 pea aphid individuals from each population were used for validation of the 52 SSR loci selected. The DNA extracted was tested by 1.0% agarose gel electrophoresis to estimate the quality. Each 25 μL PCR included 1.5 μL pea aphid genomic DNA (the concentration of the primers 10–30 ng/μL), 12.5 μL of TransGen Biotech Taq MasterMix, 0.5 μL forward primer (10 μM), 2.0 μL reverse primer (10 μM), M13 primer 2.0 ul (10 μM). The PCR amplification conditions for microsatellite loci were as follows: DNA initial denaturation at 94 °C for 2 min; 30 cycles of 94 °C for 20 s, annealing temperature of specific primer for 20 s, 72 °C for 20 s; 8 cycles of 94 °C for 30 s, 53 °C for 45 s (special annealing temperature of primer M13), 72 °C for 45 s, and a final step 72 °C for 10 min.
The PCR products of microsatellite DNA were detected by 2% agarose gel electrophoresis. The products containing target bands were sent to Sangon Biotech for microsatellite genotyping detection. The PCR products with fluorescent labels were taken for fluorescence detection using capillary electrophoresis method in an ABI3730XL DNA automated analyzer (Applied Biosystems, Foster City, CA, USA). The obtained peak electrophoregrams were converted into the fragment length of amplified products using GENEMAPER v4.0 software (Applied Biosystems, Foster City, CA, USA). Lastly, the number of alleles (Na), polymorphism information content (PIC) and other analyses for evaluating polymorphism of SSRs were conducted by PowerMarker v3.25 [47].

3. Results

3.1. Transcriptome Data Assembly and Microsatellite Detection

To intuitively present the overall processes of the proposed method, all results of verification experiments on pea aphids are provided. A total of 44 representative pea aphid transcriptomes were downloaded from the NCBI SRA public database (Table 1). Nine institutions submitted the RNA-seq data. Using the data, we obtained 3,387,696 to 126,263,308 raw reads, and 2,829,317 to 108,646,204 high quality clean reads were then generated by data filtering for next de novo assembly (Table S2). Running Trinity, the various pea aphid transcriptomes were assembled into 7634 to 68,321 transcripts and the total sizes of sequences ranged from 2,076,016 bp to 42,402,758 bp (Table 1). SSR analysis of transcriptomes using MISA is also exhibited in Table 1. In addition, the number of trinucleotide repeat motifs in all transcriptomes exceeded the dinucleotide repeats but were less than the mononucleotides (Figure 2a). SSR motifs of AT/AT, AG/CT, and AAT/ATT were dominant in dinucleotide and trinucleotide microsatellites.

3.2. PSSRdt Application

The dataset (3137 Mb, millions of base pairs) merged by 44 RNA-seq assembled data contained 1,089,298 transcripts. PSSRdt were executed on Intel i7-4770 1600 MHz with 4 Gb RAM, running on Windows 7. The transcript set in FASTA format was processed in 14 min and produced two documents, which included a total of 16,384 SSR loci (Supplementary File S1) and 1940 potential polymorphic loci (Supplementary File S2) respectively. The analysis of the ‘FindStr.result’ file in the Supplementary File 2 revealed that the mononucleotide repeats were predominant, reaching 1097 (56.55%). There were 710 (36.60%) more dinucleotide SSR motifs more than trinucleotide SSRs (133, 6.86%) (Figure 2b), which was not inconsistent with the quantity relationship shown in Figure 2a. The highest proportions in mononucleotide and dinucleotide repeats were A/T (1074, 55.33%) and AT/AT (444, 22.87%) respectively. Among trinucleotide SSRs, the AAT/ATT type was most abundant (108, 5.56%).

3.3. Primer Design and SSR Loci Validation

Primer sequences are showed in Table S3. The statistics of polymorphism for the 52 SSR loci are presented in Table 2. The number of pea aphids successfully genotyped ranged from 9 to 21 (average 15.211). The number of SSR alleles per locus was 1 to 12, with an average of 5.346. Only four SSR loci failed to show polymorphism among the samples of five geographical populations (Table 2), indicating that over 92% of loci are polymorphic. The average PIC in 52 loci is 0.575. The PIC of 84.6% (44) and 73.1% (38) loci exceeded 0.25 (reasonably informative) and 0.5 (highly informative), respectively (Table 2). The PIC values (0–0.25) in the seven types of motifs were under 30% apart from AAT/ATT and CCG/CGG. The percentage of most SSR motifs’ PIC, which exceeded 0.5, was not less than 50% (Figure 3). There is no significant difference in the PIC ratio between dinucleotide and trinucleotide SSRs.

4. Discussion

With the rapid increase in popularity of NGS, an increasing number of transcriptomes are being uploaded on public databases [48,49], signifying that the method using RNA-seq data to excavate SSR polymorphism information will acquire more data support and can be applied to additional species. This method contained complete flows for analysis of transcriptome data, providing an effective path to reduce the workload of biologist. The classical biology experiments to excavate polymorphic SSR loci, such as magnetic beads enrichment and 5′ anchored PCR method, are not easy and inevitably require large workloads and high costs [50,51]. The new idea, that researchers directly obtain polymorphic information from RNA-seq data, may bring about significant progress in the study of SSRs.
For the sake of controlling cost and workload, researchers were previously prone to using SSRs with more repeats because of the probability of SSR loci with fewer repeats showing high polymorphism might be unsatisfactory, resulting in a number of SSRs being ignored [10]. The proposed method directly excavates the polymorphism details to avoid the loss. During the test on pea aphids, the repeated number of SSR motifs in outputs from PSSRdt was not emphasized. Among the 52 loci tested, the repeat number of nine dinucleotide SSR motifs was entirely less than 12, and eleven trinucleotide repeats were all below 10 in the transcript set, and most of those loci were polymorphic (8, 88.9%; 11, 100%). There were six (66.7%) and seven (63.6%) dinucleotide and trinucleotide SSRs that had a PIC over 0.5. It is believed that researchers could acquire more polymorphic SSR loci for the analysis of population genetics when they adopt this method on the basis of those particular results. In addition, multiple sequences at the same SSR loci can be extracted from the sets of diverse transcriptomes, which provides more complete sequences for the design of PCR primers and reduces the impact of assembly errors.
PSSRdt can rapidly complete microsatellite detection and produce two different files, thereby helping users manage different issues and simplify workload. However, if part of the flanking sequences at the same locus of SSRs is mutated, the microsatellite loci with mutations will not be matched with others. Thus, missing a few potential polymorphic loci is unavoidable. Besides, the principle of the program regarding the identification of SSRs is based on whether the number of tandem repeats above certain thresholds; therefore, it is unable to distinguish between perfect and imperfect SSRs. Fortunately, the minimum repeat time of the imperfect SSRs is very close to or not lower than the thresholds [52,53], thereby leaving the imperfect loci generally unburied.
Although, transcriptome assembly is a complex task and has certain requirements for server hardware. In this study, we downloaded 44 pea aphid transcriptomes submitted in SRA database of A. pisum, and analyzed the backgrounds of RNA-seq data, including the experimental objectives and methods, the attributes of the samples, and the submitted institutions. In fact, there is no need to download and assemble all transcriptomes of research objects when the amount of data is abundant. Many cDNA libraries for RNA-seq have multiple duplications, and researchers can choose part of the raw data to save time. In addition, many factors can generate a large influence on microsatellite alteration, such as long-term pesticide treatment and extreme temperature. Thus, using more representative data that samples through different types of treatments can improve the possibility of polymorphic loci mining.
Many new polymorphic SSR loci were obtained in pea aphids and the subsequent experiment verified the efficiency of the method. This approach can be used for more species with sufficient transcriptome data. Numerous new SSR loci with polymorphisms in various species will be found and some research concerning microsatellites may be extended with the data support, for instance, the distribution rules and structural features of SSRs with polymorphisms in functional regions of genes, the influences of external environment on SSR mutations, and characteristics of SSR alleles among different species [54,55,56]. Moreover, examining the vast new loci among diverse species might be valuable to researchers studying the laws of SSR mutations during biological evolution [57].

5. Conclusions

In this study, a novel software and method were presented to efficiently excavate polymorphic SSR loci from RNA-seq data and tested on A. pisum. This concise method includes three stages: raw data processing, program development and application, and loci extraction and verification. The method provides a clear range for polymorphic loci mining and the experiment success rate was high compared with the traditional methods using RNA-seq data. PSSRdt was especially designed for SSR detection, which was better than the indel analysis software for SSR studies. The novel method provides a new path for rapidly screening numerous polymorphic SSR loci and abundant data for further studies of SSRs.

Supplementary Materials

The following are available online at https://www.mdpi.com/2073-4425/10/11/917/s1, Table S1: The criteria for SSR detection, Table S2: Statistics of RNA-seq raw reads and clean reads after data filtering, Table S3: Characteristics of 52 microsatellite loci developed for A. pisum, File S1 and File S2 are the two output files generated by PSSRdt (findStr.result and findStr.result.detail).

Author Contributions

R.T. conceived and designed the experiments, and R.T. and M.C. wrote the manuscript; X.P., X.G., Y.H., and C.Z. participated in the related experiments.

Funding

This work was supported by the National Natural Science Foundation of China (No. 31772160) and The Coordinated Research Project (CRP) (grant No. 20559) of the International Atomic Energy Agency.

Acknowledgments

We thank those who help us in sample collection and technical assistance.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Han, Z.; Ma, X.; Wei, M.; Zhao, T.; Zhan, R.; Chen, W. SSR marker development and intraspecific genetic divergence exploration of Chrysanthemum indicum based on transcriptome analysis. BMC Genom. 2018, 19, 291. [Google Scholar] [CrossRef] [PubMed]
  2. Ukoskit, K.; Posudsavang, G.; Pongsiripat, N.; Chatwachirawong, P.; Klomsa-Ard, P.; Poomipant, P.; Tragoonrung, S. Detection and validation of EST-SSR markers associated with sugar-related traits in sugarcane using linkage and association mapping. Genomics 2019, 1, 1–9. [Google Scholar] [CrossRef] [PubMed]
  3. Jia, H.; Yang, H.; Sun, P.; Li, J.; Zhang, J.; Guo, Y.; Han, X.; Zhang, G.; Lu, M.; Hu, J. De novo transcriptome assembly, development of EST-SSR markers and population genetic analyses for the desert biomass willow, Salix psammophila. Sci. Rep. 2016, 6, 39591. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Xu, Z.; Primavera, J.H.; Pena, L.D.D.L.; Pettit, P.; Belak, J.; Alcivar-Warren, A. Genetic diversity of wild and cultured Black Tiger Shrimp (Penaeus monodon) in the Philippines using microsatellites. Aquaculture 2001, 199, 13–40. [Google Scholar] [CrossRef]
  5. Lázaro-Castellanos, J.O.; Mata-Rosas, M.; González, D.; Arias, S.; Reverchon, F. In Vitro propagation of endangered Mammillaria genus (Cactaceae) species and genetic stability assessment using SSR markers. In Vitr. Cell. Dev. Biol. Plant 2018, 54, 518–529. [Google Scholar] [CrossRef]
  6. Yue, G.H.; Mei, Y.H.; Orban, L.; Komen, J. Microsatellites within genes and ESTs of common carp and their applicability in silver crucian carp. Aquaculture 2004, 234, 85–98. [Google Scholar] [CrossRef]
  7. Ding, S.; Wang, S.; Kang, H.; Jiang, M.; Fei, L. Large-scale analysis reveals that the genome features of simple sequence repeats are generally conserved at the family level in insects. BMC Genom. 2017, 18, 848. [Google Scholar] [CrossRef] [PubMed]
  8. Yadav, R.; Lone, S.A.; Gaikwad, K.; Singh, N.K.; Padaria, J.C. Transcriptome sequence analysis and mining of SSRs in Jhar Ber (Ziziphus nummularia (Burm. f.) Wight & Arn) under drought stress. Sci. Rep. 2018, 8, 2406. [Google Scholar] [PubMed]
  9. Squirrell, J.; Hollingsworth, P.M.; Woodhead, M.; Russell, J.; Lowe, A.J.; Gibby, M.; Powell, W. How much effort is required to isolate nuclear microsatellites from plants? Mol. Ecol. 2003, 12, 1339–1348. [Google Scholar] [CrossRef] [PubMed]
  10. Guichoux, E.; Lagache, L.; Wagner, S.; Chaumeil, P.; Léger, P.; Lepais, O.; Lepoittevin, C.; Malausa, T.; Revarel, E.; Salin, F.; et al. Current trends in microsatellite genotyping. Mol. Ecol. Resour. 2011, 11, 591–611. [Google Scholar] [CrossRef] [PubMed]
  11. Chamberlain, J.S.; Gibbs, R.A.; Ranier, J.E.; Nguyen, P.N.; Caskey, C.T. Deletion screening of the Duchenne muscular dystrophy locus via multiplex DNA amplification. Nucleic Acids Res. 1988, 16, 11141–11156. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  12. Emese, M.; Nicolas, P.; André, G.; Vincent, D.; Pascal, H.; Aurélie, T.; Rémi, G.; Jean-François, M. QDD version 3.1: A user-friendly computer program for microsatellite selection and primer design revisited: Experimental validation of variables determining genotyping success rate. Mol. Ecol. Resour. 2015, 14, 1302–1313. [Google Scholar]
  13. Mudunuri, S.B.; Patnana, S.; Nagarajaram, H.A. MICdb3.0: A comprehensive resource of microsatellite repeats from prokaryotic genomes. Database 2014, 2014, bau005. [Google Scholar] [CrossRef] [PubMed]
  14. Faircloth, B. MSATCOMMANDER: Detection of microsatellite repeat arrays and automated, locus-specific primer design. Mol. Ecol. Resour. 2010, 8, 92–94. [Google Scholar] [CrossRef] [PubMed]
  15. Ponyared, P.; Ponsawat, J.; Tongsima, S.; Seresangtakul, P.; Akkasaeng, C.; Tantisuwichwong, N. ESAP plus: A web-based server for EST-SSR marker development. BMC Genom. 2016, 17, 163–173. [Google Scholar] [CrossRef] [PubMed]
  16. Christian, S. The evolution of molecular markers—Just a matter of fashion? Nat. Rev. Genet. 2004, 5, 63–69. [Google Scholar]
  17. Wang, Y.W.; Samuels, T.D.; Wu, Y.Q. Development of 1030 genomic SSR markers in switchgrass. Theor. Appl. Genet. 2011, 122, 677–686. [Google Scholar] [CrossRef] [PubMed]
  18. Contreras, J.L.; Knoppers, B.M. The genomic commons. Annu. Rev. Genom. Hum. Genet. 2018, 19, 429–453. [Google Scholar] [CrossRef] [PubMed]
  19. Dutta, S.; Kumawat, G.; Singh, B.P.; Gupta, D.K.; Singh, S.; Dogra, V.; Gaikwad, K.; Sharma, T.R.; Raje, R.S.; Bandhopadhya, T.K.; et al. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh]. BMC Plant Biol. 2011, 11, 17. [Google Scholar] [CrossRef] [PubMed]
  20. Zhang, J.; Liang, S.; Duan, J. De novo assembly and Characterisation of the Transcriptome during seed development, and generation of genic-SSR markers in Peanut (Arachis hypogaea L.). BMC Genom. 2012, 13, 90. [Google Scholar] [CrossRef] [PubMed]
  21. Ma, S.; Dong, W.; Lyu, T.; Lyu, Y. An RNA sequencing transcriptome analysis and development of EST-SSR markers in Chinese hawthorn through Illumina sequencing. Forests 2019, 10, 82. [Google Scholar] [CrossRef]
  22. Liu, C.; Fan, B.; Cao, Z.; Qiuzhu, S.U.; Wang, Y.; Zhang, Z.; Jing, W.U.; Tian, J. A deep sequencing analysis of transcriptomes and the development of EST-SSR markers in mungbean (Vigna radiata). J. Genet. 2016, 95, 527–535. [Google Scholar] [CrossRef] [PubMed]
  23. Xu, M.; Liu, X.; Wang, J.W.; Teng, S.Y.; Shi, J.Q.; Li, Y.Y.; Huang, M.R. Transcriptome sequencing and development of novel genic SSR markers for Dendrobium officinale. Mol. Breed. 2017, 37, 18. [Google Scholar] [CrossRef]
  24. Eujayl, I.; Sorrells, M.E.; Baum, M.; Wolters, P.; Powell, W. Isolation of EST-derived microsatellite markers for genotyping the A and B genomes of wheat. Theor. Appl. Genet. 2002, 104, 399–407. [Google Scholar] [CrossRef] [PubMed]
  25. Saha, M.C.; Cooper, J.D.; Mian, M.A.R.; Konstantin, C.; May, G.D. Tall fescue genomic SSR markers: Development and transferability across multiple grass species. Theor. Appl. Genet. 2006, 113, 1449–1458. [Google Scholar] [CrossRef] [PubMed]
  26. Shuxian, L.I.; Yin, T.M. Map and analysis of microsatellites in the genome of Populus: The first sequenced perennial plant. Sci. China 2007, 50, 690. [Google Scholar]
  27. Dang, M.; Liu, Z.X.; Chen, X.; Zhang, T.; Zhou, H.J.; Hu, Y.H.; Zhao, P. Identification, development, and application of 12 polymorphic EST-SSR markers for an endemic Chinese walnut (Juglans cathayensis L.) using next-generation sequencing technology. Biochem. Syst. Ecol. 2015, 60, 74–80. [Google Scholar] [CrossRef]
  28. Han, X.; Ling, Q.; Li, C.; Wang, G.; Xu, Z.; Lu, G. Characterization of pikeperch (Sander lucioperca) transcriptome and development of SSR markers. Biochem. Syst. Ecol. 2016, 66, 188–195. [Google Scholar] [CrossRef]
  29. Li, N.; Zheng, Y.Q.; Ding, H.M.; Li, H.P.; Peng, H.Z.; Jiang, B.; Li, H.B. Development and validation of SSR markers based on transcriptome sequencing of Casuarina equisetifolia. Trees 2017, 32, 1–9. [Google Scholar] [CrossRef]
  30. Wong, Q.; Tanzi, A.; Ho, W.; Malla, S.; Blythe, M.; Karunaratne, A.; Massawe, F.; Mayes, S. Development of Gene-Based SSR Markers in Winged Bean (Psophocarpus tetragonolobus (L.) DC.) for Diversity Assessment. Genes 2017, 8, 100. [Google Scholar] [CrossRef] [PubMed]
  31. Wan, F.; Yin, C.; Tang, R.; Chen, M.; Wu, Q.; Huang, C.; Qian, W.; Rota-Stabelli, O.; Yang, N.; Wang, S.; et al. A chromosome-level genome assembly of Cydia pomonella provides insights into chemical ecology and insecticide resistance. Nat. Commun. 2019, 10, 4237. [Google Scholar] [CrossRef] [PubMed]
  32. Bhangale, T.R.; Stephens, M.; Nickerson, D.A. Automating resequencing-based detection of insertion-deletion polymorphisms. Nat. Genet. 2006, 38, 1457–1462. [Google Scholar] [CrossRef] [PubMed]
  33. Dotsev, A.V.; Sermyagin, A.A.; Gladyr′, E.A.; Deniskova, T.; Wimmers, K.; Reyer, H.; Brem, G.; Zinovieva, N.A. 163 Population structure and genetic diversity of Russian native cattle breeds. J. Anim. Sci. 2017, 95, 80. [Google Scholar] [CrossRef]
  34. Farhat, M.R.; Freschi, L.; Calderon, R.; Ioerger, T.; Snyder, M.; Meehan, C.J.; de Jong, B.; Rigouts, L.; Sloutsky, A.; Kaur, D.; et al. GWAS for quantitative resistance phenotypes in Mycobacterium tuberculosis reveals resistance genes and regulatory regions. Nat. Commun. 2019, 10, 2128. [Google Scholar] [CrossRef] [PubMed]
  35. Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J. The sequence alignment-map format and SAMtools. Bioinformatics 2009, 25, 2078–2079. [Google Scholar] [CrossRef] [PubMed]
  36. Wang, T.-Y.; Wang, L.; Alam, S.; Hoeppner, L.; Yang, R. ScanNeo: Identifying indel-derived neoantigens using RNA-Seq data. Bioinformatics 2019, 35, 4159–4161. [Google Scholar] [CrossRef] [PubMed]
  37. Hagiwara, K.; Ding, L.; Edmonson, M.N.; Rice, S.V.; Newman, S.; Meshinchi, S.; Ries, R.E.; Rusch, M.; Zhang, J. RNAIndel: A machine-learning framework for discovery of somatic coding indels using tumor RNA-Seq data. bioRxiv 2019, 10, 512749. [Google Scholar] [CrossRef]
  38. Burke, G.R.; Moran, N.A. Responses of the pea aphid transcriptome to infection by facultative symbionts. Insect Mol. Biol. 2011, 20, 357–365. [Google Scholar] [CrossRef] [PubMed]
  39. Richards, S.; Gibbs, R.A.; Gerardo, N.M.; Moran, N.; Nakabachi, A.; Richards, S.; Stern, D.; Tagu, D.; Wilson, A.C.; Muzny, D. Genome sequence of the pea aphid Acyrthosiphon pisum. PLoS Biol. 2015, 8, e1000313. [Google Scholar]
  40. Hansen, A.K.; Moran, N.A. Aphid genome expression reveals host-symbiont cooperation in the production of amino acids. Proc. Natl. Acad. Sci. USA 2011, 108, 2849–2854. [Google Scholar] [CrossRef] [PubMed]
  41. Purandare, S.R.; Bickel, R.D.; Jaquiery, J.; Rispe, C.; Brisson, J.A. Accelerated evolution of morph-biased genes in pea aphids. Mol. Biol. Evol. 2015, 31, 2073. [Google Scholar] [CrossRef] [PubMed]
  42. da Maia, L.C.; Palmieri, D.A.; de Souza, V.Q.; Kopp, M.M.; de Carvalho, F.I.; de Oliveira, A.C. SSR Locator: Tool for simple sequence repeat discovery integrated with primer design and PCR simulation. Int. J. Plant Genom. 2008, 2008, 412696. [Google Scholar] [CrossRef] [PubMed]
  43. Urquhart, A.; Kimpton, C.P.; Downes, T.J.; Gill, P. Variation in short tandem repeat sequences—A survey of twelve microsatellite loci for use as forensic identification markers. Int. J. Leg. Med. 1994, 107, 13–20. [Google Scholar] [CrossRef] [PubMed]
  44. Grabherr, M.G.; Haas, B.J.; Moran, Y.; Levin, J.Z.; Thompson, D.A.; Ido, A.; Xian, A.; Lin, F.; Raktima, R.; Qiandong, Z. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 2011, 29, 644. [Google Scholar] [CrossRef] [PubMed]
  45. Rozen, S.; Skaletsky, H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol. 2000, 132, 365–386. [Google Scholar] [PubMed]
  46. Schuelke, M. An economic method for the fluorescent labeling of PCR fragments. Nat. Biotechnol. 2000, 18, 233–234. [Google Scholar] [CrossRef] [PubMed]
  47. Liu, K.; Muse, S. PowerMarker: An integrated analysis environment for genetic marker analysis. Bioinformatics 2005, 21, 2128–2129. [Google Scholar] [CrossRef] [PubMed]
  48. Yin, C.; Shen, G.; Guo, D.; Wang, S.; Li, F. InsectBase: A resource for insect genomes and transcriptomes. Nucleic Acids Res. 2016, 44, 801–807. [Google Scholar] [CrossRef] [PubMed]
  49. Ip, J.C.H.; Mu, H.; Chen, Q.; Sun, J.; Ituarte, S.; Heras, H.; Van Bocxlaer, B.; Ganmanee, M.; Huang, X.; Qiu, J.-W. AmpuBase: A transcriptome database for eight species of apple snails (Gastropoda: Ampullariidae). BMC Genom. 2018, 19, 179. [Google Scholar] [CrossRef] [PubMed]
  50. Kandpal, R.P.; Kandpal, G.; Weissman, S.M. Construction of libraries enriched for sequence repeats and jumping clones, and hybridization selection for region-specific markers. Proc. Natl. Acad. Sci. USA 1994, 91, 88–92. [Google Scholar] [CrossRef] [PubMed]
  51. Fisher, P. Single locus microsatellites isolated using 5′ anchored PCR. Nucleic Acids Res. 1996, 24, 4369–4371. [Google Scholar] [CrossRef] [PubMed]
  52. Mudunuri, S.B.; Nagarajaram, H.A. IMEx: Imperfect microsatellite extractor. Bioinformatics 2007, 23, 1181–1187. [Google Scholar] [CrossRef] [PubMed]
  53. Alam, C.M.; Singh, A.K.; Sharfuddin, C.; Ali, S. Genome-wide scan for analysis of simple and imperfect microsatellites in diverse carlaviruses. Infect. Genet. Evol. 2014, 21, 287–294. [Google Scholar] [CrossRef] [PubMed]
  54. Jo, W.-S.; Kim, H.-Y.; Kim, K.-M. Development and characterization of polymorphic EST based SSR markers in barley (Hordeum vulgare). 3 Biotech 2017, 7, 265. [Google Scholar] [CrossRef] [PubMed]
  55. Liu, S.; An, Y.; Li, F.; Li, S.; Wei, C. Genome-wide identification of simple sequence repeats and development of polymorphic SSR markers for genetic studies in tea plant (Camellia sinensis). Mol. Breed. 2018, 38, 59. [Google Scholar] [CrossRef]
  56. Tiwari, J.K.; Ali, N.; Devi, S.; Zinta, R.; Chakrabarti, S.K. Analysis of allelic variation in wild potato (Solanum) species by simple sequence repeat (SSR) markers. 3 Biotech 2019, 7, 262. [Google Scholar] [CrossRef] [PubMed]
  57. Vinces, M.D.; Legendre, M.; Caldara, M.; Hagihara, M.; Verstrepen, K.J. Unstable tandem repeats in promoters confer transcriptional evolvability. Science 2009, 324, 1213–1216. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Flow diagram of the method for polymorphic SSR loci mining. Step 1 indicates the procedure of RNA-seq raw data processing. Two examples represent the characteristics of the two output files generated by PSSRdt on the left of Step 2. (a) and (b) correspondingly represent total screened SSR loci and potential polymorphic loci. Two main validation items for the information check of potential polymorphic SSR are listed in Step 3. The flanking sequences in Sample 1 fill the requirements of primer design, while those in Sample 2 do not. The flanking sequences in Sample 4 are consistent with the corresponding flanking sequences in Sample 3, which are identified as the same SSR locus, while Sample 5 is not.
Figure 1. Flow diagram of the method for polymorphic SSR loci mining. Step 1 indicates the procedure of RNA-seq raw data processing. Two examples represent the characteristics of the two output files generated by PSSRdt on the left of Step 2. (a) and (b) correspondingly represent total screened SSR loci and potential polymorphic loci. Two main validation items for the information check of potential polymorphic SSR are listed in Step 3. The flanking sequences in Sample 1 fill the requirements of primer design, while those in Sample 2 do not. The flanking sequences in Sample 4 are consistent with the corresponding flanking sequences in Sample 3, which are identified as the same SSR locus, while Sample 5 is not.
Genes 10 00917 g001
Figure 2. The number of SSRs in A. pisum based on motif types. (a) The number of total types of SSR motifs in the transcript dataset. (b) The number of SSR motifs with potential polymorphism analyzed by PSSRdt.
Figure 2. The number of SSRs in A. pisum based on motif types. (a) The number of total types of SSR motifs in the transcript dataset. (b) The number of SSR motifs with potential polymorphism analyzed by PSSRdt.
Genes 10 00917 g002
Figure 3. Polymorphism information content (PIC) details of SSR loci tested on A. pisum. The percentages of different PIC values in 9 types of SSR motifs, dinucleotide microsatellites, and trinucleotide microsatellites were visualized by three colors.
Figure 3. Polymorphism information content (PIC) details of SSR loci tested on A. pisum. The percentages of different PIC values in 9 types of SSR motifs, dinucleotide microsatellites, and trinucleotide microsatellites were visualized by three colors.
Genes 10 00917 g003
Table 1. Summary of transcriptome assembly and simple sequence repeat (SSR) analysis.
Table 1. Summary of transcriptome assembly and simple sequence repeat (SSR) analysis.
Accession IDTotal Sequences aTotal Size (bp)Sequences with SSRs bTotal SSRs1 c23456Submission Institution
SRR0637061584,72811,540871931499,2997918603011102319The University of Arizona
SRR0637072667,39125,1553736537029471034135121143The University of Arizona
SRR06440885483035,49514932138985508630942Yale University
SRR0644092422,23274,382072301175,368771456333763137Yale University
SRR07134795583108,73211961653663300680622Baylor College of Medicine
SRR0731361014,53367,825148120821122376574721University of Nebraska-Lincoln
SRR0732721888,52141,43285184873754231090215352136University of Nebraska-Lincoln
SRR07327476342076,01640751415173283610University of Nebraska-Lincoln
SRR0732763775,91020,013421222766114337512231654University of Nebraska-Lincoln
SRR3535392789,84192,91441014,82241,81363,729435646143409University of Nebraska-Lincoln
SRR0734264643,71260,277226623487159745214101954Cornell university
SRR0735732073,01504,9227321245412745593117118104National Institute for Basic Biology
SRR0735742164,68890,34124783269140544713862254National Institute for Basic Biology
SRR0735752001,82192,999144217262345399727584095National Institute for Basic Biology
SRR0735761979,11799,6584361157722199872264640105National Institute for Basic Biology
SRR0735881668,32225,060964101291,776071742347273203National Institute for Basic Biology
SRR0742312336,24099,62121021,92242,01476,327454715148418University of Nebraska-Lincoln
SRR0742332133,82352,602974171472,392371897346393285University of Nebraska-Lincoln
SRR0758022376,33008,166983631792,71085,323914544108256INRA d
SRR0758033108,73924,71891047,22068,11362,024704414127437INRA
SRR0978963299,33999,762699931783,01139,92139415097369Centro Nacional de Análisis Genómico
SRR0983303110,83628,876999811905,81276,022543898104357Centro Nacional de Análisis Genómico
SRR12394393280,93742,64151015,91920,71267,722724090119409Gene Expression Omnibus
SRR12394402037,33664,042578281561,6898720734421103248Gene Expression Omnibus
SRR12394411687,11151,23992089285916543658201523Gene Expression Omnibus
SRR12394421581,12480,953160331218,175321470308474165Gene Expression Omnibus
SRR12394431571,62352,585462121250,679431556291868183Gene Expression Omnibus
SRR12394441232,71797,2772441879315175913178345114Gene Expression Omnibus
SRR12394451376,82219,22065276984864451139218663105Gene Expression Omnibus
SRR12394466832,12486,027271851092,466481428275968156Gene Expression Omnibus
SRR12394486799,52526,114974541140,170181473282166167Gene Expression Omnibus
SRR12394492097,32681,82395932942759141066236958137Gene Expression Omnibus
SRR12394506334,71765,826039575229246477019542894Gene Expression Omnibus
SRR12394513222,48346,763151519137862568561032Gene Expression Omnibus
SRR12394522073,015049,227321245412745593117118104Gene Expression Omnibus
SRR12394532080,98469,79523823158127044114162344Gene Expression Omnibus
SRR17932992384,44240,27581037,12327,61495,4289852161553914Cornell university
SRR17933002042,43121,842279381492,9977118453179100277Cornell university
SRR9241063001,73667,781097721859,31207,122754082115446INRA
SRR9241182568,13458,279884211534,397961877355284268INRA
SRR9241192478,03590,046888101695,91087,920453888108318INRA
SRR9241201589,62747,126657551011,262941202253563117INRA
SRR9241211600,22609,734165191377,284061733351193236INRA
SRR9241221445,52071,969457581126,073361348249760163INRA
a The number of transcripts assembled by Trinity; b The number of sequences containing SSRs; c Mononucleotide SSRs; d French National Institute for Agricultural Research.
Table 2. Polymorphism analysis of 52 microsatellite loci in pea aphid individuals.
Table 2. Polymorphism analysis of 52 microsatellite loci in pea aphid individuals.
LocusNNAFMPIC
31250.41670.5748
41860.52780.6194
52070.42500.7164
61860.33330.7444
71430.67860.4090
81790.20590.8313
92160.42860.7006
101260.29170.7517
131690.28130.8122
141880.44440.7118
151611.00000.0000
161670.43750.7081
171530.83330.2604
182180.23810.8207
191450.46430.6469
211620.62500.3589
221150.50000.6257
231160.31820.7436
271740.61760.5239
291360.34620.6874
3111110.22730.8595
331620.93750.1103
341230.45830.5697
351840.47220.5851
381740.61760.5269
391780.26470.7888
402011.00000.0000
411070.35000.7700
431280.29170.8013
461750.32350.7130
471611.00000.0000
481370.42310.6867
491620.87500.1948
511450.32140.7248
522040.45000.5249
531460.39290.7072
1011530.76670.3227
10215120.20000.8685
1081270.29170.7614
1091630.84380.2478
1101560.43330.6675
1121790.23530.8319
1131530.83330.2710
1141590.36670.7762
116930.72220.3709
117970.27780.8053
1191280.25000.7957
1211850.27780.7429
1221720.88240.1861
1281150.31820.7319
1311850.30560.7165
1321811.00000.0000
Mean15.21155.34620.49660.5751
N, Number of aphids successfully genotyped; NA, Number of alleles per locus; FM, Frequency of major allele; PIC, Polymorphism information content.

Share and Cite

MDPI and ACS Style

Tian, R.; Zhang, C.; Huang, Y.; Guo, X.; Chen, M. A Novel Software and Method for the Efficient Development of Polymorphic SSR Loci Based on Transcriptome Data. Genes 2019, 10, 917. https://doi.org/10.3390/genes10110917

AMA Style

Tian R, Zhang C, Huang Y, Guo X, Chen M. A Novel Software and Method for the Efficient Development of Polymorphic SSR Loci Based on Transcriptome Data. Genes. 2019; 10(11):917. https://doi.org/10.3390/genes10110917

Chicago/Turabian Style

Tian, Ruizheng, Cunhuan Zhang, Yixiao Huang, Xin Guo, and Maohua Chen. 2019. "A Novel Software and Method for the Efficient Development of Polymorphic SSR Loci Based on Transcriptome Data" Genes 10, no. 11: 917. https://doi.org/10.3390/genes10110917

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop