Development and application of perfect SSR markers in cotton

This study aimed to develop a set of perfect simple sequence repeat (SSR) markers with a single copy in the cotton genome and to construct a DNA fingerprint database suitable for authentication of cotton cultivars. We optimized the polymerase chain reaction (PCR) system to achieve multi-platform compatibility and improve detection efficiency. Based on the reference genome of upland cotton and 10× resequencing data of 48 basic cotton germplasm lines, single-copy polymorphic SSR sites were identified and developed as diploidized SSR markers. The SSR markers were detected by denaturing polyacrylamide gel electrophoresis (PAGE) for initial screening, then by fluorescence capillary electrophoresis for secondary screening. The final perfect SSR markers were evaluated and verified using 210 lines from different sources among Chinese cotton regional trials. Using bioinformatics techniques, 1 246 SSR markers were designed from 26 626 single-copy SSR loci. Adopting a stepwise (primary and secondary) screening strategy, we selected a set of 60 perfect SSR markers with high amplification efficiency and stability, easy interpretation of peak type, multiple allelic variations, high polymorphism information content (PIC) value, uniform chromosome distribution, and single-copy characteristics. A multiplex PCR system was established with ten SSR markers using capillary electrophoresis detection. In summary, a set of perfect SSR markers for cotton was developed and a high-throughput SSR marker detection system was established. This study lays a foundation for large-scale and standardized construction of a cotton DNA fingerprint database for authentication of cotton varieties.


Background
Cotton is one of the most important economic crops in the world. Its fiber is a renewable textile raw material, and its seeds are processed into cottonseed oil. In recent years, excellent new varieties have been developed and promoted, and the seed market has flourished. At the same time, brand infringement and the production and sale of fake and inferior seeds have continued to occur. This has dampened the enthusiasm of innovators, increased the risks for farmers, affected agricultural production safety, and severely restricted the development of China's seed industry (Wu et al. 2015). Variety identification can effectively ensure the quality of seeds in agricultural production and reduce the risk of using fake and/or inferior seeds. Among DNA molecular fingerprinting methods, simple sequence repeat (SSR) molecular marker technology is widely used in rice (Sundaram et al. 2008; Singh et al. 2004), wheat (Zhu and Jia 2003; Shi et al. 2006), maize (Wang et al. 2003, 2017), soybean (Guan et al. 2003; Sheng et al. 2010), rape (Li et al. 2010) and other major crops because of its advantages such as high polymorphism, codominant inheritance, and ease of scoring (Guichoux et al. 2011). There have been numerous reports on SSR molecular detection technology in cotton (Pan et al. 2008; Kuang et al. 2011a, b; Wang 2009; Wang et al. 2009; Bai et al. 2012; Liu et al. 2009). However, in previous studies, most of the SSR primers came from public databases, the quality of the primers was uneven, and the number of materials used for screening and evaluation was relatively small. With the existing methods, it is difficult to meet the needs of identification, standardization, and large-scale detection of the cotton varieties currently planted in China. Wang et al. (2016) first proposed the concept of excellent SSR markers in maize and provided several important evaluation indicators: (1) markers should be developed from resequencing data, and the primer flanking sequences should be highly conserved; (2) suitable markers should be free of abnormal peak shapes such as N + 1 peaks and continuous peaks; (3) markers should be combinable in multiplex polymerase chain reaction (PCR) to improve detection efficiency; (4) data from different platforms should be integrable. For cotton cultivars, the complexity of the polyploid genome (Yang 2001; Wolfe 2001) and frequent cross-pollination pose difficulties for the development of high-quality markers and for accurate genotyping among different varieties.
In recent years, the completion of the cotton reference genome sequence (Wang et al. 2012; Li et al. 2014; Zhang et al. 2015) has laid the foundation for the development and application of high-quality markers across the whole genome. In this study, we used 10× resequencing data from 48 upland cotton genotypes. With reference to the evaluation indices for excellent SSR markers in maize (Wang et al. 2016), and taking into account the polyploid nature of cotton, we developed a batch of single-copy SSR markers. Denaturing polyacrylamide gel electrophoresis and fluorescent capillary electrophoresis were used for primary screening and re-screening of primers, respectively. We successfully developed a set of single-copy excellent SSR primers that combines high amplification efficiency and stability, easy-to-read peak shapes, multiple allelic variations, high polymorphism information content (PIC) values, and uniform chromosome distribution. Combining multiplex PCR amplification with capillary electrophoresis detection enabled ten-fold capillary electrophoresis, in which ten markers are detected in a single run. In this way, a high-throughput SSR marker detection system was established, laying the foundation for constructing a large-scale database of DNA fingerprints for cotton variety identification and authentication.

Experimental materials
This study used 689 cotton cultivars, including 395 core germplasm lines (Ma et al. 2018), 210 lines from national trials in different regions, 48 basic germplasm lines, and 12 sets of triplet materials comprising male parent, female parent, and F1 generation (Table 1).

SSR primers
The common primers used in this study were synthesized by Dalian Bao Biological Company, China, and the fluorescent primers were synthesized by Applied Biosystems, USA. For each pair of fluorescent primers, one of them was selected to be labeled with a fluorescent dye PET, NED, VIC, or FAM at the 5′ end.

DNA extraction
Genomic DNA was extracted from dry seeds of each cultivar using the method described by Kuang et al. (2010).
PCR amplification procedure: initial denaturation at 94°C for 5 min; 32 cycles of denaturation at 94°C for 45 s, annealing at 60°C for 45 s, and extension at 72°C for 45 s; final extension at 72°C for 7 min.

Denaturing polyacrylamide gel electrophoresis (PAGE)
Denatured PCR products were separated on a 6% polyacrylamide gel, electrophoresed at 90 W for 1 h, and then stained with silver nitrate (Kuang et al. 2011b).

Fluorescence capillary electrophoresis
A mixture of 1 μL PCR amplification product, 8.5 μL deionized formamide and 0.5 μL LIZ-500 size standard was denatured at 95°C for 5 min, cooled at 4°C for 10 min, and then separated on an ABI 3130xl Genetic Analyzer. The fluorescent capillary electrophoresis detection procedure was pre-electrophoresis at 15 kV for 3 min, injection at 2 kV for 2 s, and electrophoresis at 15 kV for 20 min.
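The per-sample volumes above can be scaled to a full plate. The sketch below assumes, as is common practice but not stated in the text, that formamide and size standard are premixed and dispensed before the PCR product is added; the function name and the 10% pipetting overage are hypothetical.

```python
def loading_mix(n_samples, overage=0.10):
    """Volumes (in microlitres) of formamide and size standard to premix.

    Per sample, the text gives 8.5 uL formamide + 0.5 uL size standard,
    to which 1 uL of PCR product is added. `overage` is an assumed
    allowance for pipetting loss, not a value from the text.
    """
    factor = n_samples * (1 + overage)
    return {
        "formamide_ul": round(8.5 * factor, 1),
        "size_standard_ul": round(0.5 * factor, 1),
        "premix_per_well_ul": 9.0,  # 8.5 + 0.5, before adding 1 uL product
    }

# Volumes for one 96-well plate
print(loading_mix(96))
```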

Statistical analysis
The numbers of alleles, genetic diversity, and polymorphism information content (PIC) of SSR molecular markers were analyzed using PowerMarker 3.25 software. The discrimination ability of SSR primers is expressed by the PIC value: the larger the PIC value, the stronger the discrimination ability. The formula is $\mathrm{PIC} = 1 - \sum_{i} P_i^{2}$, where $P_i$ is the frequency of the $i$-th allelic variation. Cluster analysis was performed on the similarity coefficient matrix and genetic distance matrix using the unweighted pair-group method with arithmetic means (UPGMA).
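As a minimal illustration of the PIC formula as written above, the following sketch computes PIC from a list of allele calls for one marker. The function name and the allele sizes are hypothetical; the study itself used PowerMarker for this calculation.

```python
from collections import Counter

def pic(alleles):
    """PIC as defined in the text: 1 minus the sum of squared allele frequencies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# Hypothetical allele calls (fragment sizes in bp) for one marker across eight lines
calls = [152, 152, 154, 154, 154, 158, 158, 152]
print(round(pic(calls), 3))  # frequencies 3/8, 3/8, 2/8 give 0.656
```

A marker with a single allele returns 0, and the value approaches 1 as alleles become more numerous and more evenly distributed.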

Screening of single-copy SSR
Bioinformatics methods were used to evaluate 10× resequencing data of 48 cotton genotypes, and 26 266 single-copy SSR loci were obtained. Of these, 5 712 loci with conserved flanking sequences were suitable for designing SSR primers. Analysis of the number of alleles at each SSR locus in the 48 cotton germplasm lines showed that the more alleles a locus had, the stronger the ability of its primers to distinguish varieties. In addition, the smaller the amplified product, the higher the efficiency of PCR amplification, which favors the construction of a multiplex PCR system. Considering these criteria together with the chromosome distribution of the SSR loci, 1 246 primer pairs were selected from the 5 712 loci for further screening.
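The selection criteria described above (conserved flanks, more alleles, smaller products, even chromosome coverage) can be sketched as a simple filter. The thresholds, field names, and example loci below are assumptions for illustration only; the text does not give the cut-offs actually used.

```python
from dataclasses import dataclass

@dataclass
class Locus:
    name: str
    chromosome: str
    n_alleles: int          # alleles observed across the resequenced lines
    product_bp: int         # expected amplicon size
    conserved_flanks: bool  # flanking sequence conserved across lines

def select_candidates(loci, min_alleles=2, max_product_bp=300, per_chromosome=2):
    """Keep polymorphic loci with conserved flanks and short amplicons,
    then take the most allele-rich loci on each chromosome."""
    eligible = [
        loc for loc in loci
        if loc.conserved_flanks
        and loc.n_alleles >= min_alleles
        and loc.product_bp <= max_product_bp
    ]
    by_chrom = {}
    # Prefer more alleles, then shorter products
    for loc in sorted(eligible, key=lambda x: (-x.n_alleles, x.product_bp)):
        picked = by_chrom.setdefault(loc.chromosome, [])
        if len(picked) < per_chromosome:
            picked.append(loc)
    return [loc for chrom in sorted(by_chrom) for loc in by_chrom[chrom]]

# Hypothetical loci
loci = [
    Locus("S1", "A01", 4, 180, True),
    Locus("S2", "A01", 2, 150, True),
    Locus("S3", "A01", 3, 420, True),   # product too long
    Locus("S4", "D05", 5, 210, False),  # flanks not conserved
    Locus("S5", "D05", 3, 160, True),
    Locus("S6", "A01", 3, 200, True),
]
print([loc.name for loc in select_candidates(loci)])
```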

Preliminary screening by denaturing PAGE platform
For preliminary screening, 48 germplasm lines from three cotton-growing regions were used to screen the 1 246 primer pairs by denaturing polyacrylamide gel electrophoresis. A total of 398 polymorphic primers were identified, distributed across the 26 chromosomes. Among them, 145 primers showed high amplification efficiency, good reproducibility, clear main bands, no non-specific amplification, and single bands when tested in known cultivars. Some example electrophoresis bands are shown in Fig. 1.

Rescreening by fluorescence capillary electrophoresis
For secondary screening, 395 cotton core germplasm lines (Ma et al. 2018) representing a broad range of genotypes were used to analyze the allelic variation of the 145 pairs of candidate SSR primers selected in the preliminary screening, using fluorescence capillary electrophoresis as the detection platform. Twelve sets of triplets comprising male parent, female parent, and F1 generation were used to analyze the heterozygosity of the 145 pairs of candidate SSR primers in the hybrids. From the results of capillary electrophoresis, we selected primers with sharp single peaks, no interference peaks, no continuous peaks, and stable and easy-to-read peak shapes. Combining the above indicators, 60 pairs of excellent SSR primers with high allelic variation, strong discrimination ability, uniform chromosome distribution, high heterozygosity in hybrids, and easy-to-read peak shape were selected.

Polymorphism analysis of SSR markers
The 60 selected excellent SSR primers were all diploid single-copy primers and were evenly distributed on the 26 chromosomes of the cotton A and D genomes. Primer amplification was efficient and stable. On the capillary electrophoresis detection platform, 88.3% of the primers gave sharp single peaks with easy-to-read peak shapes. The primers were rich in allelic variation: the 60 pairs of excellent SSR primers detected 247 single-copy polymorphic alleles among 431 cotton lines. Each primer detected an average of 4.12 alleles, with a range of 2-13. Primer PC07 had the most, with 13 alleles, indicating that this locus has a high mutation frequency and rich genetic diversity. The average PIC value was 0.48, with a range of 0.2-0.8. The results for the 12 sets of triplets showed that 24 pairs of primers had heterozygosity greater than 50%; among them, primer PC06 had the highest heterozygosity, at 75%. Primer information is shown in Table 2.
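The summary statistics above follow from simple counts. The sketch below assumes that heterozygosity of a primer is the share of the 12 F1 hybrids showing two different alleles; that definition, the function name, and the genotype calls are illustrative assumptions, while the totals (247 alleles, 60 primers, 12 triplets) come from the text.

```python
def hybrid_heterozygosity(f1_genotypes):
    """Fraction of F1 hybrids carrying two different alleles at one marker."""
    het = sum(1 for a, b in f1_genotypes if a != b)
    return het / len(f1_genotypes)

# Mean number of alleles per primer, from the totals reported in the text
mean_alleles = 247 / 60
print(round(mean_alleles, 2))  # 4.12

# Hypothetical F1 calls at one marker: 9 heterozygous, 3 homozygous among 12 hybrids
f1_calls = [(150, 154)] * 9 + [(150, 150)] * 3
print(f"{hybrid_heterozygosity(f1_calls):.0%}")  # 75%
```

Under this definition, the 75% reported for PC06 would correspond to 9 heterozygous hybrids out of 12.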

Construction of a multiplex PCR system and ten-fold capillary electrophoresis
Primer Premier 5.0 software was used to analyze the interactions between the primers. According to the size ranges of the amplified fragments, the 60 pairs of primers were divided into 18 combinations for multiplex PCR amplification. According to the fluorescent colors of the primers, six groups of ten-fold primer combinations were constructed for capillary electrophoresis detection (Table 3).
Taking cotton variety Zhongmiansuo 49 as an example, the results of the six ten-fold capillary electrophoresis runs are shown in Fig. 2. The ten pairs of SSR primers in each combination were labeled with four types of fluorescent dye. All the primers showed stable amplification, no interference between primers, clear bands, and sharp peaks, making the results easy to interpret.
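One way to think about building such panels is that markers sharing a dye must not overlap in fragment size, while markers with different dyes may. The sketch below uses a greedy first-fit assignment under that assumption; it is not the authors' procedure (which relied on Primer Premier and manual grouping), and the marker names, size ranges, and 10 bp safety gap are hypothetical.

```python
def overlaps(a, b, gap=10):
    """True if two (min_bp, max_bp) ranges come within `gap` bp of each other."""
    return a[0] - gap <= b[1] and b[0] - gap <= a[1]

def build_panels(markers, panel_size=10):
    """Greedy first-fit: place each marker in the first panel that has room
    and in which no marker with the same dye has an overlapping size range."""
    panels = []
    for name, dye, size_range in sorted(markers, key=lambda m: m[2]):
        for panel in panels:
            clash = any(d == dye and overlaps(r, size_range) for _, d, r in panel)
            if len(panel) < panel_size and not clash:
                panel.append((name, dye, size_range))
                break
        else:
            # No existing panel can take this marker, so open a new one
            panels.append([(name, dye, size_range)])
    return panels

# Hypothetical markers: (name, dye, allele size range in bp)
markers = [
    ("M1", "FAM", (100, 120)),
    ("M2", "FAM", (115, 140)),
    ("M3", "VIC", (100, 125)),
    ("M4", "FAM", (160, 180)),
    ("M5", "NED", (110, 130)),
]
for i, panel in enumerate(build_panels(markers), start=1):
    print(i, [name for name, _, _ in panel])
```

Here M2 is pushed to a second panel because it shares the FAM dye with M1 and their size ranges overlap.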

Application of perfect SSR markers in the construction of a DNA fingerprint database of cotton varieties
The selected 60 pairs of primers and the multiplex combinations described above were used to construct fingerprints of 210 lines from national trials in different regions from 2016 to 2018. PowerMarker 3.25 software was used to calculate genetic distances with Nei's 1973 algorithm and to construct a UPGMA clustering map. As shown in Fig. 3, samples No. 1-97 came from the Yellow River Basin, samples No. 98-146 came from the Yangtze River Basin, and samples No. 147-210 came from the northwest inland region. The cluster analysis showed that: (1) the perfect SSR primer combinations were able to identify all 210 regional trial varieties; (2) regional trial samples from the same cotton-growing area were preferentially grouped.
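The clustering step can be illustrated with a small pure-Python UPGMA. Note the substitution: for brevity the sketch uses a simple mismatch distance (the share of markers with different genotypes) rather than Nei's 1973 distance computed by PowerMarker, and the sample names, marker names, and genotypes are hypothetical.

```python
from itertools import combinations

def mismatch_distance(fp_a, fp_b):
    """Share of markers at which two fingerprints carry different genotypes.
    Genotypes are assumed to be stored as sorted (allele, allele) tuples."""
    shared = [m for m in fp_a if m in fp_b]
    return sum(fp_a[m] != fp_b[m] for m in shared) / len(shared)

def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns a nested-tuple tree.
    `dist` maps frozenset({a, b}) to the distance between samples a and b."""
    clusters = {name: (name, [name]) for name in names}  # key -> (tree, members)
    while len(clusters) > 1:
        def avg(pair):
            xs, ys = clusters[pair[0]][1], clusters[pair[1]][1]
            total = sum(dist[frozenset((x, y))] for x in xs for y in ys)
            return total / (len(xs) * len(ys))
        a, b = min(combinations(sorted(clusters), 2), key=avg)
        tree_a, mem_a = clusters.pop(a)
        tree_b, mem_b = clusters.pop(b)
        clusters[a + "+" + b] = ((tree_a, tree_b), mem_a + mem_b)
    return next(iter(clusters.values()))[0]

# Hypothetical three-marker fingerprints for three samples
fingerprints = {
    "V1": {"m1": (150, 150), "m2": (200, 204), "m3": (120, 120)},
    "V2": {"m1": (150, 150), "m2": (200, 204), "m3": (120, 124)},
    "V3": {"m1": (154, 154), "m2": (208, 208), "m3": (126, 126)},
}
names = sorted(fingerprints)
dist = {
    frozenset((a, b)): mismatch_distance(fingerprints[a], fingerprints[b])
    for a, b in combinations(names, 2)
}
print(upgma(names, dist))
```

In this framing, two samples are distinguishable whenever their pairwise distance is greater than zero, which is the sense in which a marker set identifies all samples in a collection.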

Discussion
In recent years, molecular marker detection technologies have been continuously developed. Among such technologies, SSR and SNP molecular markers were written into the molecular testing guidelines by the International Union for the Protection of New Varieties of Plants (UPOV) in 2005 (UPOV 2005). At the same time, relatively mature SSR molecular markers have become the first choice for constructing DNA fingerprint databases of crops. The diploid crop maize is in a leading position in the construction of a standard fingerprint database of varieties using SSR molecular markers. The SSR fingerprint database and management system for certification of maize varieties have been constructed and applied to the national seed inspection system in China (Wang et al. 2017).

The selection of candidate markers and screening materials
In the case of cotton, the lack of genomic information in the early stage made it difficult to develop SSR primers, and the evaluation indices for screening markers were insufficient. In the past, cotton SSR marker detection was mainly based on polyacrylamide gel electrophoresis, and there was no standardized operating procedure, which made it difficult to store and analyze the marker detection data efficiently, and results were poorly consistent between laboratories. Unlike diploid crops such as maize and rice, cultivated cotton is mostly tetraploid, so a single SSR marker may amplify one, two, three, or even four bands in different cotton varieties. At the same time, frequent cross-pollination makes it difficult for cotton varieties to achieve high homozygosity, which further increases the difficulty of identifying cotton varieties using SSR molecular markers.
In this study, we used 10× resequencing data of 48 basic cotton germplasm lines to develop SSR markers with single-copy characteristics and easy-to-read peak shapes. These markers amplify only one or two bands in hybrids, as in diploid crops, which greatly reduces the difficulty of data analysis and scoring.
The key factor in building a DNA fingerprint library of cotton varieties using SSR markers is the screening of a set of excellent SSR primers, which depends largely on the screening material. Allotetraploid cultivated cotton in China originated mainly from introduced foreign germplasm, and the varieties currently used in production and breeding derive mainly from the US varieties King, Stoneville®, and Deltapine™; the genetic basis is therefore relatively narrow, with low genetic polymorphism between varieties (Guo et al. 2014). To obtain reliable polymorphism information and accurate PIC values for the primers, this study drew on a broad and representative range of genetic materials. We used 48 basic cotton lines from the Yellow River Basin, the Yangtze River Basin, and the northwest inland region as the primary screening material, and nearly 400 core cotton germplasm lines representing 7 362 upland cotton accessions (Ma et al. 2018) together with 12 sets of triplets as re-screening materials. The final set of 60 pairs of excellent SSR primers had 247 single-copy allelic variants, high PIC values, easy-to-interpret peak shapes, and even chromosome distribution.

The specialties and application of perfect SSR markers
The selected 60 pairs of excellent SSR markers are compatible with both conventional polyacrylamide gel electrophoresis and fluorescent capillary electrophoresis detection platforms, and so are suitable for use by a wide range of researchers under different laboratory conditions. Compared with polyacrylamide gel electrophoresis detection, fluorescent capillary electrophoresis has the following advantages: (1) higher resolution, as fine as 1 bp; (2) electrophoresis results are displayed as a peak plot corresponding to the size of the amplified fragments, which can be collected directly and digitized in a form that is easy to record, analyze and save; (3) the application of high-throughput multiplex PCR and 10-fold capillary electrophoresis detection technology has improved the detection efficiency by 33-fold. When the 60 selected primers were used to establish a DNA fingerprinting library of 96 samples, the number of 96-well PCR plates needed for amplification fell from the original 60 to 18. Similarly, the number of 96-well electrophoresis plates required fell from the original 60 to 10, which greatly reduces the workload and testing costs; (4) the data are easy to integrate and are suitable for large-scale construction of a DNA fingerprint library.
Fig. 2 Capillary electrophoresis results for cotton variety Zhongmiansuo 49. In total, 60 SSR primers were divided into six groups. Each primer was labeled with a fluorophore as listed in Table 2; the primer numbers for each group are listed in Table 3.
We built a DNA fingerprint database of 210 samples from the Yellow River Basin, Yangtze River Basin and northwest inland cotton-growing regions using the 60 markers. The genetic diversity analysis showed that varieties from the same cotton region mostly clustered in the same group, but some varieties from the northwest inland region were scattered in group I, which mainly comprised varieties from the Yellow River Basin. This might be because the ecological conditions of some parts of southern Xinjiang are similar to those of the Yellow River Basin, or because of germplasm introduction and exchange between the regions; the pattern is consistent with the results of Guo et al. (2014). In terms of identification capacity, the 60 pairs of excellent SSR primers distinguished all 210 national trial samples, a 100% success rate.