
Role of SNPs in determining QTLs for major traits in cotton

Abstract

A single nucleotide polymorphism (SNP) is the simplest form of genetic variation among individuals and can cause minor changes in phenotypic, physiological and biochemical characteristics. A SNP can alter the sequence of a gene and, when it falls in a coding region, can change the encoded amino acid. Several assays have been developed for the identification and validation of these markers. Each method has its own advantages and disadvantages, but genotyping by sequencing is the most common and widely used assay. In cotton, these markers are associated with several desirable traits such as yield, fibre quality and boll size, and with genes that respond to biotic and abiotic stresses. Changes in yield-related traits are of particular interest to plant breeders. Numerous quantitative trait loci with novel functions have been identified in cotton using these markers, and this information can be used for crop improvement through molecular breeding approaches. In this review, we discuss the identification of these markers and their effects on the function of genes underlying economically important traits in cotton.

Background

Plant breeders are interested in genetic variation because it is the basis of phenotypic diversity. Many plant traits arose from genetic variation caused by mutation and/or recombination, and the useful traits were ‘fixed’ by natural as well as artificial selection. With advances in technology, scientists have developed various methods to detect and analyze minor genetic variations whose effects cannot be seen in the phenotype (Jang et al. 2015). A base pair is the smallest unit of DNA sequence, and when two or more individuals differ from each other at a single nucleotide position, the difference is called a single nucleotide polymorphism (SNP). Identifying these minor variations was initially a challenge for plant scientists. Next generation DNA sequencing technologies have solved this problem by making it possible to detect new functional SNPs associated with diverse traits. Whole genome sequence data serve as a reference for identifying SNP polymorphisms among individuals of the same species (Xie et al. 2010). Abundant resequencing data are also available for studying sequence diversity within crop plants. These data can reveal whether changes in the genome within a species arose from one or multiple factors (DePristo et al. 2011). Indeed, single-nucleotide changes have modified the function of several genes, leading to phenotypic differences among plants of the same species (Chung et al. 2013; Shi et al. 2015). Plant scientists have also reported several functional SNPs associated with phenotypic changes in various accessions of crop plants (Jang et al. 2015; Arruda et al. 2016). Several genotyping assays have been reported in plants, and most of them depend on molecular markers (Lateef 2015). SNP markers are the most abundant and robust markers for high throughput genotyping of plants.
These markers can be found in all regions of a genome and a single gene may contain multiple SNPs (Rafalski 2002; Alkan et al. 2011). They play a significant role in determining phenotypic differences in plants, animals, humans and microbes (Moen et al. 2008; De Souza et al. 2010).
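The definition of a SNP above can be made concrete with a toy comparison. The minimal sketch below assumes two sequences that are already aligned to the same length; real SNP discovery aligns many sequencing reads to a reference genome and applies quality filters, all of which is omitted here. The function name and the sequences are invented for illustration.

```python
def find_snps(ref, alt):
    """Return (1-based position, reference base, alternative base) for each
    single-base substitution between two pre-aligned, equal-length sequences."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (r, a) in enumerate(zip(ref.upper(), alt.upper()), start=1):
        # Ignore gaps ('-') and ambiguous bases such as 'N'; count only A/C/G/T mismatches.
        if r != a and r in "ACGT" and a in "ACGT":
            snps.append((pos, r, a))
    return snps


# Two made-up, already-aligned fragments from two hypothetical individuals.
individual_1 = "ATGCTAGCTAGGCTA"
individual_2 = "ATGCTGGCTAGGCTT"
print(find_snps(individual_1, individual_2))  # [(6, 'A', 'G'), (15, 'A', 'T')]
```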

Genetic mapping is the identification of the location of a particular gene, the measurement of distances between genes and the determination of their arrangement on the chromosome (Semagn et al. 2006). Genetic maps play an important role in the identification of quantitative trait loci (QTLs) (Ganal et al. 2009; Poland et al. 2012). Because SNPs are co-dominant, abundant and cost-effective to identify, they are ideal for constructing genetic maps in plant species. SNP-based genetic maps have been developed in several crop species such as cotton (Byers et al. 2012), rice (Xie et al. 2010), maize (Buckler et al. 2009), soybean (Akond et al. 2013) and Brassica (Li et al. 2009). Likewise, genome wide association studies (GWAS) using SNP markers are useful for developing genome wide haplotypes (Yano et al. 2016) and for detecting natural diversity in cotton (Huang et al. 2017) and other crops (Aranzana et al. 2005; Yu and Buckler 2006; Poland and Rife 2012; Pasam et al. 2012). Identifying patterns among SNPs is a good way to study the evolution of a species at the genomic level, to understand the history of a population and the genetic variation among individuals, and to assess the role of selection pressure in shaping that variation (Morin et al. 2004). Comparing the sequences of various species through SNPs also provides information about the evolution of the modern genome (Lu et al. 2013). Phylogenetic analysis of diploid cotton species using SNP markers revealed that the A1 and A2 genomes are 98% similar (Shaheen et al. 2016).
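Distances on a genetic map are estimated from the recombination fraction r between two markers (the proportion of recombinant progeny) and converted to centimorgans with a mapping function. The sketch below implements two standard mapping functions, Haldane and Kosambi; which function any particular cotton map cited here used is not stated in this review, and the recombinant counts in the example are hypothetical.

```python
import math


def _check(r):
    # A recombination fraction of 0.5 means the markers behave as unlinked.
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")


def haldane_cm(r):
    """Haldane mapping function (assumes no crossover interference), in centimorgans."""
    _check(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi mapping function (allows for some crossover interference), in centimorgans."""
    _check(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical marker pair: 20 recombinant individuals out of 200 scored.
r = 20 / 200
print(f"r = {r:.2f}, Haldane = {haldane_cm(r):.2f} cM, Kosambi = {kosambi_cm(r):.2f} cM")
# prints: r = 0.10, Haldane = 11.16 cM, Kosambi = 10.14 cM
```

For small r both functions give roughly 100 × r cM; they diverge as markers get farther apart.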

Detection of SNPs in plants

Several techniques have been reported for the detection of SNPs in crop plants. Genotyping by sequencing (GBS) has been widely used for SNP identification because of its low cost, low error rate and modest DNA purification requirements (Davey et al. 2011). The first step in identifying SNPs by GBS is the isolation of genomic DNA. After quantification, the DNA is digested with a restriction enzyme. The choice of restriction enzyme is very important because it determines which regions of the genome are represented in the library. Alternatively, two restriction enzymes can be combined in a double digestion, and methylation-sensitive restriction enzymes can be used to analyze methylated DNA. The digested DNA is then ligated to adaptors tagged with specific end sequences for polymerase chain reaction (PCR) amplification and sequencing. Various bioinformatic analyses are carried out on the sequencing data to identify SNPs, which are then experimentally verified and functionally annotated (Elshire et al. 2011). A disadvantage of GBS is that some important regions of the genome may be missing from the genomic libraries because the selected restriction enzymes did not cut in those regions. Another drawback of GBS is potential errors introduced during sequencing (Kim et al. 2016).
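Because the restriction enzyme determines which loci end up in a GBS library, a sequence can be digested in silico before an enzyme is chosen. A minimal sketch, assuming ApeKI (recognition site GCWGC, cut G^CWGC), the enzyme used by Elshire et al. (2011), for a single digestion, and adding MspI (C^CGG) only to illustrate a double digestion. The function names and sequences are invented; the sketch models top-strand cut positions only and does not simulate adapters, size selection or sequencing.

```python
import re

# Subset of IUPAC nucleotide codes, enough for common recognition sites.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]", "N": "[ACGT]"}


def site_regex(site):
    # Zero-width lookahead so that overlapping sites are all reported.
    return re.compile("(?=" + "".join(IUPAC[b] for b in site.upper()) + ")")


def digest(seq, enzymes):
    """Cut `seq` at every recognition site of every enzyme.

    `enzymes` maps an enzyme name to (recognition site, cut offset on the top
    strand counted from the start of the site). Sticky-end overhangs are
    ignored, which is acceptable for palindromic sites like the ones below.
    """
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        for m in site_regex(site).finditer(seq):
            cuts.add(m.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


APEKI = {"ApeKI": ("GCWGC", 1)}                        # G^CWGC
DOUBLE = {"ApeKI": ("GCWGC", 1), "MspI": ("CCGG", 1)}  # plus C^CGG

print(digest("AAAGCAGCTTTTTGCTGCAAA", APEKI))  # ['AAAG', 'CAGCTTTTTG', 'CTGCAAA']
print(digest("GCAGCAACCGGAA", DOUBLE))         # ['G', 'CAGCAAC', 'CGGAA']
```

Running the same function over a reference genome with different enzyme combinations gives a rough idea of how many fragments, and which regions, each design would sample.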

The restriction-site associated DNA sequencing (RAD-seq) technique is used to discover SNPs when a reference genome is not available (Andrews et al. 2016). In this technique, a barcoded P1 adapter is ligated to the short DNA fragments generated by digesting DNA with restriction enzymes. The adapter-ligated fragments from different samples are pooled and the DNA is sheared. A P2 adapter is then ligated to the sheared DNA so that the fragments can be amplified to produce sequencing libraries (Bergey et al. 2013). This technique is independent of a reference genome and relatively inexpensive, and the degree of genome coverage can be adjusted (Reitzel et al. 2013). However, the method requires high quality DNA, and restriction sites may be lost because of sequence polymorphisms (Suchan et al. 2016). Another technique developed for large scale SNP-based genotyping is specific locus amplified fragment sequencing (SLAF-seq). In this method, the DNA sample is first digested with MseI and then with AluI. The resulting fragments are amplified by PCR, adapters are added and the fragments are purified to obtain sequencing libraries (Sun et al. 2013). This low cost method is useful for sequence-based genotyping of large populations, but it does not cover the whole genome (Ma et al. 2015). Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a sequencing tool used to investigate gene regulation, for example the binding of transcription factors to DNA (Johnson et al. 2007). It is considered robust because it profiles protein-DNA interactions in vivo on a genome-wide scale, and it has enabled breakthroughs in understanding transcriptional regulatory networks in Saccharomyces cerevisiae and human DNA regulatory sequences (Song et al. 2016). The protocol has great potential but is challenging to perform in plants because of the vigorous disruption of cell walls it requires, the presence of phenolic compounds and polysaccharides, and the limited selection of high-quality antibodies that give a strong signal.
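In RAD-seq, the barcode on the P1 adapter is what allows fragments from different samples to be pooled and later separated again. A minimal sketch of that separation step (demultiplexing), assuming single-end reads that begin with an equal-length barcode followed by the remnant of the restriction site. The barcodes, the "TGCAG" remnant, the sample names and the reads are all hypothetical, and real pipelines also tolerate sequencing errors in the barcode, which this sketch does not.

```python
def demultiplex(reads, barcodes, site_remnant):
    """Assign pooled reads to samples by their leading P1 barcode.

    `barcodes` maps barcode sequence -> sample name (equal-length barcodes
    assumed). A read is kept only if its barcode is followed by the expected
    restriction-site remnant; the barcode itself is trimmed off.
    """
    lengths = {len(bc) for bc in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch expects equal-length barcodes")
    bc_len = lengths.pop()
    by_sample = {name: [] for name in barcodes.values()}
    unassigned = []
    for read in reads:
        barcode, rest = read[:bc_len], read[bc_len:]
        if barcode in barcodes and rest.startswith(site_remnant):
            by_sample[barcodes[barcode]].append(rest)
        else:
            unassigned.append(read)  # unknown barcode or missing cut-site remnant
    return by_sample, unassigned


# Hypothetical barcodes, site remnant and reads, for illustration only.
BARCODES = {"ACGTA": "line_1", "TTGCA": "line_2"}
reads = ["ACGTATGCAGGATTC", "TTGCATGCAGCCATG", "GGGGGTGCAGAAAAA", "ACGTAAAAAAAAAAA"]
samples, unassigned = demultiplex(reads, BARCODES, "TGCAG")
print(samples)     # {'line_1': ['TGCAGGATTC'], 'line_2': ['TGCAGCCATG']}
print(unassigned)  # ['GGGGGTGCAGAAAAA', 'ACGTAAAAAAAAAAA']
```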

Reporting of SNPs/QTLs in cotton

Fibre quality and yield traits

Cotton is an important fibre and oilseed crop in tropical, sub-tropical and temperate regions of the world. It is grown on about 33.4 million hectares, with an annual production of 121.4 million bales (Johnson et al. 2018). Among the 50 species of cotton, the allotetraploid Gossypium hirsutum (upland cotton) is the most widely grown (Sekmen et al. 2014). Cotton fibres and linters are the main products of this crop and determine its price in international markets (Bradow et al. 1997). Staple length, strength, fineness and uniformity ratio are the main parameters used to estimate fibre quality. Cotton yield is a complex trait that depends on components such as boll weight, number of bolls per plant and lint percentage (Tang et al. 1996). Several SNPs and SNP-linked QTLs have been reported for yield and fibre-related traits; the potential SNPs reported in cotton for all traits discussed here are summarized in Table 1. Li et al. (2016) used the cotton 63 K SNP array to identify 71 QTLs for fibre quality traits that were strongly linked with SNP markers. These QTLs included seven pleiotropic QTL clusters, 19 e-QTLs, five hotspots and nine novel QTLs. In another study, linkage mapping, chromosomal localization and phylogenomic characterization of six MYB genes actively involved in fibre development were carried out in four tetraploid cotton species using SNP markers. Genotyping by amplicon cloning and sequencing detected 108 SNPs in these genes, and all six MYB genes were found to have evolved independently and to show significant variation in the D genome compared with the A genome (An et al. 2008). Keerio and colleagues used 107 introgression lines derived from an interspecific cross of G. hirsutum and G. tomentosum for QTL mapping, obtaining SNP markers with the SLAF-seq method.
In this study, 74 QTLs and five QTL clusters related to various fibre quality parameters were found (Keerio et al. 2018). Islam and co-workers detected and validated 5 617 SNPs in upland cotton using GBS (Islam et al. 2015). The same researchers also reported 6 071 SNPs and 86 QTLs in a study that identified the gene GhRBB1_A07 and revealed its potential role in determining cotton fibre quality. To identify this gene, they used a multi-parent advanced generation inter-cross (MAGIC) population developed through random mating of diverse G. hirsutum parents (Islam et al. 2016a).
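The yield components named in this paragraph combine multiplicatively. As a rough sketch, assuming that seed cotton per plant equals bolls per plant times mean boll weight, that lint percentage is lint weight as a percentage of seed cotton weight, and that yield scales linearly with plant density (real field yields are also affected by harvest losses and spatial variation), lint yield per hectare can be estimated as below. The function names and input values are illustrative only.

```python
def seed_cotton_per_plant_g(bolls_per_plant, boll_weight_g):
    """Seed cotton (lint plus seed) per plant, in grams."""
    return bolls_per_plant * boll_weight_g


def lint_yield_kg_per_ha(bolls_per_plant, boll_weight_g, lint_percentage, plants_per_ha):
    """Lint yield in kg/ha: seed cotton per plant x lint fraction x plant density."""
    lint_per_plant_g = seed_cotton_per_plant_g(bolls_per_plant, boll_weight_g) * lint_percentage / 100
    return lint_per_plant_g * plants_per_ha / 1000  # grams -> kilograms


# Illustrative values only: 10 bolls of 5 g, 40% lint, 90 000 plants per hectare.
print(lint_yield_kg_per_ha(10, 5.0, 40, 90_000))  # 1800.0
```

Because the components multiply, a proportional gain in any one of them (for example, lint percentage rising from 40 to 42) raises lint yield by the same proportion, other components being equal.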

Table 1 Characterization of reported SNPs/QTLs in cotton for various traits of economic interest, stresses and plant architecture

More recently, 110 QTLs and five key genes, namely Gh_D12G0410, Gh_D12G0969, Gh_D12G0093, Gh_D12G0435 and Gh_D03G0889, were found to be involved in fibre development in intraspecific crosses of G. hirsutum. These QTLs were detected through the GBS approach (Diouf et al. 2018). Another research group detected 28 QTLs related to fibre quality and agronomic parameters in a recombinant inbred mapping population using GBS, including seven QTLs for fibre strength and one QTL for lint yield (Gore et al. 2014). Liu et al. (2018) used 231 recombinant inbred lines (RILs) and the Cotton SNP 80 K array to identify 122 QTLs for yield-related traits and 134 QTLs for fibre quality parameters. Of these QTLs, 57 were detected in multiple environments and were therefore designated stable QTLs. The same group also found 348 quantitative trait nucleotides (QTNs), including 74 stable QTNs, for yield and fibre-related traits. Su et al. identified 12 SNPs and two highly stable QTLs for lint percentage through a GWAS of 355 accessions genotyped with the SLAF-seq method. These SNPs could provide a resource for improving lint yield through molecular breeding (Su et al. 2016a). In another study, researchers discovered 37 QTLs related to various fibre quality attributes on chromosome 25 in a RIL population of upland cotton using the SLAF-seq method (Zhang et al. 2015). In a separate report, Zhang et al. found 63 highly stable QTLs for fibre strength using the Cotton SNP 63 K array for genotyping. This chip contains SNPs from several cotton species, including G. hirsutum, G. barbadense, G. tomentosum, G. mustelinum, G. armourianum and G. longicalyx (Hulse-Kemp et al. 2015; Zhang et al. 2017). SNPs were also used to construct a genetic linkage map through the SLAF-seq approach and to identify QTLs for boll weight.
One hundred forty-six QTLs were found in 11 environments, and 16 of these were classified as stable because they were detected in more than three environments (Zhang et al. 2016b). Resequencing of 419 upland cotton accessions led to the discovery of 3 665 030 SNPs. These accessions were phenotyped for 13 fibre-related traits in 12 different environments, and GWAS revealed associations of 7 383 unique SNPs and 4 820 candidate genes with these traits (Ma et al. 2018).
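At its simplest, a GWAS tests each SNP separately for association with a trait and then corrects for the very large number of tests. A minimal sketch, assuming genotypes coded as 0, 1 or 2 copies of one allele, a correlation (equivalently, simple regression) test per SNP with a normal approximation for the p-value, and a Bonferroni threshold. It deliberately omits the population-structure and kinship corrections (mixed linear models) that published plant GWAS typically use, so it is not the method of the studies cited here; the SNP ids and trait values are invented.

```python
import math
from statistics import fmean


def single_marker_test(genotypes, phenotype):
    """Correlation test of one SNP (genotypes coded 0/1/2) against a trait.

    Returns (r, p). The p-value uses a normal approximation to the t
    distribution with n - 2 degrees of freedom, reasonable only for large n.
    """
    n = len(phenotype)
    gm, pm = fmean(genotypes), fmean(phenotype)
    sxy = sum((g - gm) * (y - pm) for g, y in zip(genotypes, phenotype))
    sxx = sum((g - gm) ** 2 for g in genotypes)
    syy = sum((y - pm) ** 2 for y in phenotype)
    if sxx == 0 or syy == 0:
        return 0.0, 1.0  # monomorphic SNP or constant trait: nothing to test
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail probability under N(0, 1)
    return r, p


def gwas_scan(snps, phenotype, alpha=0.05):
    """Test every SNP and keep those below the Bonferroni threshold alpha / m."""
    threshold = alpha / len(snps)
    hits = []
    for snp_id, genotypes in snps.items():
        r, p = single_marker_test(genotypes, phenotype)
        if p < threshold:
            hits.append((snp_id, r, p))
    return sorted(hits, key=lambda hit: hit[2]), threshold


# Hypothetical fibre length values (mm) for 12 accessions and two SNPs.
fibre_length = [28.1, 27.9, 28.2, 27.8, 29.1, 28.9, 29.2, 28.8, 30.1, 29.9, 30.2, 29.8]
snps = {"snp_A": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
        "snp_B": [0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2]}
hits, threshold = gwas_scan(snps, fibre_length)
print(threshold, [hit[0] for hit in hits])
```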

Biotic and abiotic stress tolerance

The cotton plant faces various stresses during its life cycle that limit crop productivity around the world. A single base pair difference between genotypes may underlie a differential response to environmental stresses. Many studies have evaluated whether genomic information can be used to identify SNPs and QTLs related to biotic and abiotic stress tolerance. The GBS method has been used to construct a high density genetic map with 10 888 SNPs from segregating populations of an interspecific cross (G. hirsutum × G. tomentosum) to detect QTLs related to drought tolerance. In addition, 34 402 and 32 032 genes were mined within the Dt and At sub-genomes, respectively, to understand the genetics of drought tolerance (Magwanga et al. 2018). Abdelraheem et al. (2018) mapped QTLs for drought and salt tolerance using a RIL population derived from a cross of two diverse parental lines. They discovered 165 QTLs through the GBS approach, 15 of which were associated with tolerance to salinity and drought and were common to two environments, i.e., greenhouse and field conditions. Likewise, a high-density linkage map was constructed using a segregating population from an intra-specific cross between salt tolerant and salt susceptible genotypes. A total of 66 QTLs and 5 178 SNP markers were identified through GBS for 10 salinity tolerance-related traits in three different environments. Of these QTLs, 14 were designated as stable because they were present in more than one environment. Nine and five stable QTLs were located in the Dt and At sub-genomes, respectively, and 12 key genes were found to be involved in conferring salinity tolerance at the seedling stage (Diouf et al. 2017). In another experiment, Wang et al. (2016) used salt tolerant and susceptible genotypes to mine SNPs with the Cotton 63 K SNP array.
They mined a total of 7 087 SNPs, of which 1 282 were highly related to salinity tolerance in cotton. Besides salinity and drought, another major abiotic stress is high temperature, but SNPs related to this stress have yet to be explored in cotton. Previously, 21 SNPs were reported for the mitochondrial small heat shock protein gene (MT-sHSP); these SNPs were identified through PCR amplification and sequencing of this gene from several cotton species (Shaheen et al. 2009).
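Several studies above classify a QTL as stable when it is detected in more than one, or more than three, environments. A minimal sketch of that bookkeeping, assuming detections are available as (QTL, environment) pairs; the threshold is a parameter because the cited studies use different cut-offs, and the QTL names and environments are hypothetical.

```python
from collections import defaultdict


def stable_qtls(detections, min_environments):
    """Return QTL names detected in at least `min_environments` distinct environments.

    `detections` is an iterable of (qtl_name, environment) pairs; repeated
    detections of a QTL in the same environment are counted once.
    """
    environments = defaultdict(set)
    for qtl, env in detections:
        environments[qtl].add(env)
    return sorted(q for q, envs in environments.items() if len(envs) >= min_environments)


# Hypothetical QTL names and environments.
detections = [
    ("qSalt-D05.1", "greenhouse"), ("qSalt-D05.1", "field_2016"), ("qSalt-D05.1", "field_2017"),
    ("qSalt-A02.1", "field_2016"), ("qSalt-A02.1", "field_2016"),
    ("qSalt-D09.1", "greenhouse"), ("qSalt-D09.1", "field_2017"),
]
print(stable_qtls(detections, 2))  # "more than one environment": ['qSalt-D05.1', 'qSalt-D09.1']
print(stable_qtls(detections, 3))  # ['qSalt-D05.1']
```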

Among biotic stresses, Verticillium wilt is one of the major threats to cotton production in the USA, China and Turkey (Baytar et al. 2017). This disease causes significant yield reduction, and the pathogen can survive for several years in the soil (Zhang et al. 2016a). A GWAS using the SLAF-seq method of genotyping revealed 17 SNPs related to Verticillium wilt resistance that were stable in three different environments. QTL analysis also indicated that CG02, a disease resistance protein of the TIR-NBS-LRR class, appears to be responsible for resistance to Verticillium dahliae (Li et al. 2017b). Likewise, Zhao et al. (2017) used the Cotton SNP 63 K array to detect SNPs and QTLs related to this disease in two different environments. They found 21 171 SNPs across 120 accessions of G. hirsutum and detected three clustered QTLs, two major QTLs, 12 functional genes and six mRNAs conferring resistance to Verticillium. In another report, genomic analysis of many accessions through GBS revealed three trait loci involved in Verticillium wilt resistance, and a candidate gene (Gh_D06G0687) encoding an NB-ARC domain was reported to confer resistance to this pathogen (Fang et al. 2017). Cotton blue disease, which is transmitted by aphids, is one of the major diseases of cotton in Brazil (Silva et al. 2008). Haplotype mapping of a large segregating population through amplicon cloning and sequencing with specific SSR primers revealed that resistance was conferred by four SNPs (Fang et al. 2010). Another four SNP markers, discovered through haplotype mapping, were highly associated with resistance to bacterial blight disease (Xanthomonas axonopodis pv. malvacearum) (Xiao et al. 2010). Aside from these diseases, cotton productivity is also affected by cotton leaf curl virus, root rot and cotton mosaic virus.
Moreover, many insect pests attack this crop, but to our knowledge no SNPs linked to these biotic stresses have been reported in the literature. It is therefore important for molecular plant breeders to explore SNPs related to these biological threats in order to understand the genetic basis of resistance.

Earliness

Early maturity is essential for growing more than one crop per year and for escaping late-season environmental stresses. An early maturing genotype also requires less irrigation, fertilizer and chemical input (Bednarz and Nichols 2005; Cober et al. 2010; Akter et al. 2019). One study detected SNPs related to early maturity in upland cotton using 137 RILs. Sequence-based genotyping revealed 6 295 SNPs and 247 QTLs associated with six morphological traits related to earliness. These QTLs were deemed highly stable because they were identified in six consecutive years, i.e., 2010 to 2015 (Jia et al. 2016). In another project, the SLAF-seq genotyping strategy was used to identify SNPs related to six earliness-linked traits in 355 G. hirsutum accessions grown in four different environments. A total of 81 675 SNPs and 11 highly favorable SNP alleles were discovered, and GWAS also revealed a potential candidate gene (CotAD_01947) associated with early maturity (Su et al. 2016c). More recently, a GWAS was conducted to identify SNPs and genes associated with four earliness-related traits. A total of 49 650 SNPs were discovered using the cotton SNP 80 K array, of which 29 were highly associated with early maturity. In addition, two potential candidate genes (Gh_D01G0340 and Gh_D01G0341) were related to earliness (Li et al. 2018b). Likewise, the GBS method has been used to construct a high-density genetic linkage map to discover QTLs related to this trait. The linkage map comprised 3 978 SNPs, and 47 QTLs associated with six earliness traits were detected. A study of an early maturing cultivar revealed two highly expressed potential candidate genes (i.e., Gh_D03G0885 and Gh_D03G0922) (Li et al. 2017a).
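Once favorable SNP alleles are known, such as the 11 favorable alleles for earliness reported by Su et al. (2016c), germplasm can be screened for accessions that carry many of them. A minimal sketch, assuming genotype calls written as two-letter diploid calls; all SNP ids, alleles and accessions are hypothetical, and a simple count treats every allele as having equal effect, which a real selection index would not necessarily assume.

```python
def favorable_allele_count(calls, favorable):
    """Count favorable allele copies carried by one accession.

    `calls` maps SNP id -> two-letter genotype call (e.g. "AG"); `favorable`
    maps SNP id -> the allele considered favorable. Missing calls count as zero.
    """
    total = 0
    for snp, allele in favorable.items():
        call = calls.get(snp)
        if call is not None:
            total += call.count(allele)
    return total


# Hypothetical favorable alleles and accessions.
FAVORABLE = {"snp_1": "A", "snp_2": "T", "snp_3": "G"}
accessions = {
    "acc_01": {"snp_1": "AA", "snp_2": "GT", "snp_3": "CC"},
    "acc_02": {"snp_1": "GG", "snp_2": "TT"},  # snp_3 not called
    "acc_03": {"snp_1": "AG", "snp_2": "TT", "snp_3": "GG"},
}
ranking = sorted(accessions,
                 key=lambda acc: favorable_allele_count(accessions[acc], FAVORABLE),
                 reverse=True)
print(ranking)  # ['acc_03', 'acc_01', 'acc_02']
```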

Plant architecture and other important traits

A combination of traits is needed to increase the productivity of the cotton crop. Plant architecture is an important factor that determines the suitability of cotton genotypes for mechanical picking as well as their yield (Song and Zhang 2009). This complex multigenic trait has received less attention in cotton than in wheat and rice, where the deployment of dwarfing genes led to the Green Revolution. To investigate the genetic basis of plant architecture, a GWAS experiment was conducted with 121 upland cotton genotypes, which were genotyped by whole genome resequencing and phenotyped in multiple environments. The researchers identified 2 620 639 SNPs, 11 QTLs and five candidate genes for two plant architecture traits, i.e., fruit spur branch number and plant height (Wen et al. 2019). In another study, 93 250 SNPs for five plant architecture traits were found in 355 Chinese upland cotton accessions using the SLAF-seq method, and GWAS revealed 22 highly associated SNPs and 21 candidate genes for these traits (Su et al. 2018). Molecular analysis of the short fruiting branch gene was carried out in an F2 population derived from two parents, one with short and the other with long fruiting branches. Using derived cleaved amplified polymorphic sequences (dCAPS), one SNP locus (SNP_GH1570) was found to be highly associated with short fruiting branches, and this SNP marker was concluded to be useful for selecting cotton plants with short fruiting branches (Zhang et al. 2018a). A separate study revealed 17 QTLs associated with plant height, height of fruiting branch node and number of vegetative shoots; these QTLs were located on nine different chromosomes and were detected through the GBS method (Qi et al. 2017).
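In a dCAPS assay, a PCR primer carrying a deliberate mismatch is designed so that one SNP allele produces a restriction site in the amplicon and the other does not; after digestion, band sizes distinguish homozygotes and heterozygotes. The sketch below only predicts those band patterns. The amplicons, the allele labels and the GATC site (cut before the G, as by DpnII) are hypothetical choices; the primers and enzyme actually used for SNP_GH1570 by Zhang et al. (2018a) are not given here.

```python
import re


def fragment_lengths(amplicon, site, cut_offset):
    """Fragment lengths after cutting `amplicon` at every occurrence of a plain
    A/C/G/T recognition `site` (top-strand cut position only)."""
    cuts = sorted({m.start() + cut_offset for m in re.finditer(f"(?={site})", amplicon.upper())})
    bounds = [0] + cuts + [len(amplicon)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def expected_bands(allele_amplicons, genotype, site, cut_offset):
    """Band sizes (largest first) expected on a gel for a genotype such as "11", "12" or "22"."""
    bands = set()
    for allele in genotype:
        bands.update(fragment_lengths(allele_amplicons[allele], site, cut_offset))
    return sorted(bands, reverse=True)


# Hypothetical 12-bp amplicons: the SNP (A vs T at index 5) gives a GATC site in allele 1 only.
AMPLICONS = {"1": "AAAAGATCTTTT", "2": "AAAAGTTCTTTT"}
for genotype in ("11", "12", "22"):
    print(genotype, expected_bands(AMPLICONS, genotype, "GATC", 0))
# 11 [8, 4]
# 12 [12, 8, 4]
# 22 [12]
```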

A nulliplex-branch mutant was developed to explore the position of flowers on the cotton plant. In this mutant line, flowers arise directly from the leaf axils on the main stem, without monopodial or sympodial (fruiting) branches. This trait is desirable because it allows planting densities to be increased without using chemicals to regulate plant growth (Du et al. 1996). To discover the molecular basis of the nulliplex-branch trait, a genetic map was constructed from a G. hirsutum × G. barbadense interspecific population. The map comprised 11 805 SNP markers identified through next generation sequencing, and 42 SNPs were associated with gb_nb1, a recessive gene that controls the nulliplex-branch trait (Chen et al. 2015). Virescent leaves in cotton are characterized by a yellowish appearance at early stages of plant growth. This abnormality is caused by a recessive gene, v1. Sequence analysis of wild-type and mutant alleles revealed four SNPs, at sequence positions 426, 450, 709 and 1 082. The SNP at position 1 082 caused an amino acid substitution, resulting in arginine instead of lysine in the mutant polypeptide (Zhang et al. 2018b). In another study, genetic diversity in leaf transcriptomes was identified in G. barbadense: through cDNA library sequencing, researchers found more than 10 000 SNPs associated with various traits in three Egyptian cotton cultivars (Kottapalli et al. 2016). Likewise, many SNP markers were identified using the GBS approach and were considered a source of variation for various agronomic and biochemical traits in cotton (Logan-Young et al. 2015).
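The lysine-to-arginine change reported for v1 follows from how one base substitution alters one codon. A minimal sketch, using the standard genetic code, that locates the codon containing a SNP and reports the resulting amino acid change. The coding sequence is synthetic and the A→G substitution is an assumption, since the actual v1 sequence and base change are not given here. The sketch also assumes that position 1 082 is counted from the first base of the start codon; in that case it falls at the second position of codon 361, where an A→G change turns AAA or AAG (lysine) into AGA or AGG (arginine).

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def snp_effect(cds, position, alt_base):
    """Classify a substitution at 1-based `position` of an in-frame coding
    sequence that starts at the first base of the start codon."""
    idx = position - 1
    codon_start, codon_pos = idx - idx % 3, idx % 3
    ref_codon = cds[codon_start:codon_start + 3].upper()
    alt_codon = ref_codon[:codon_pos] + alt_base.upper() + ref_codon[codon_pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return codon_start // 3 + 1, ref_aa, alt_aa, kind


# Synthetic CDS: ATG, 359 GCT (Ala) codons, AAA (Lys) as codon 361, then a stop codon.
cds = "ATG" + "GCT" * 359 + "AAA" + "TAA"
print(snp_effect(cds, 1082, "G"))  # hypothetical A->G: (361, 'K', 'R', 'missense')
print(snp_effect(cds, 1083, "G"))  # third-position AAA->AAG: (361, 'K', 'K', 'synonymous')
```

The same check applied to the other three positions would show whether each of them is synonymous or lies outside the coding sequence, given the real v1 sequence.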

Conclusions

The study of SNPs opens new horizons for plant biotechnologists seeking to improve various features of crop plants; a single SNP can alter the function of a gene by changing the amino acid sequence of the encoded protein. SNPs identified in the coding regions of genes have received more attention from molecular plant breeders than those in non-coding regions. Various assays have been developed using these markers to detect genetic variability in the genomes of field crops. Plant researchers have used these markers successfully in cotton and other crops to improve fibre quality and yield and to develop tolerance to biotic and abiotic stresses, thereby enhancing profitability for farmers.

Abbreviations

dCAPS:

Derived cleaved amplified polymorphic sequences

GBS:

Genotyping by sequencing

GWAS:

Genome wide association study

NGS:

Next generation sequencing

PCR:

Polymerase chain reaction

QTL:

Quantitative trait loci

QTN:

Quantitative trait nucleotides

RAD-seq:

Restriction-site associated DNA sequencing

RILs:

Recombinant inbred lines

SLAF-seq:

Specific locus amplified fragment sequencing

SNP:

Single nucleotide polymorphism

References


Acknowledgements

The authors are grateful to the reviewers for their critical review and thank all collaborators for their productive contributions to the preparation of this review article.

Funding

Not applicable.

Availability of data and materials

Not applicable.

Author information

Contributions

Majeed S and Azhar MT collected the literature and wrote the draft; Rana IA, Atif RM, Ali Z and Hinze L reviewed and edited the article for publication. All authors read and approved the final manuscript.

Corresponding author

Correspondence to AZHAR Muhammad Tehseen.

Ethics declarations

Authors’ information

Not applicable.

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors agreed to submit this review article to BMC Journal of Cotton Research.

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

MAJEED, S., RANA, I.A., ATIF, R.M. et al. Role of SNPs in determining QTLs for major traits in cotton. J Cotton Res 2, 5 (2019). https://doi.org/10.1186/s42397-019-0022-5

  • DOI: https://doi.org/10.1186/s42397-019-0022-5

Keywords