Unraveling the puzzle of the origin and evolution of cotton A-genome


The evolution of Gossypium hirsutum, the most widely planted cotton species, has long been an unsolved puzzle because of its hybrid origin from D-genome and A-genome species. To better understand the genetic composition of cotton, Huang et al. recently sequenced and assembled the first A1-genome of G. herbaceum and updated the A2-genome of G. arboreum and the (AD)1-genome of G. hirsutum. On the basis of these three reference genomes, they resolved existing controversies and provided novel evolutionary insights into the A-genome.

Main text

Cotton is one of the most important economic crops in the world, and more than 90% of cotton fiber is derived from the (AD)1-genome Upland cotton (G. hirsutum) (Ma et al. 2018). The (AD)1-genome was formed by natural hybridization between a D-genome and an A-genome species (Li et al. 2015). G. herbaceum (A1-genome) and G. arboreum (A2-genome) are the only two extant diploid A-genome species. Numerous studies support the D5-genome species G. raimondii as the D-subgenome donor of the (AD)1-genome (Wendel 1989). However, as noted in Science (Zahn 2012), the A-subgenome donor of the tetraploid genome has remained an unsolved mystery and attracts great research interest. Previous reports have proposed either the A1- or the A2-genome as the actual A-genome donor of tetraploid cotton.

First A1 genome sequence and high-quality genome updates of A2 and (AD)1

Since the release of the first D5-genome in 2012 (Wang et al. 2012), a second D5 genome (Paterson et al. 2012; Udall et al. 2019) and the genomes of A2 (Du et al. 2018; Li et al. 2014), (AD)1 (Hu et al. 2019; Li et al. 2015; Wang et al. 2019; Yang et al. 2019; Zhang et al. 2015) and (AD)2 (the much less cultivated G. barbadense) (Hu et al. 2019; Liu et al. 2015; Wang et al. 2019; Yuan et al. 2015) have been sequenced and assembled, and successive updates have greatly improved these assemblies. Recently, Huang et al. sequenced and assembled the A1 variety africanum for the first time and re-assembled high-quality genomes of the A2 cultivar Shixiya1 and the (AD)1 genetic standard Texas Marker-1 (TM-1) on the basis of PacBio long reads, paired-end sequencing and high-throughput chromosome conformation capture (Hi-C) technologies. The resulting A1 assembly captured 1 556 megabases (Mb) of genome sequence, of which 95.69% (1 489 Mb) was assigned to and ordered on 13 pseudo-chromosomes. They substantially improved the existing A2 and (AD)1 genome assemblies, mainly in terms of genome completeness and accuracy (Huang et al. 2020). These updated A2- and (AD)1-genomes will certainly supersede earlier assemblies as chromosome-scale references. With high-quality assemblies of the three cotton species, a more complete picture of genome architecture, gene annotation and transposable element (TE) insertions is now available, which is critical to evolutionary and comparative genomics as well as to the analysis of genetic variation.
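
As a quick consistency check on the A1 assembly figures quoted above, the anchored share can be recomputed from the two sizes. This is only an illustrative sketch: the variable and function names are hypothetical, and the only inputs are the 1 556 Mb and 1 489 Mb values given in the text.

```python
# Figures quoted in the text for the A1 assembly, in megabases (Mb).
assembled_mb = 1556  # total assembled sequence
anchored_mb = 1489   # sequence placed on the 13 pseudo-chromosomes


def anchored_fraction(anchored: float, total: float) -> float:
    """Return the share of the assembly placed on pseudo-chromosomes."""
    if total <= 0:
        raise ValueError("total assembly size must be positive")
    return anchored / total


fraction = anchored_fraction(anchored_mb, assembled_mb)
unplaced_mb = assembled_mb - anchored_mb  # sequence left off the pseudo-chromosomes

print(f"anchored: {fraction:.2%}, unplaced: {unplaced_mb} Mb")
# prints "anchored: 95.69%, unplaced: 67 Mb"
```

The recomputed share agrees with the 95.69% reported, leaving about 67 Mb of sequence not placed on pseudo-chromosomes.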

Origin of allotetraploid cotton and two diploid A-genomes

On the basis of multiple lines of evidence, such as molecular trees, whole-genome phylogenetic relationships and population analysis, the authors propose that all existing A-genomes may have originated from a common ancestor, referred to here as A0 (Huang et al. 2020). The ancient and extinct A0 lineage, rather than the extant A1 or A2 lineage, is likely the ancestral form of one of the original parental species of (AD)1. A0 was phylogenetically closer to the present A1 than to A2. Hybridization of this A0 genome with a D5 species produced the current allotetraploid cotton ~ 1.6 million years ago (MYA), which preceded the speciation of the current A1 and A2 at ~ 0.7 MYA. This genome-based analysis will likely settle the A1 versus A2 argument, especially if archaeological data are someday unearthed that confirm the existence of the now-extinct A0 genome. With the publication of these new findings, our knowledge of cotton genome evolution will have to be revised and updated.

Developing novel methods to detect genome expansion and evolution

The authors suggest that cotton genome evolution is characterized by bursts of transposable element (TE) activity. The two A-genomes and the A-subgenome of (AD)1 experienced an expansion in genome size that was highly correlated with TE bursts. As much as 72.57% of the A1-genome and 73.62% of the A2-genome is composed of long terminal repeat (LTR)-type TEs. Using fragmented coding sequences of LTR-type TEs, they developed a novel method, named Gaussian probability density function (GPDF) analysis, to overcome a major pitfall of the traditional method, which relies on the presence of both ends of full-length LTRs and therefore likely over-represents more recently inserted LTRs. They suggested that several LTR bursts, which occurred from 5.7 MYA to less than 0.61 MYA, contributed substantially to A-genome size expansion, speciation and evolution.
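
To make the pitfall concrete, the following is a minimal sketch of the traditional intact-element dating approach, not of the GPDF method, whose details are not given here. It assumes the commonly used relation T = K / (2r), where K is the divergence between the two LTRs of a single element and r is a substitution rate. The function names, the toy sequences and the rate value are hypothetical placeholders and are not taken from Huang et al. (2020).

```python
import math


def jukes_cantor_distance(ltr_5p: str, ltr_3p: str) -> float:
    """Jukes-Cantor corrected divergence between two aligned LTR sequences."""
    if len(ltr_5p) != len(ltr_3p):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where both sequences carry an unambiguous base.
    pairs = [
        (a, b)
        for a, b in zip(ltr_5p.upper(), ltr_3p.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)  # observed mismatch fraction
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time_mya(divergence: float, rate_per_site_per_year: float) -> float:
    """Insertion age in million years, T = K / (2r)."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return divergence / (2.0 * rate_per_site_per_year) / 1e6


# Toy example: both LTRs of one element are required as input.
five_prime = "ACGTACGTACGTACGTACGT"
three_prime = "ACGTACGTACTTACGTACGT"  # one mismatch in 20 sites
ASSUMED_RATE = 7e-9  # placeholder rate, substitutions per site per year

k = jukes_cantor_distance(five_prime, three_prime)
print(f"K = {k:.4f}, age = {insertion_time_mya(k, ASSUMED_RATE):.2f} MYA")
```

Because the calculation needs both LTRs of the same element, truncated or fragmented copies cannot be dated this way, and such copies tend to be the older ones. That is the sampling bias the fragment-based GPDF analysis is described as addressing.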

Characterization of structural variations related to cotton fiber development

The authors characterized structural variations (SVs) in cotton genomes using comparative genomic analysis, which provided dozens of putative candidate genes for investigating phenotypic differences among the three cotton species. Abundant species-specific SVs in genic regions changed the expression of many important genes. Some of these SVs altered the expression levels of genes involved in fatty acid biosynthesis, which probably led to improved spinnable fiber in (AD)1 compared with A1 or A2, as supported by several transgenic cotton lines overexpressing the KCS6 gene.

In conclusion, Huang and colleagues assembled three high-quality cotton reference genomes and performed a systematic and comprehensive genome analysis, especially of the cotton A-genomes, which represents a major step toward understanding the evolution of the cotton A-genome. Their work not only provides the scientific community with valuable genomic and genetic resources to facilitate studies of genetic evolution and comparative genomics, but will also accelerate cotton genetic improvement through advanced methodology and new varieties.

Availability of data and materials

No other data related to this study is available at this time.



We thank the editorial board for inviting us to comment on this interesting work and for their suggestions during the revision of the manuscript.


No Funding.

Author information


Ma ZY wrote the manuscript. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to MA Zhiying.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The author declares no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

MA, Z. Unraveling the puzzle of the origin and evolution of cotton A-genome. J Cotton Res 3, 17 (2020).
