Unraveling the puzzle of the origin and evolution of cotton A-genome

The evolution of Gossypium hirsutum, the most widely planted cotton species, has long been an unsolved puzzle because of its hybrid origin from D-genome and A-genome species. To better understand the genetic makeup of cotton, Huang et al. recently sequenced and assembled the first A1-genome, that of G. herbaceum, and updated the A2-genome of G. arboreum and the (AD)1-genome of G. hirsutum. On the basis of the three reference genomes, they resolved existing controversies and provided novel evolutionary insights into the A-genome.

First A1 genome sequence and high-quality genome updates of A2 and (AD)1
Since the release of the first D5-genome in 2012 (Wang et al. 2012), a second D5 genome (Paterson et al. 2012; Udall et al. 2019) and the genomes of A2 (Du et al. 2018; Li et al. 2014), (AD)1 (Hu et al. 2019; Li et al. 2015; Wang et al. 2019; Yang et al. 2019; Zhang et al. 2015) and (AD)2 (the much less cultivated G. barbadense) (Hu et al. 2019; Liu et al. 2015; Wang et al. 2019; Yuan et al. 2015) have been sequenced and assembled, and sustained updating efforts have greatly improved these assemblies. Recently, Huang et al. sequenced and assembled A1 variety africanum for the first time and reassembled high-quality genomes of the A2 cultivar Shixiya1 and the (AD)1 genetic standard Texas Marker-1 (TM-1) using PacBio long reads, paired-end sequencing and high-throughput chromosome conformation capture (Hi-C) technologies. The resulting A1 assembly captured 1 556 megabases (Mb) of genome sequence, of which 95.69%, spanning 1 489 Mb, was assigned and ordered into 13 pseudo-chromosomes. They substantially improved the existing A2 and (AD)1 genome assemblies, mainly in terms of genome completeness and accuracy (Huang et al. 2020). These updated A2- and (AD)1-genomes will certainly supplement earlier assemblies as chromosome-scale references. The high-quality assemblies of the three cotton species provide a more complete landscape of genome architecture, gene annotation and transposable element (TE) insertions, which is critical for evolutionary and comparative genomics as well as for the analysis of genetic variation.
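As a quick consistency check, the anchoring rate quoted above can be recomputed from the two assembly sizes. The sketch below uses only the figures given in the text; the function name is hypothetical and chosen for illustration.

```python
def anchored_percentage(anchored_mb: float, total_mb: float) -> float:
    """Percentage of the assembled sequence placed on pseudo-chromosomes."""
    if total_mb <= 0:
        raise ValueError("total assembly size must be positive")
    return 100.0 * anchored_mb / total_mb


# Figures quoted in the text for the A1 assembly (in Mb).
a1_total_mb = 1556
a1_anchored_mb = 1489

print(f"{anchored_percentage(a1_anchored_mb, a1_total_mb):.2f}%")
```

The result agrees with the 95.69% reported for the A1 assembly.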

Origin of allotetraploid cotton and two diploid A-genomes
On the basis of multiple lines of evidence, such as molecular trees, whole-genome phylogenetic relationships and population analysis, the authors propose that all existing A-genomes may have originated from a common ancestor, referred to here as A0 (Huang et al. 2020). The ancient and now extinct A0 lineage, rather than the extant A1 or A2 lineage, is likely the A-genome parent of (AD)1. A0 was phylogenetically closer to present-day A1 than to A2. Hybridization of this A0 genome with a D5 species produced the current allotetraploid cotton ~1.6 million years ago (MYA), which preceded the speciation of the current A1 and A2 at ~0.7 MYA. This genome-based analysis will likely settle the A1 versus A2 argument, especially if archaeological evidence is one day unearthed that pinpoints the existence of the now-extinct A0 genome. In light of these new findings, our knowledge of cotton genome evolution will have to be revised and updated.

Developing novel methods to detect genome expansion and evolution
The authors suggest that cotton genome evolution is characterized by bursts of transposable element (TE) activity. The two A-genomes and the A-subgenome of (AD)1 experienced an expansion in genome size that was highly correlated with TE bursts. As much as 72.57% of the A1-genome and 73.62% of the A2-genome was composed of long terminal repeat (LTR)-type TEs. Using fragmented coding sequences of LTR-type TEs, they developed a novel method, named Gaussian probability density function (GPDF) analysis, to overcome a major pitfall of the traditional method, which relies on the presence of both ends of full-length LTRs, such that more recently inserted LTRs are likely overrepresented. They suggested that several LTR bursts, which occurred from 5.7 MYA to less than 0.61 MYA, contributed substantially to A-genome size expansion, speciation and evolution.
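For context, the traditional dating approach mentioned above can be sketched as follows. This is not the authors' GPDF method or their code; it is a minimal illustration of the commonly used estimate T = K / (2r), where K is the divergence between the two LTRs of one intact element and r is a substitution rate. The Jukes-Cantor correction, the function names and the rate used in the example are assumptions made here for illustration, not values taken from Huang et al.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from the observed proportion p
    of differing sites between the 5' and 3' LTRs of one element."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ltr_insertion_time_mya(p: float, rate_per_site_per_year: float) -> float:
    """Insertion time in million years: T = K / (2r).

    Both LTRs are identical at insertion and then diverge independently,
    hence the factor of 2 in the denominator.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    k = jukes_cantor_distance(p)
    return k / (2.0 * rate_per_site_per_year) / 1e6


# Illustrative placeholder rate, NOT a value from the paper.
ASSUMED_RATE = 1e-8

# An element whose two LTRs differ at 2% of aligned sites.
print(ltr_insertion_time_mya(0.02, ASSUMED_RATE))
```

Because this estimate needs both LTRs of an element to be present and alignable, it can only be applied to intact full-length elements, which is the limitation that motivates an approach based on fragmented coding sequences instead.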

Characterization of structural variations related to cotton fiber development
The authors characterized structural variations (SVs) in cotton genomes using comparative genomic analysis, which provided dozens of putative candidate genes for investigating phenotypic differences among the three cotton species. Abundant species-specific SVs in genic regions changed the expression of many important genes, including genes involved in fatty acid biosynthesis. These expression changes probably contributed to the improved spinnable fiber of (AD)1 compared with A1 or A2, a conclusion supported by several transgenic cotton lines that overexpressed the KCS6 gene.
In conclusion, Huang and colleagues assembled three high-quality cotton reference genomes and performed a systematic and comprehensive genome analysis, especially of the cotton A-genomes, which represents a major step toward understanding the evolution of the cotton A-genome. Their work not only provides the scientific community with valuable genomic and genetic resources for evolutionary and comparative genomic analysis, but will also accelerate cotton genetic improvement through advanced methodology and new varieties.