Skip to main content

Evolution of pectin synthesis relevant galacturonosyltransferase gene family and its expression during cotton fiber development



Pectin is a key substance involved in cell wall development, and the galacturonosyltransferases (GAUTs) gene family is a critical participant in the pectin synthesis pathway. Systematic and comprehensive research on GAUTs has not been performed in cotton. Analysis of the evolution and expression patterns of the GAUT gene family in different cotton species is needed to increase knowledge of the function of pectin in cotton fiber development.


In this study, we have identified 131 GAUT genes in the genomes of four Gossypium species (G. raimondii, G. barbadense, G. hirsutum, and G. arboreum), and classified them as GAUT-A, GAUT-B and GAUT-C, which coding probable galacturonosyltransferases. Among them, the GAUT genes encode proteins GAUT1 to GAUT15. All GAUT proteins except for GAUT7 contain a conserved glycosyl transferase family 8 domain (H-DN-A-SVV-S-V-H-T-F). The conserved sequence of GAUT7 is PLN (phospholamban) 02769 domain. According to cis-elemet analysis, GAUT genes transcript levels may be regulated by hormones such as JA, GA, SA, ABA, Me-JA, and IAA. The evolution and transcription patterns of the GAUT gene family in different cotton species and the transcript levels in upland cotton lines with different fiber strength were analyzed. Peak transcript level of GhGAUT genes have been observed before 15 DPA. In the six materials with high fiber strength, the transcription of GhGAUT genes were concentrated from 10 to 15 DPA; while the highest transcript levels in low fiber strength materials were detected between 5 and 10 DPA. These results lays the foundation for future research on gene function during cotton fiber development.


The GAUT gene family may affect cotton fiber development, including fiber elongation and fiber thickening. In the low strength fiber lines, GAUTs mainly participate in fiber elongation, whereas their major effect on cotton with high strength fiber is related to both elongation and thickening.


Cotton, the world’s most important economic fiber crop, is tightly connected to the ecomonic development and human livelihoods. Cotton fiber formation and development is an important factor in cotton breeding (Fan 2013). The cell wall structure of cotton fiber is composed of pectin and cellulose. The synthesis and decomposition of pectin are therefore important factors affecting cotton fiber formation (Basra et al. 1984).

Pectin, which plays a critical role in plant growth and development, is mainly found in the primary and middle layers of plant cell walls, and participates in the formation of plant tissue structure, biotic and abiotic stress processes. Pectin biosynthesis is estimated to require at least 67 transferases including glycosyl-, methyl-, and acetyltransferases (Mohnen et al. 2008). Pectin is also the most complex cell wall polysaccharide (Scheller et al. 2007). The sugar nucleotide reverse transporter transports the sugar nucleotide to the Golgi apparatus under the action of a specific glycosyltransferase. The glycosyl group is then cleaved and attached to the elongating polysaccharide chain to form pectin. The synthesis of pectin involves at least 53 different glycosyltransferases localized on the Golgi apparatus (Ridley et al. 2001).

Only two glycosyltransferase-coding genes are recognized on the basis of previous research in Arabidopsis: homogalacturonan (HG) synthase and glucuronyltransferase involved in rhamnogalacturonanII (RG II) synthesis (Willats et al. 2006). Compared with currently cataloged HG pectin multimerism in plants, much researches have been performed on galacturonyltransferase (HG: α-1,4-D-galacturonosyltransferase, HG:GalAT). The gene encoding HG:GalAT (EC is named galacturonosyltransferase (GAUT) (Mohnen et al. 2008;Willats et al. 2006).

Galacturonosyltransferases (GAUTs), which are partly responsible for pectin biosynthesis, are glycosyltransferase (Harholt et al. 2010). According to evolutionary analysis, they constitute glycosyltransferase family 8 (GT8s). GT8 consists of three separate protein classes, classes I and II contain mostly eukaryotic proteins, while almost the entire class III consists of bacterial proteins (Yin et al. 2010a, b). Plant cell-wall-related proteins, including GAUT and GAUT-like (GATL) proteins, are all located in class I (Sterling et al. 2006). The GAUT gene family was first identified by Blast analysis of the Arabidopsis genome, on the basis of structural similarity, 15 GAUT and GATL family members have been classified as follows, GAUT1-GAUT7 in GAUT-A; GAUT8-GAUT11 in GAUT-B; and GAUT12-GAUT15 in GAUT-C. All GATL family members are clustered together and constitute a clade closely related to GAUT15. Experiments have shown that GAUT1 is equivalent to HG:GalAT, GalAT and the GAUT1-related gene family provides the genetic and biochemical tools required to study the function of these genes in pectin synthesis (Sterling et al. 2006). GAUT1 also belongs to GT8, and the GAUT1-related superfamily also contains 10 GAUT-like genes (Cantarel et al. 2008). The gaut1 mutation can cause plant dwarfing, reduce cell adhesion and a 25% reduction in GalA content of leaves (Orfila et al. 2005). The GAUT1-GAUT7 core complex is held together by one or more covalent disulfide bonds and other noncovalent interaction. GAUT1 is dependent on GAUT7 and remains in the Golgi apparatus, where it is considered to be a component of the HG:GalAT complex that participates in pectin synthesis (Atmodjo et al. 2011). The gaut8 mutation can reduce the adhesion of epidermal cells of the young leaves and the marginal cells of the roots, thereby leading to plant dwarfing (Durand et al. 2009). The gaut11 mutation can reduce the thickness of the seed mucus (Caffall et al. 2009). In the gaut13-gaut14 double mutant, the distribution of pectin in the pollen tube wall is altered, which results in serious defects in pollen tube shape and growth (Wang et al. 2013).

The GAUT family is large, and more than 67 members have been found in tomato (Solanum lycopersicum). The GAUT family member with the highest expressed level in tomato is gaut4 (Godoy et al. 2013). In the gaut4 mutant, pectin structure is changed significantly, and other fruit traits, such as starch content, fruit yield, and single fruit quality, are also altered, consistent with an observed increase in firmness. In addition, harvest index is significantly decreased because of a reduction in fruit weight and number (Godoy et al. 2013). Hyodo et al. (2013) demonstrated that the gaut1 gene had higher transcription level during fruit development, especially in the fruit epidermis, while gaut1 showed higher expression level in the mesocarp, endocarp, septum, locular tissue, and nucleus after the fruit enters maturity.

To improve the quality of high fiber strength cotton materials, we have focused on cotton fiber development. Zhang et al. (2017) used recombinant inbred lines (RILs) to identify 16 stable quantitative trait loci (QTLs) related to fiber strength. In addition, Zou et al. (2019) identified 3 364 candidate genes related to cotton fiber strength, and analyzed the differences in fiber development (5 to 30 DPA) using two extremely different materials in a RIL population. A total of 363 differentially expressed genes (DEGs) comprising 4, 75, 39, 62, and 183 genes at 10, 15, 20, 25, and 30 DPA, respectively, were detected. Among them, 228 (62.6%) genes were upregulated in high-strength materials, while Gh_A07G1907 (homologous to GAUT6) was downregulated at 15 DPA but upregulated at 25 DPA.

As discussed aboved, the GAUT gene family has been studied in plants such as Arabidopsis and tomato etc., but no systematic investigation has been carried out in cotton. In the current study, we systematically and comprehensively analyzed the chromosomal location, structures, and phylogenetic relationships of the GAUT gene family in different Gossypium species. We also focused on the transcription of these genes during the fiber development.


Identification of cotton GAUT genes

Detailed phylogenetic analyses have divided GT8 proteins into two distantly related clades: 1) the GAUT1-related family, including the GAUT and GATL proteins, known as galacturonosyltransferase proteins, and 2) a group including plant glycogen protein-like starch starters (PGSIPs) and galactitol synthases (GolSs) (Yin et al. 2010a, b). According to the Pfam database and a bioinformatics analysis, all inferred proteins have a Glyco_transf_8-like domain (PF01501), which indicates that the corresponding genes belong to the GAUT gene family (Kikuchi et al. 2003). To identify the GAUT gene in Gossypium species, we identified 187 GAUT genes from eight species (Fig. S1), including 131 genes from the following species: G. hirsutum (41 genes), G. barbadense (42 genes), G. arboreum (25 genes) and G. raimondii (23 genes). The length of coding regions in the GAUT gene family ranged from 1 098 to 4 899 bp, and the encoded proteins comprised of 365 to 1 632 amino acids residues. The length of 32 GAUT genes was less than 3 000 bp; 88 genes had lengths of 3 000 to 7 000 bp, while the remaining 11 were longer than 7 000 bp (Table 1).

Table 1 Chromosomal locus ID and length of the encoded galacturonosyltransferase proteins in cotton

Phylogenetic analysis and classification of the GAUT gene family in cotton

Using published genome sequence data from eight species, we determined the phylogenetic relationships of GAUT gene family members among multiple species. In a previous study (Sterling et al. 2006), three types of protein sequences, GAUT-A, GAUT-B, and GAUT-C, were found by multiple sequence alignment of the 187 GAUT proteins of these eight species to Arabidopsis homologs (Fig. 1a). In the present study, we analyzed the GAUT genes of four cotton species, thereby classifying 62 proteins into GAUT-A, 38 into GAUT-B, and 31 into GAUT-C (Fig. 1b). In addition, GhA07G1907 was a differentially expressed gene in the QTL which was related to fiber strength. It was a homologous gene to AtGAUT6. And genes homologous to AtGAUT6 had the most number in Gossypium species (Zou et al. 2019). We found 16 AtGAUT13 homologs, 14 genes were homologous to each of AtGAUT7, AtGAUT9, and AtGAUT11, and 11 genes were homologous to each of AtGAUT2 and AtGAUT12. The remaining genes had fewer than 10 homologs. No gene homologous to AtGAUT14 were detected in any of the four cotton species, and no AtGAUT5 homolog was identified in G. hirsutum. Only one homolog of each of AtGAUT5 and AtGAUT10 were detected, namely, GrGAUT18 and GhGAUT03, respectively. Four homologs each of AtGAUT2, 4, 6, 7, 9, 12, and 13, three homologs each of AtGAUT8 and AtGAUT11, and two homologous genes to each of AtGAUT1, and AtGAUT3 were discovered in G. hirsutum.

Fig. 1
figure 1

Phylogenetic tree of the galacturonosyltransferase (GAUT) gene family

a. Neighbor-joining phylogenetic tree of galacturonosyltransferase (GAUT) genes from eight species. b: Neighbor-joining phylogenetic tree of galacturonosyltransferase (GAUT) family form the four cotton species. They comparised three types of protein sequences by multiple sequence alignment of the 187 GAUT proteins for eight species. From the four cotton species, GAUT gene family were classified into GAUT-A (62 proteins), GAUT-B (38 proteins), and GAUT-C (31 proteins)

Analysis of conserved motif and GAUT protein structures

The following motif is conserved in 15 Arabidopsis GAUT protein and their orthologues in cotton: H-DN-A-SVV-S-V-H-T-F (H-x (2)-[ILV]-x-[ST]-D-N-[IV]-[IL]-A-[ASTV]-S-V-V-[AIV]-x-S-x-[AIV]-x (2)-[AS]-x (2)-[PS]-x (3)-V-[FL]-H-[ILV]-[ILV]-T-[DN]-x (2)-[NST]-x (2)-[AGP]-[IM]-x (3)-F) (Sterling et al., 2006). The GAUT gene family encode proteins with a molecular mass between 61 and 78 kDa (Sterling et al. 2006; Godoy et al. 2013). Consistent with topological predictions, most GAUT proteins can encode a type II membrane protein containing a putative transmemberance domain in its hypervariable N-terminal region. Among the GAUT proteins analyzed in our study, three GAUT proteins (GAUT 3, 4, and 5) which belongeding to GAUT-A contained an N-terminal signal peptide rather than a transmemberance domain. The only GAUT gene family members predicted to have no N-terminal transmemberance domain or signal peptide in cotton were GAUT2 proteins (Fig. 2b). We also found some GAUT1, GAUT3 and GAUT11 proteins with the above-mentioned characteristics. Among 14 GAUT7 proteins belonging to GAUT-A group in four Gossypium species (Table 1), the proteins encoded by 10 genes homologous to GAUT7 contained the conserved PLN02769 domain (Fig. 2b), which was assigned to the category of “Probable Galacturonosyltransferase” ( The remaining GAUT family members contained a conserved glycosyl transferase family 8 domain. The predicted motifs of each member are shown in Fig. 2C, and the specific structural information is given in Fig. S2.

Fig. 2
figure 2

The stucture analysis of galacturonosyltransferase (GAUT) gene family. a: Phylogenetic tree of the galacturonosyltransferase (GAUT) gene family. b: Analysis of the genomic sequences of the GAUT gene family domains. c: Analysis of the genomic sequences of the GAUT gene family motifs

Analysis of collinearity and repeating elements in the GAUT gene

According to the results of MCScan analysis, there was no tandem repeat element present in the GAUT gene family in Gossypium. In G. hirsutum, GhGAUT01 and GhGAUT19, GhGAUT15 and GhGAUT34, GhGAUT16 and GhGAUT36, GhGAUT17 and GhGAUT38, and GhGAUT18 and GhGAUT39 were homologous to the AtGAUT3, AtGAUT11, AtGAUT12, and AtGAUT13, respectively, and were also segment repeats. Genes in diploid Gossypium species corresponding to the above repeated genes were shown in Fig. 3. According to their relative order in Gossypium, GAUT genes were categorized into five groups (1 to 5). Only group 4 belongs to GAUT-A, all other groups were members of GAUT-C.

Fig. 3
figure 3

Collinearity of galacturonosyltransferase genes in G. arboretum, G. hirsutum and G. raimondii

Analysis of GAUTs transcription patterns

As determined by collinearity and repetitive element analyses of the above homologous genes in combination with transcriptome data from different fiber developmental stages of diploid Gossypium species, GaGAUT02 had the highest transcript level at 15 DPA but almost no transcription during other periods. Other members of group 1 (Fig. 3) were also barely transcribed in G. raimondii and tetraploid cotton, with fragments per kilobase of transcript per million fragments mapped (FPKM) values of less than 2.0. In group 2, GhGAUT15 and GhGAUT34 had the highest transcript level during the late stage (20 to 25 DPA) of fiber development (Fig. 4). Group 3 members GhGAUT16, GhGAUT36, GrGAUT14, and GaGAUT22 were not transcribed at all during the fiber developmental period. Among group 4 genes, GhGAUT17 and GhGAUT38 had their highest transcript level during fiber developmental from 5 to 10 DPA, and GrGAUT22 and GaGAUT24 had the peak transcript level at 0 and 15 DPA, respectively. All members of group 5 except for GaGAUT25 were transcribed at 15 DPA, and the remaining genes had almost no transcription during fiber development. In four cotton species, six genes, namely GaGAUT08, GaGAUT12, GaGAUT13, GrGAUT03, GrGAUT18, and GhGAUT25, had peak FPKM values greater than 40. The expression peak of these six genes all occurred before 15 DPA, which suggested that the GAUT gene family play a important role in early cotton fiber development.

Fig. 4
figure 4

Heatmap  of the galacturonosyltransferase gene transcript levels based on transcriptome analysis. a: galacturonosyltransferase gene transcript levels during fiber development in G. arboreum (0 to 15 DPA). b: Galacturonosyltransferase gene transcript levels during fiber development in G. raimondii (0 to 15 DPA). c: Galacturonosyltransferase gene transcript levels during fiber development in G. arboreum (10 to 28 DPA). d: Galacturonosyltransferase gene transcript levels in the root, stem, leaf, and fiber tissues (5 to 30 DPA) from G. hirsutum

We selected a RIL population containing high-strength fiber and low-strength fiber lines for quantitative real time polymerase chain reaction (qRT-PCR) analysis (Wang et al. 2014). For this analysis, we selected GhGAUT08 (Gh_A07G1907) (Zou et al. 2019) and GhGAUT25 which belonged to GAUT-A, and GhGAUT10, GhGAUT11 and GhGAUT29 which belonged to GAUT-B. And all had FPKM values greater than 10 (Tables 2,3,4, and 5). The qRT-PCR analysis revealed that the GAUT genes had an important influence on fiber development before 15 DPA. From 5 to 30 DPA, the overall transcript levels of GhGAUT08 and GhGAUT10 were higher in high fiber-strength materials than those in low strength materials. At 5 DPA, the transcript levels of GhGAUT11 and GhGAUT29 were higher in low fiber-strength materials than those in high strength materials, with the opposite trend from 10 to 30 DPA. The transcript level of GhGAUT25 was higher in low fiber-strength materials than high strength materials from 5 to 10 DPA, with the reverse pattern after 15 DPA. In six high strength materials, peak GAUT expression was from 10 to 15 DPA (Fig. 5), whereas the period of highest expression in six low-strength materials was 5 to 10 DPA (Fig. 5).

Table 2 The FPKM value of GAUT genes in G. arboreum
Table 3 The FPKM value of GAUT genes in G. raimondii
Table 4 The FPKM value of GAUT genes in G. barbadense
Table 5 The FPKM value of GAUT genes in G. hirsutum
Fig. 5
figure 5

Analysis of 5 GhGAUT transcript levels in 6 high-strength fiber materials and 6 low-strength fiber materials. (a:0–153, b:69262, c:69272, d:69328, e:69327, f:69307, g: sGK9708, h:69312, i:69404, j:69305, k:69412, l:69362)

Since the GAUT gene family affect the synthesis of pectin, we measured the pectin content of different materials. The results showed that the peak pectin content of high strength fiber materials appeared at 15 DPA, while the low strength materials appeared at 10 DPA (Fig. 6). This result was similar to the pattern of gene transcription.

Fig. 6
figure 6

The pectin content of different samples. a: 5 high-strength fiber materials; b: 5 low-strength fiber materials

Analysis of the cis-elements in the promoter regions GAUT genes

To investigate the potential reasons for different expression patterns among GAUT genes, we analyzed the promoter region of the GAUT gene family in upland cotton. This analysis was performed because cis-elements (Fig. 7) can affect gene expression (Higo et al. 1999). The 32 cis-elements were detected in the promoter region of the 41 GAUT genes in upland cotton. The 32 cis-elements contained motifs that belonged to the regulatory sequences that responsive to anaerobicity, abiotic stress and hormone signaling. CGTCA and TGACG elements are associated with responsiveness to methyl jasmonate (Me-JA) (Basyuni et al. 2018), while the GARE-motif (TCTGTTG) and P-box (CCTTTTG) are related to gibberellin (GA) (Porto et al. 2014). The TCA-element (CCATCTTTTT) is a salicylic acid (SA) responsive element (Herrera-Vásquez et al. 2015). AuxRR-core (GGTCCAT) and the TGA-element (AACGAC) are related to the responsiveness to auxin (Herrera-Vásquez et al. 2015), and the ABRE-motif (ACGTG) is associated with the abscisic acid (ABA) response (Mishra et al. 2014). The ARE-motif (AAACCA) is related to the anaerobic environment, while the LTR-motif (CCGAAA) participates in response to low temperature (Chen et al. 2018). Finally, the TC-rich repeats (ATTCTCTAAC) is the response element associated with stress (Wei et al. 2009).

Fig. 7
figure 7

Analysis of cis-elements in GhGAUT genes. 32 cis-elements in upland cotton genes were analyzed, namely, elements responsive to anaerobic conditions, different hormones (Me-JA, GA, SA, ABA and IAA), and abiotic stress conditions, including drought, low temperature and other stress conditions

Transcription analysis of prominent fiber-expressed genes under abiotic stress and phytohormone treatments

For a more in-depth study of GhGAUTs transcript levels induced by abiotic stress, the transcription patterns of five GhGAUT genes in cotton with NaCl, PEG, abscisic acid (ABA), naphthylacetic acid (NAA), salicylic acid (SA), and methyl jasmonate (MeJA) treatments were analyzed by qRT-PCR (Fig. 8). We examined the effects of various hormones on the transcription of the five GhGAUT genes. We observed that within 1 h after all treatments, the relative transcript levels of these five genes were rapidly increased and then decreased after 24 h. The peak transcription levels of the up-regulated gene came between 3 h and 12 h, except for GhGAUT11. GhGAUT11 did not respond to the treatment of three hormones ABA, SA, and MeJA. GhGAUT29 responded to all stresses and hormone treatments, and the detected transcript levels were higher. GhGAUT08 responded to ABA and SA treatments with the higher transcript levels. GhGAUT25 also responded to two stresses and four hormone treatments, but the transcript levels were the highest under the treatment of PEG and ABA, and the response peaks came at 6 h and 12 h after treatments, respectively. GhGAUT10 also responded to the treatments of ABA, SA, and MeJA, with the peak transcript levels at 6 h, 12 h, and 12 h; the response levels to NACl, PEG and NAA treatment was low at 3 h and 6 h, respectively. These results indicated that after different hormone treatments, different genes had different response times and response patterns, which were closely related to their hormone response elements and expression patterns.

Fig. 8
figure 8

Analysis of GhGAUTs transcript levels with abiotic stress and phytohormone treatments according to a qRT-PCR assay. a: 200 μmol·L1 NaCl treatment; b: 20% PEG 6000 treatment; c: 100 μmol·L–1 NAA treatment; d: 100 μmol·L–1 ABA treatment; e: 100 μmol·L–1 SA treatment; f: 100 μmol·L–1 MeJA treatment


Phylogenetic analysis of the GAUT gene family

Researches on the GAUT gene family currently focused on Arabidopsis (Sterling et al. 2006; Cantarel et al. 2008), tomato (Godoy et al. 2013; Vasco et al. 2011), ramie (Chen et al. 2014), sweet cherry (Campoy et al. 2015), and the other species. Studies have also been carried out in cotton, but systematically evolutionary analysis has not yet been conducted. Among 131 GAUT members in cotton, 20 are homologous to AtGAUT6; no gene in cotton is homologous to AtGAUT14, and no AtGAUT5 homolog was present in G. hirsutum.

The GAUT gene family encodes galacturonosyltransferases involved in the synthesis of pectin, a compound related to cell wall formation. The conserved domain of the GAUT protein is H-DN-A-SVV-S-V-H-T-F. In tomato, GAUT1 protein had been predicted to be a membrane protein with a single N-terminal transmembrane helix and a major globular domain in the Golgi cavity, however, there was no report about the function of pectin synthesis and its role in fiber development. Analyses of the GAUT gene family have shown that all members share a conserved glycosyltransferase family 8 domain (Godoy et al. 2013; Vasco et al. 2011). In addition, GAUT proteins in the four species of cotton have a PLN02769 domain. 14 GAUT genes are homologous to GAUT7, two of which are present in G.raimondii, and 4 in other three species.

The possible role of the GAUT gene family in fiber development

The formation of bast fiber is an important quality trait in ramie. The development of ramie fiber affecting the rate of hemp formation and ultimately the economic value of ramie directly (Chen et al. 2014). Analysis of the cDNA sequence of ramie GalAT, a key pectin biosynthetic homolog of GAUT4, has showed that most GAUT4 transcript accumulates in roots, followed by leaves, phloem, and xylem (Liu et al. 2009). Similar to the situation in hemp, the development of cotton fiber cells is the main factor affecting cotton fiber quality (Haigler et al. 2012). GbGAUT1, a high galacturonic acid (HG) GAUT protein contains a conserved glycosyl transferase family 8 domain, falls into group GAUT-A in the phylogenetic tree and is preferentially expressed during fiber secondary cell wall thickening, especially at 35 DPA. These results indicate that the GbGAUT1 gene may play an important role in fiber development (Chi et al. 2009).

In our study, the peak expression of GhGAUT genes was concentrated before 15 DPA. In six high-strength-fiber materials, the highest expression was from 10 to 15 DPA, when the periods of fiber secondary wall thickening and fiber elongation overlap (Ji 2011). In the low-strength lines, the expression of GhGAUT genes was concentrated between 5 to 10 DPA, which corresponds to the fiber elongation period (Fan 2013).

Relationship between pectin substances and cotton fiber quality

As is well known, cotton fiber cells develop from a single cell, and their main components are cellulosic materials and pectin substances (Wang 2012). Pectin synthesis and decomposition affect cotton fiber strength (Fan 2013). Pectin methylesterase (PME), a common enzyme in plants that is related to cell wall structure, participates in pectin decomposition and catalyzes pectin deesterification to produce pectate and methanol (Fan 2013; Li et al. 2016). According to previous studies, PME levels increased during fiber development, with the lowest enzyme activity found in high-strength-fiber cultivars (Fan 2013; Li 2016). By comparing multiple transcriptome datasets, Guo et al. (2007) identified two genes related to pectin esterase, which is involved in the hydrolysis of pectin to gelatinic acid, and the expression of these genes were sharply upregulated starting at 12 DPA. In addition, two enzymes involved in pectin synthesis, UDP-glucose 6-dehydrogenase and UDP-D-glucuronic acid 4-epimerase, were down-regulated during secondary wall synthesis. The anthors found that the amount of pectin was decreased in cells at the late stage of cotton fiber development (Gou et al. 2007). Research based on immunohistochemistry has revealed that unesterified homogalacturonan is sparse in epidermal cells, which do not develop into fibers, whereas this compound is abundant in elongated cotton fiber cells (Zhao et al. 2012). The above-mentioned observations suggest that pectin synthesis affects early fiber development, and the hydrolysis of pectin is related to the formation of fibers during the later stage.


In this study, we characterized the GAUT gene family associated with pectin synthesis by analyzing their phylogenetic relationships, conserved motifs, gene structures, promoter sequences, and expression in cotton lines having different fiber strengths. Comprehensive expression and bioinformatics analysis indicated that the peak expression of GhGAUT genes was concentrated before 15 DPA. Gene expression in the six materials with high-strength fiber were concentrated between 10 and 15 DPA, corresponding to the beginning of the fiber secondary wall thickening period and part of the fiber elongation and thickening phase. In contrast, GAUT gene expression in six materials with low-strength fiber was at the highest level from 5 to 10 DPA, which was the fiber elongation period. The result lays the foundation for future researches associated with pectin synthesis during cotton fiber development.


Identification of cotton GAUT family members

To identify all homologous GAUT gene family in Arabidopsis (Sterling et al. 2006), we used a Hidden Markov Model integrated in HMMER 3.0 (Finn et al. 2011) to search for the glycosyl transferase family 83 (PF01501) in the following species: Carica papaya L. (Tang et al. 2008) (, Theobroma cacao (Argout et al. 2011), G. raimondii (Wang et al. 2012) (, Oryza sativa (Du et al. 2017) (, A. thaliana (Cao et al. 2011;) (, G. barbadense (Liu et al. 2015) (, G. hirsutum (Li et al. 2015; Zhang et al. 2015) (, and G. arboreum (Du et al. 2018) ( The sequences were verified using BLASTp with AtGAUT sequences as the query sequences and an e-value threshold of 1 × 10− 5. After multiple sequence alignment in ClustalW program (Thompson et al. 2003), incomplete sequences were manually deleted.

Phylogenetic tree construction, analysis of gene structure and localization

A neighbor-joining phylogenetic tree was constructed from the aligned sequences using MEGA v6.06 with 1 000 bootstrap repeats. To confirm GAUT gene structure in Gossypium species, information about the exons and introns of GAUT genes was retrieved from the GFF3 file, and the exon/intron structures were visualized using the Gene Structure Display Server 2.0 (Hu et al. 2015). Conserved domains were predicted using a conserved domain database ( (Marchler et al. 2010). Motifs were explored using the MEME program ( (Bailey et al. 2006), with the maximum number of motifs set to 20. Analysis and visualization of the chromosomal distribution of the GAUT genes were carried out using MapChart 2.2 (Voorrips 2002).

Promoter region and collinearity analysis of four cotton species

The promoter region of 41 GhGAUT genes were analyzed. Analysis of cis-acting elements was carried out using the PlantCARE database ( (Lescot et al. 2002), The 32 cis-elements were detected in the promoter region of the 41 GAUT genes in upland cotton. The 32 cis-elements contain motifs that have characteristic of regulatory sequences in response to anaerobicity, abiotic stress and hormone signaling. Repetitive elements in the GAUT gene family were identified by collinear analysis using the entire BLAST array (e-value = 1 × 10− 5) in the MCScan (Tang et al. 2008).

Plant materials

Plants were grown using standard field management practices in Anyang, China. One hundred and ninty-six RIL populations were constructed with 0-153 and 9708 as parents, and 5 lines with high fiber strength and 5 lines with low fiber strength were screened. 0-153 were a high-value parents in fiber characters and 9708 were a low-value parents. Plant materials consisted six high-fiber-strength lines and six low-fiber-strength lines from a RILs populations (Sun et al. 2012). The date of flowering was recorded as 0 days post-anthesis (0 DPA). Cotton bolls were sampled every 5 days from 0 to 30 DPA. After collection, fibers were separated from the cotton bolls with a sterile knife, immediately froze in liquid nitrogen, and stored at 80 °C, another part of fiber samples were dried at 45 °C for determination of pectin content. RNA was extracted from cotton fiber tissues, reverse-transcribed into cDNA. The cDNA was stored at –20 °C for subsequent qRT-PCR experiments with three biological replicates (Tuttle et al. 2015).

Transcriptome analyses and qRT-PCR

The transcriptome data were downloaded from the Sequence Read Archive (SRA) of the NCBI database ( (Paterson et al. 2012). The SRA data for G. raimondii (PRJNA79005; Wang et al. 2012), G. barbadense (PRJNA251673; Liu et al. 2015), G. hirsutum (PRJNA248163; Zhang et al. 2015), and G. arboreum (PRJNA179447; Du et al. 2018) were converted to FASTQ format data with the SRA Toolkit, and then analyzed and filtered with the FASTX-Toolkit ( TopHat2 (Kim et al. 2013) was used to map the clean data to the index genomes constructed by Bowtie2 (Langmead and Salzberg 2012) with the library-type and fr-unstranded parameters. The Cufflinks program was used to calculate the transcript levels of the genes in the reference genome (Trapnell et al. 2010). Gene transcript levels (Tables 2,3,4, and 5) were visualized using a normalization method based on log2 (FPKM + 1) in the pheatmap ( qRT-PCR assays of selected genes were performed using specific designed primers (Table 6) on a Roche 480 II PCR system. Gene transcript levels were calculated using the 2-ΔΔCt method, and the experimental design included three biological replicates and three technical replicates (Livak and Schmittgen 2001; Pfaffl 2001).

Table 6 Details of primers for quantitative real-time PCR

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its supplementary information files.









Days post-anthesis




Rhamnogalacturonan II





Glycosyltransferase 8


Recombinant inbred lines


Quantitative trait loci


Differentially expressed genes


Protein-like starch starters


Galactitol synthase


Fragments per kilobase of transcript per million fragments mapped


Quantitative real time polymerase chain reaction


Reverse-transcription PCR


quantitative real-time PCR


Sequence Read Archive


Abscisic acid


Naphthylacetic acid


Salicylic acid


Methyl jasmonate






Indole acetic acid


Download references


Not applicable.


This work was supported by the Major Research Plan of National Natural Science Foundation of China (NO.31690093); the National Agricultural Science and Technology Innovation project for CAAS (CAAS-ASTIP-2016-ICR), and the Central Level of the Scientific Research Institutes for Basic R & D Special Fund Business (Y2017PT51). The funder has no role in the design of the study and collection, analysis, or interpretation of data and writing of the manuscript.

Author information

Authors and Affiliations



Yuan YL and Shang HH conceived and managed the project. Fan SM coordinated the overall project and wrote the manuscript. Liu AY prepared plant materials and extracted RNA, and Zou XY completed the qRT-PCR assay. Zhang Z contributed to the multiple sequence alignments and the phylogenetic analysis. Ge Q helped analyze the data. Gong WK analyzed domains and predicted the promoters. Li JW and Gong JW wrote scripts. Shi YZ contributed to the chromosomal localization and gene structural analysis. Deng XY and Jia TT discussed the results and commented on the manuscript. All authors read, edited, and approved the final version of the manuscript.

Corresponding authors

Correspondence to YUAN Youlu or SHANG Haihong.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflict of interest.

Supplementary Information

Additional file 1 Fig. S1

: The protein sequences of 187 GAUT genes from eight species: Gossypium hirsutum, Gossypium arboreum, Gossypium barbadense, Gossypium raimondii, Arabidopsis thaliana, Oryza sativa, Carica papaya, and Theobroma cacao.

Additional file 2 Fig. S2:

Logos of 20 motifs for four cotton species according to the MEME suite.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

FAN, S., LIU, A., ZOU, X. et al. Evolution of pectin synthesis relevant galacturonosyltransferase gene family and its expression during cotton fiber development. J Cotton Res 4, 22 (2021).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • Cotton fiber development
  • Pectin
  • Galacturonosyltransferases
  • Evolution
  • Transcription patterns