Skip to main content

Evolution of pectin synthesis relevant galacturonosyltransferase gene family and its expression during cotton fiber development



Pectin is a key substance involved in cell wall development, and the galacturonosyltransferases (GAUTs) gene family is a critical participant in the pectin synthesis pathway. Systematic and comprehensive research on GAUTs has not been performed in cotton. Analysis of the evolution and expression patterns of the GAUT gene family in different cotton species is needed to increase knowledge of the function of pectin in cotton fiber development.


In this study, we have identified 131 GAUT genes in the genomes of four Gossypium species (G. raimondii, G. barbadense, G. hirsutum, and G. arboreum), and classified them as GAUT-A, GAUT-B and GAUT-C, which coding probable galacturonosyltransferases. Among them, the GAUT genes encode proteins GAUT1 to GAUT15. All GAUT proteins except for GAUT7 contain a conserved glycosyl transferase family 8 domain (H-DN-A-SVV-S-V-H-T-F). The conserved sequence of GAUT7 is PLN (phospholamban) 02769 domain. According to cis-elemet analysis, GAUT genes transcript levels may be regulated by hormones such as JA, GA, SA, ABA, Me-JA, and IAA. The evolution and transcription patterns of the GAUT gene family in different cotton species and the transcript levels in upland cotton lines with different fiber strength were analyzed. Peak transcript level of GhGAUT genes have been observed before 15 DPA. In the six materials with high fiber strength, the transcription of GhGAUT genes were concentrated from 10 to 15 DPA; while the highest transcript levels in low fiber strength materials were detected between 5 and 10 DPA. These results lays the foundation for future research on gene function during cotton fiber development.


The GAUT gene family may affect cotton fiber development, including fiber elongation and fiber thickening. In the low strength fiber lines, GAUTs mainly participate in fiber elongation, whereas their major effect on cotton with high strength fiber is related to both elongation and thickening.


Cotton, the world’s most important economic fiber crop, is tightly connected to the ecomonic development and human livelihoods. Cotton fiber formation and development is an important factor in cotton breeding (Fan 2013). The cell wall structure of cotton fiber is composed of pectin and cellulose. The synthesis and decomposition of pectin are therefore important factors affecting cotton fiber formation (Basra et al. 1984).

Pectin, which plays a critical role in plant growth and development, is mainly found in the primary and middle layers of plant cell walls, and participates in the formation of plant tissue structure, biotic and abiotic stress processes. Pectin biosynthesis is estimated to require at least 67 transferases including glycosyl-, methyl-, and acetyltransferases (Mohnen et al. 2008). Pectin is also the most complex cell wall polysaccharide (Scheller et al. 2007). The sugar nucleotide reverse transporter transports the sugar nucleotide to the Golgi apparatus under the action of a specific glycosyltransferase. The glycosyl group is then cleaved and attached to the elongating polysaccharide chain to form pectin. The synthesis of pectin involves at least 53 different glycosyltransferases localized on the Golgi apparatus (Ridley et al. 2001).

Only two glycosyltransferase-coding genes are recognized on the basis of previous research in Arabidopsis: homogalacturonan (HG) synthase and glucuronyltransferase involved in rhamnogalacturonanII (RG II) synthesis (Willats et al. 2006). Compared with currently cataloged HG pectin multimerism in plants, much researches have been performed on galacturonyltransferase (HG: α-1,4-D-galacturonosyltransferase, HG:GalAT). The gene encoding HG:GalAT (EC is named galacturonosyltransferase (GAUT) (Mohnen et al. 2008;Willats et al. 2006).

Galacturonosyltransferases (GAUTs), which are partly responsible for pectin biosynthesis, are glycosyltransferase (Harholt et al. 2010). According to evolutionary analysis, they constitute glycosyltransferase family 8 (GT8s). GT8 consists of three separate protein classes, classes I and II contain mostly eukaryotic proteins, while almost the entire class III consists of bacterial proteins (Yin et al. 2010a, b). Plant cell-wall-related proteins, including GAUT and GAUT-like (GATL) proteins, are all located in class I (Sterling et al. 2006). The GAUT gene family was first identified by Blast analysis of the Arabidopsis genome, on the basis of structural similarity, 15 GAUT and GATL family members have been classified as follows, GAUT1-GAUT7 in GAUT-A; GAUT8-GAUT11 in GAUT-B; and GAUT12-GAUT15 in GAUT-C. All GATL family members are clustered together and constitute a clade closely related to GAUT15. Experiments have shown that GAUT1 is equivalent to HG:GalAT, GalAT and the GAUT1-related gene family provides the genetic and biochemical tools required to study the function of these genes in pectin synthesis (Sterling et al. 2006). GAUT1 also belongs to GT8, and the GAUT1-related superfamily also contains 10 GAUT-like genes (Cantarel et al. 2008). The gaut1 mutation can cause plant dwarfing, reduce cell adhesion and a 25% reduction in GalA content of leaves (Orfila et al. 2005). The GAUT1-GAUT7 core complex is held together by one or more covalent disulfide bonds and other noncovalent interaction. GAUT1 is dependent on GAUT7 and remains in the Golgi apparatus, where it is considered to be a component of the HG:GalAT complex that participates in pectin synthesis (Atmodjo et al. 2011). The gaut8 mutation can reduce the adhesion of epidermal cells of the young leaves and the marginal cells of the roots, thereby leading to plant dwarfing (Durand et al. 2009). The gaut11 mutation can reduce the thickness of the seed mucus (Caffall et al. 2009). In the gaut13-gaut14 double mutant, the distribution of pectin in the pollen tube wall is altered, which results in serious defects in pollen tube shape and growth (Wang et al. 2013).

The GAUT family is large, and more than 67 members have been found in tomato (Solanum lycopersicum). The GAUT family member with the highest expressed level in tomato is gaut4 (Godoy et al. 2013). In the gaut4 mutant, pectin structure is changed significantly, and other fruit traits, such as starch content, fruit yield, and single fruit quality, are also altered, consistent with an observed increase in firmness. In addition, harvest index is significantly decreased because of a reduction in fruit weight and number (Godoy et al. 2013). Hyodo et al. (2013) demonstrated that the gaut1 gene had higher transcription level during fruit development, especially in the fruit epidermis, while gaut1 showed higher expression level in the mesocarp, endocarp, septum, locular tissue, and nucleus after the fruit enters maturity.

To improve the quality of high fiber strength cotton materials, we have focused on cotton fiber development. Zhang et al. (2017) used recombinant inbred lines (RILs) to identify 16 stable quantitative trait loci (QTLs) related to fiber strength. In addition, Zou et al. (2019) identified 3 364 candidate genes related to cotton fiber strength, and analyzed the differences in fiber development (5 to 30 DPA) using two extremely different materials in a RIL population. A total of 363 differentially expressed genes (DEGs) comprising 4, 75, 39, 62, and 183 genes at 10, 15, 20, 25, and 30 DPA, respectively, were detected. Among them, 228 (62.6%) genes were upregulated in high-strength materials, while Gh_A07G1907 (homologous to GAUT6) was downregulated at 15 DPA but upregulated at 25 DPA.

As discussed aboved, the GAUT gene family has been studied in plants such as Arabidopsis and tomato etc., but no systematic investigation has been carried out in cotton. In the current study, we systematically and comprehensively analyzed the chromosomal location, structures, and phylogenetic relationships of the GAUT gene family in different Gossypium species. We also focused on the transcription of these genes during the fiber development.


Identification of cotton GAUT genes

Detailed phylogenetic analyses have divided GT8 proteins into two distantly related clades: 1) the GAUT1-related family, including the GAUT and GATL proteins, known as galacturonosyltransferase proteins, and 2) a group including plant glycogen protein-like starch starters (PGSIPs) and galactitol synthases (GolSs) (Yin et al. 2010a, b). According to the Pfam database and a bioinformatics analysis, all inferred proteins have a Glyco_transf_8-like domain (PF01501), which indicates that the corresponding genes belong to the GAUT gene family (Kikuchi et al. 2003). To identify the GAUT gene in Gossypium species, we identified 187 GAUT genes from eight species (Fig. S1), including 131 genes from the following species: G. hirsutum (41 genes), G. barbadense (42 genes), G. arboreum (25 genes) and G. raimondii (23 genes). The length of coding regions in the GAUT gene family ranged from 1 098 to 4 899 bp, and the encoded proteins comprised of 365 to 1 632 amino acids residues. The length of 32 GAUT genes was less than 3 000 bp; 88 genes had lengths of 3 000 to 7 000 bp, while the remaining 11 were longer than 7 000 bp (Table 1).

Table 1 Chromosomal locus ID and length of the encoded galacturonosyltransferase proteins in cotton

Phylogenetic analysis and classification of the GAUT gene family in cotton

Using published genome sequence data from eight species, we determined the phylogenetic relationships of GAUT gene family members among multiple species. In a previous study (Sterling et al. 2006), three types of protein sequences, GAUT-A, GAUT-B, and GAUT-C, were found by multiple sequence alignment of the 187 GAUT proteins of these eight species to Arabidopsis homologs (Fig. 1a). In the present study, we analyzed the GAUT genes of four cotton species, thereby classifying 62 proteins into GAUT-A, 38 into GAUT-B, and 31 into GAUT-C (Fig. 1b). In addition, GhA07G1907 was a differentially expressed gene in the QTL which was related to fiber strength. It was a homologous gene to AtGAUT6. And genes homologous to AtGAUT6 had the most number in Gossypium species (Zou et al. 2019). We found 16 AtGAUT13 homologs, 14 genes were homologous to each of AtGAUT7, AtGAUT9, and AtGAUT11, and 11 genes were homologous to each of AtGAUT2 and AtGAUT12. The remaining genes had fewer than 10 homologs. No gene homologous to AtGAUT14 were detected in any of the four cotton species, and no AtGAUT5 homolog was identified in G. hirsutum. Only one homolog of each of AtGAUT5 and AtGAUT10 were detected, namely, GrGAUT18 and GhGAUT03, respectively. Four homologs each of AtGAUT2, 4, 6, 7, 9, 12, and 13, three homologs each of AtGAUT8 and AtGAUT11, and two homologous genes to each of AtGAUT1, and AtGAUT3 were discovered in G. hirsutum.

Fig. 1

Phylogenetic tree of the galacturonosyltransferase (GAUT) gene family

a. Neighbor-joining phylogenetic tree of galacturonosyltransferase (GAUT) genes from eight species. b: Neighbor-joining phylogenetic tree of galacturonosyltransferase (GAUT) family form the four cotton species. They comparised three types of protein sequences by multiple sequence alignment of the 187 GAUT proteins for eight species. From the four cotton species, GAUT gene family were classified into GAUT-A (62 proteins), GAUT-B (38 proteins), and GAUT-C (31 proteins)

Analysis of conserved motif and GAUT protein structures

The following motif is conserved in 15 Arabidopsis GAUT protein and their orthologues in cotton: H-DN-A-SVV-S-V-H-T-F (H-x (2)-[ILV]-x-[ST]-D-N-[IV]-[IL]-A-[ASTV]-S-V-V-[AIV]-x-S-x-[AIV]-x (2)-[AS]-x (2)-[PS]-x (3)-V-[FL]-H-[ILV]-[ILV]-T-[DN]-x (2)-[NST]-x (2)-[AGP]-[IM]-x (3)-F) (Sterling et al., 2006). The GAUT gene family encode proteins with a molecular mass between 61 and 78 kDa (Sterling et al. 2006; Godoy et al. 2013). Consistent with topological predictions, most GAUT proteins can encode a type II membrane protein containing a putative transmemberance domain in its hypervariable N-terminal region. Among the GAUT proteins analyzed in our study, three GAUT proteins (GAUT 3, 4, and 5) which belongeding to GAUT-A contained an N-terminal signal peptide rather than a transmemberance domain. The only GAUT gene family members predicted to have no N-terminal transmemberance domain or signal peptide in cotton were GAUT2 proteins (Fig. 2b). We also found some GAUT1, GAUT3 and GAUT11 proteins with the above-mentioned characteristics. Among 14 GAUT7 proteins belonging to GAUT-A group in four Gossypium species (Table 1), the proteins encoded by 10 genes homologous to GAUT7 contained the conserved PLN02769 domain (Fig. 2b), which was assigned to the category of “Probable Galacturonosyltransferase” ( The remaining GAUT family members contained a conserved glycosyl transferase family 8 domain. The predicted motifs of each member are shown in Fig. 2C, and the specific structural information is given in Fig. S2.

Fig. 2

The stucture analysis of galacturonosyltransferase (GAUT) gene family. a: Phylogenetic tree of the galacturonosyltransferase (GAUT) gene family. b: Analysis of the genomic sequences of the GAUT gene family domains. c: Analysis of the genomic sequences of the GAUT gene family motifs

Analysis of collinearity and repeating elements in the GAUT gene

According to the results of MCScan analysis, there was no tandem repeat element present in the GAUT gene family in Gossypium. In G. hirsutum, GhGAUT01 and GhGAUT19, GhGAUT15 and GhGAUT34, GhGAUT16 and GhGAUT36, GhGAUT17 and GhGAUT38, and GhGAUT18 and GhGAUT39 were homologous to the AtGAUT3, AtGAUT11, AtGAUT12, and AtGAUT13, respectively, and were also segment repeats. Genes in diploid Gossypium species corresponding to the above repeated genes were shown in Fig. 3. According to their relative order in Gossypium, GAUT genes were categorized into five groups (1 to 5). Only group 4 belongs to GAUT-A, all other groups were members of GAUT-C.

Fig. 3

Collinearity of galacturonosyltransferase genes in G. arboretum, G. hirsutum and G. raimondii

Analysis of GAUTs transcription patterns

As determined by collinearity and repetitive element analyses of the above homologous genes in combination with transcriptome data from different fiber developmental stages of diploid Gossypium species, GaGAUT02 had the highest transcript level at 15 DPA but almost no transcription during other periods. Other members of group 1 (Fig. 3) were also barely transcribed in G. raimondii and tetraploid cotton, with fragments per kilobase of transcript per million fragments mapped (FPKM) values of less than 2.0. In group 2, GhGAUT15 and GhGAUT34 had the highest transcript level during the late stage (20 to 25 DPA) of fiber development (Fig. 4). Group 3 members GhGAUT16, GhGAUT36, GrGAUT14, and GaGAUT22 were not transcribed at all during the fiber developmental period. Among group 4 genes, GhGAUT17 and GhGAUT38 had their highest transcript level during fiber developmental from 5 to 10 DPA, and GrGAUT22 and GaGAUT24 had the peak transcript level at 0 and 15 DPA, respectively. All members of group 5 except for GaGAUT25 were transcribed at 15 DPA, and the remaining genes had almost no transcription during fiber development. In four cotton species, six genes, namely GaGAUT08, GaGAUT12, GaGAUT13, GrGAUT03, GrGAUT18, and GhGAUT25, had peak FPKM values greater than 40. The expression peak of these six genes all occurred before 15 DPA, which suggested that the GAUT gene family play a important role in early cotton fiber development.

Fig. 4

Heatmap  of the galacturonosyltransferase gene transcript levels based on transcriptome analysis. a: galacturonosyltransferase gene transcript levels during fiber development in G. arboreum (0 to 15 DPA). b: Galacturonosyltransferase gene transcript levels during fiber development in G. raimondii (0 to 15 DPA). c: Galacturonosyltransferase gene transcript levels during fiber development in G. arboreum (10 to 28 DPA). d: Galacturonosyltransferase gene transcript levels in the root, stem, leaf, and fiber tissues (5 to 30 DPA) from G. hirsutum

We selected a RIL population containing high-strength fiber and low-strength fiber lines for quantitative real time polymerase chain reaction (qRT-PCR) analysis (Wang et al. 2014). For this analysis, we selected GhGAUT08 (Gh_A07G1907) (Zou et al. 2019) and GhGAUT25 which belonged to GAUT-A, and GhGAUT10, GhGAUT11 and GhGAUT29 which belonged to GAUT-B. And all had FPKM values greater than 10 (Tables 2,3,4, and 5). The qRT-PCR analysis revealed that the GAUT genes had an important influence on fiber development before 15 DPA. From 5 to 30 DPA, the overall transcript levels of GhGAUT08 and GhGAUT10 were higher in high fiber-strength materials than those in low strength materials. At 5 DPA, the transcript levels of GhGAUT11 and GhGAUT29 were higher in low fiber-strength materials than those in high strength materials, with the opposite trend from 10 to 30 DPA. The transcript level of GhGAUT25 was higher in low fiber-strength materials than high strength materials from 5 to 10 DPA, with the reverse pattern after 15 DPA. In six high strength materials, peak GAUT expression was from 10 to 15 DPA (Fig. 5), whereas the period of highest expression in six low-strength materials was 5 to 10 DPA (Fig. 5).

Table 2 The FPKM value of GAUT genes in G. arboreum
Table 3 The FPKM value of GAUT genes in G. raimondii
Table 4 The FPKM value of GAUT genes in G. barbadense
Table 5 The FPKM value of GAUT genes in G. hirsutum
Fig. 5

Analysis of 5 GhGAUT transcript levels in 6 high-strength fiber materials and 6 low-strength fiber materials. (a:0–153, b:69262, c:69272, d:69328, e:69327, f:69307, g: sGK9708, h:69312, i:69404, j:69305, k:69412, l:69362)

Since the GAUT gene family affect the synthesis of pectin, we measured the pectin content of different materials. The results showed that the peak pectin content of high strength fiber materials appeared at 15 DPA, while the low strength materials appeared at 10 DPA (Fig. 6). This result was similar to the pattern of gene transcription.

Fig. 6

The pectin content of different samples. a: 5 high-strength fiber materials; b: 5 low-strength fiber materials

Analysis of the cis-elements in the promoter regions GAUT genes

To investigate the potential reasons for different expression patterns among GAUT genes, we analyzed the promoter region of the GAUT gene family in upland cotton. This analysis was performed because cis-elements (Fig. 7) can affect gene expression (Higo et al. 1999). The 32 cis-elements were detected in the promoter region of the 41 GAUT genes in upland cotton. The 32 cis-elements contained motifs that belonged to the regulatory sequences that responsive to anaerobicity, abiotic stress and hormone signaling. CGTCA and TGACG elements are associated with responsiveness to methyl jasmonate (Me-JA) (Basyuni et al. 2018), while the GARE-motif (TCTGTTG) and P-box (CCTTTTG) are related to gibberellin (GA) (Porto et al. 2014). The TCA-element (CCATCTTTTT) is a salicylic acid (SA) responsive element (Herrera-Vásquez et al. 2015). AuxRR-core (GGTCCAT) and the TGA-element (AACGAC) are related to the responsiveness to auxin (Herrera-Vásquez et al. 2015), and the ABRE-motif (ACGTG) is associated with the abscisic acid (ABA) response (Mishra et al. 2014). The ARE-motif (AAACCA) is related to the anaerobic environment, while the LTR-motif (CCGAAA) participates in response to low temperature (Chen et al. 2018). Finally, the TC-rich repeats (ATTCTCTAAC) is the response element associated with stress (Wei et al. 2009).

Fig. 7

Analysis of cis-elements in GhGAUT genes. 32 cis-elements in upland cotton genes were analyzed, namely, elements responsive to anaerobic conditions, different hormones (Me-JA, GA, SA, ABA and IAA), and abiotic stress conditions, including drought, low temperature and other stress conditions

Transcription analysis of prominent fiber-expressed genes under abiotic stress and phytohormone treatments

For a more in-depth study of GhGAUTs transcript levels induced by abiotic stress, the transcription patterns of five GhGAUT genes in cotton with NaCl, PEG, abscisic acid (ABA), naphthylacetic acid (NAA), salicylic acid (SA), and methyl jasmonate (MeJA) treatments were analyzed by qRT-PCR (Fig. 8). We examined the effects of various hormones on the transcription of the five GhGAUT genes. We observed that within 1 h after all treatments, the relative transcript levels of these five genes were rapidly increased and then decreased after 24 h. The peak transcription levels of the up-regulated gene came between 3 h and 12 h, except for GhGAUT11. GhGAUT11 did not respond to the treatment of three hormones ABA, SA, and MeJA. GhGAUT29 responded to all stresses and hormone treatments, and the detected transcript levels were higher. GhGAUT08 responded to ABA and SA treatments with the higher transcript levels. GhGAUT25 also responded to two stresses and four hormone treatments, but the transcript levels were the highest under the treatment of PEG and ABA, and the response peaks came at 6 h and 12 h after treatments, respectively. GhGAUT10 also responded to the treatments of ABA, SA, and MeJA, with the peak transcript levels at 6 h, 12 h, and 12 h; the response levels to NACl, PEG and NAA treatment was low at 3 h and 6 h, respectively. These results indicated that after different hormone treatments, different genes had different response times and response patterns, which were closely related to their hormone response elements and expression patterns.

Fig. 8

Analysis of GhGAUTs transcript levels with abiotic stress and phytohormone treatments according to a qRT-PCR assay. a: 200 μmol·L1 NaCl treatment; b: 20% PEG 6000 treatment; c: 100 μmol·L–1 NAA treatment; d: 100 μmol·L–1 ABA treatment; e: 100 μmol·L–1 SA treatment; f: 100 μmol·L–1 MeJA treatment


Phylogenetic analysis of the GAUT gene family

Researches on the GAUT gene family currently focused on Arabidopsis (Sterling et al. 2006; Cantarel et al. 2008), tomato (Godoy et al. 2013; Vasco et al. 2011), ramie (Chen et al. 2014), sweet cherry (Campoy et al. 2015), and the other species. Studies have also been carried out in cotton, but systematically evolutionary analysis has not yet been conducted. Among 131 GAUT members in cotton, 20 are homologous to AtGAUT6; no gene in cotton is homologous to AtGAUT14, and no AtGAUT5 homolog was present in G. hirsutum.

The GAUT gene family encodes galacturonosyltransferases involved in the synthesis of pectin, a compound related to cell wall formation. The conserved domain of the GAUT protein is H-DN-A-SVV-S-V-H-T-F. In tomato, GAUT1 protein had been predicted to be a membrane protein with a single N-terminal transmembrane helix and a major globular domain in the Golgi cavity, however, there was no report about the function of pectin synthesis and its role in fiber development. Analyses of the GAUT gene family have shown that all members share a conserved glycosyltransferase family 8 domain (Godoy et al. 2013; Vasco et al. 2011). In addition, GAUT proteins in the four species of cotton have a PLN02769 domain. 14 GAUT genes are homologous to GAUT7, two of which are present in G.raimondii, and 4 in other three species.

The possible role of the GAUT gene family in fiber development

The formation of bast fiber is an important quality trait in ramie. The development of ramie fiber affecting the rate of hemp formation and ultimately the economic value of ramie directly (Chen et al. 2014). Analysis of the cDNA sequence of ramie GalAT, a key pectin biosynthetic homolog of GAUT4, has showed that most GAUT4 transcript accumulates in roots, followed by leaves, phloem, and xylem (Liu et al. 2009). Similar to the situation in hemp, the development of cotton fiber cells is the main factor affecting cotton fiber quality (Haigler et al. 2012). GbGAUT1, a high galacturonic acid (HG) GAUT protein contains a conserved glycosyl transferase family 8 domain, falls into group GAUT-A in the phylogenetic tree and is preferentially expressed during fiber secondary cell wall thickening, especially at 35 DPA. These results indicate that the GbGAUT1 gene may play an important role in fiber development (Chi et al. 2009).

In our study, the peak expression of GhGAUT genes was concentrated before 15 DPA. In six high-strength-fiber materials, the highest expression was from 10 to 15 DPA, when the periods of fiber secondary wall thickening and fiber elongation overlap (Ji 2011). In the low-strength lines, the expression of GhGAUT genes was concentrated between 5 to 10 DPA, which corresponds to the fiber elongation period (Fan 2013).

Relationship between pectin substances and cotton fiber quality

As is well known, cotton fiber cells develop from a single cell, and their main components are cellulosic materials and pectin substances (Wang 2012). Pectin synthesis and decomposition affect cotton fiber strength (Fan 2013). Pectin methylesterase (PME), a common enzyme in plants that is related to cell wall structure, participates in pectin decomposition and catalyzes pectin deesterification to produce pectate and methanol (Fan 2013; Li et al. 2016). According to previous studies, PME levels increased during fiber development, with the lowest enzyme activity found in high-strength-fiber cultivars (Fan 2013; Li 2016). By comparing multiple transcriptome datasets, Guo et al. (2007) identified two genes related to pectin esterase, which is involved in the hydrolysis of pectin to gelatinic acid, and the expression of these genes were sharply upregulated starting at 12 DPA. In addition, two enzymes involved in pectin synthesis, UDP-glucose 6-dehydrogenase and UDP-D-glucuronic acid 4-epimerase, were down-regulated during secondary wall synthesis. The anthors found that the amount of pectin was decreased in cells at the late stage of cotton fiber development (Gou et al. 2007). Research based on immunohistochemistry has revealed that unesterified homogalacturonan is sparse in epidermal cells, which do not develop into fibers, whereas this compound is abundant in elongated cotton fiber cells (Zhao et al. 2012). The above-mentioned observations suggest that pectin synthesis affects early fiber development, and the hydrolysis of pectin is related to the formation of fibers during the later stage.


In this study, we characterized the GAUT gene family associated with pectin synthesis by analyzing their phylogenetic relationships, conserved motifs, gene structures, promoter sequences, and expression in cotton lines having different fiber strengths. Comprehensive expression and bioinformatics analysis indicated that the peak expression of GhGAUT genes was concentrated before 15 DPA. Gene expression in the six materials with high-strength fiber were concentrated between 10 and 15 DPA, corresponding to the beginning of the fiber secondary wall thickening period and part of the fiber elongation and thickening phase. In contrast, GAUT gene expression in six materials with low-strength fiber was at the highest level from 5 to 10 DPA, which was the fiber elongation period. The result lays the foundation for future researches associated with pectin synthesis during cotton fiber development.


Identification of cotton GAUT family members

To identify all homologous GAUT gene family in Arabidopsis (Sterling et al. 2006), we used a Hidden Markov Model integrated in HMMER 3.0 (Finn et al. 2011) to search for the glycosyl transferase family 83 (PF01501) in the following species: Carica papaya L. (Tang et al. 2008) (, Theobroma cacao (Argout et al. 2011), G. raimondii (Wang et al. 2012) (, Oryza sativa (Du et al. 2017) (, A. thaliana (Cao et al. 2011;) (, G. barbadense (Liu et al. 2015) (, G. hirsutum (Li et al. 2015; Zhang et al. 2015) (, and G. arboreum (Du et al. 2018) ( The sequences were verified using BLASTp with AtGAUT sequences as the query sequences and an e-value threshold of 1 × 10− 5. After multiple sequence alignment in ClustalW program (Thompson et al. 2003), incomplete sequences were manually deleted.

Phylogenetic tree construction, analysis of gene structure and localization

A neighbor-joining phylogenetic tree was constructed from the aligned sequences using MEGA v6.06 with 1 000 bootstrap repeats. To confirm GAUT gene structure in Gossypium species, information about the exons and introns of GAUT genes was retrieved from the GFF3 file, and the exon/intron structures were visualized using the Gene Structure Display Server 2.0 (Hu et al. 2015). Conserved domains were predicted using a conserved domain database ( (Marchler et al. 2010). Motifs were explored using the MEME program ( (Bailey et al. 2006), with the maximum number of motifs set to 20. Analysis and visualization of the chromosomal distribution of the GAUT genes were carried out using MapChart 2.2 (Voorrips 2002).

Promoter region and collinearity analysis of four cotton species

The promoter region of 41 GhGAUT genes were analyzed. Analysis of cis-acting elements was carried out using the PlantCARE database ( (Lescot et al. 2002), The 32 cis-elements were detected in the promoter region of the 41 GAUT genes in upland cotton. The 32 cis-elements contain motifs that have characteristic of regulatory sequences in response to anaerobicity, abiotic stress and hormone signaling. Repetitive elements in the GAUT gene family were identified by collinear analysis using the entire BLAST array (e-value = 1 × 10− 5) in the MCScan (Tang et al. 2008).

Plant materials

Plants were grown using standard field management practices in Anyang, China. One hundred and ninty-six RIL populations were constructed with 0-153 and 9708 as parents, and 5 lines with high fiber strength and 5 lines with low fiber strength were screened. 0-153 were a high-value parents in fiber characters and 9708 were a low-value parents. Plant materials consisted six high-fiber-strength lines and six low-fiber-strength lines from a RILs populations (Sun et al. 2012). The date of flowering was recorded as 0 days post-anthesis (0 DPA). Cotton bolls were sampled every 5 days from 0 to 30 DPA. After collection, fibers were separated from the cotton bolls with a sterile knife, immediately froze in liquid nitrogen, and stored at 80 °C, another part of fiber samples were dried at 45 °C for determination of pectin content. RNA was extracted from cotton fiber tissues, reverse-transcribed into cDNA. The cDNA was stored at –20 °C for subsequent qRT-PCR experiments with three biological replicates (Tuttle et al. 2015).

Transcriptome analyses and qRT-PCR

The transcriptome data were downloaded from the Sequence Read Archive (SRA) of the NCBI database ( (Paterson et al. 2012). The SRA data for G. raimondii (PRJNA79005; Wang et al. 2012), G. barbadense (PRJNA251673; Liu et al. 2015), G. hirsutum (PRJNA248163; Zhang et al. 2015), and G. arboreum (PRJNA179447; Du et al. 2018) were converted to FASTQ format data with the SRA Toolkit, and then analyzed and filtered with the FASTX-Toolkit ( TopHat2 (Kim et al. 2013) was used to map the clean data to the index genomes constructed by Bowtie2 (Langmead and Salzberg 2012) with the library-type and fr-unstranded parameters. The Cufflinks program was used to calculate the transcript levels of the genes in the reference genome (Trapnell et al. 2010). Gene transcript levels (Tables 2,3,4, and 5) were visualized using a normalization method based on log2 (FPKM + 1) in the pheatmap ( qRT-PCR assays of selected genes were performed using specific designed primers (Table 6) on a Roche 480 II PCR system. Gene transcript levels were calculated using the 2-ΔΔCt method, and the experimental design included three biological replicates and three technical replicates (Livak and Schmittgen 2001; Pfaffl 2001).

Table 6 Details of primers for quantitative real-time PCR

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its supplementary information files.









Days post-anthesis




Rhamnogalacturonan II





Glycosyltransferase 8


Recombinant inbred lines


Quantitative trait loci


Differentially expressed genes


Protein-like starch starters


Galactitol synthase


Fragments per kilobase of transcript per million fragments mapped


Quantitative real time polymerase chain reaction


Reverse-transcription PCR


quantitative real-time PCR


Sequence Read Archive


Abscisic acid


Naphthylacetic acid


Salicylic acid


Methyl jasmonate






Indole acetic acid


  1. Argout X, Salse J, Aury JM, et al. The genome of Theobroma cacao. Nat Genet. 2011;43(2):101–8.

  2. Atmodjo MA, Sakuragi Y, Zhu X, et al. Galacturonosyltransferase (GAUT) 1 and GAUT7 are the core of a plant cell wall pectin biosynthetic homogalacturonan: galacturonosyltransferase complex. Proc Natl Acad Sci. 2011;108(50):20225–30.

  3. Bailey TL, Williams N, Misleh C, et al. MEME: discovering and analyzing DNA and protein sequence motiefs. Nucleic Acid Res. 2006;34(suppl_2):369–73.

  4. Basyuni M, Wati R, Sulistiyono N, et al. Isolation and analysis of a multifunctional triterpene synthase KcMS promoter region from mangrove plant kandelia candel. IOP Conference Series: Earth and Environmental Science. Medan, Indonesia; 2018, 130(1): 012013. doi:

  5. Caffall KH, Pattathil S, Phillips SE, et al. Arabidopsis thaliana T-DNA mutants implicate GAUT genes in the biosynthesis of pectin and xylan in cell walls and seed testa. Mol Plant. 2009;2(5):1000–14.

  6. Campoy JA, Le Dantec L, Barreneche T, et al. New insights into fruit firmness and weight control in sweet cherry. Plant Mol Biol Report. 2015;33(4):783–96.

    CAS  Article  Google Scholar 

  7. Cantarel BL, Coutinho PM, Rancurel C, et al. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for glycogenomics. Nucleic Acid Res. 2008;37(suppl_1):233–8.

    CAS  Article  Google Scholar 

  8. Cao J, Schneeberger K, Ossowski S, et al. Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet. 2011;43(10):956–63.

  9. Chen J, Pei Z, Dai L, et al. Transcriptome profiling using pyrosequencing shows genes associated with bast fiber development in ramie (Boehmeria nivea L.). BMC Genomics. 2014;15(1):919.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Chen J, Piao Y, Liu Y, et al. Genome-wide identification and expression analysis of chitinase gene family in Brassica rapa reveals its role in clubroot resistance. Plant Sci. 2018;270:257–67.

  11. Chi JN, Han GY, Wang XF, et al. Isolation and molecular characterization of a novel homogalacturonan galacturonosyl-transferase gene (GbGAUT1) from Gossypium barbadense. Afr J Biotechnol. 2009;8(19):4755–64.

  12. Du HL, Yu Y, Ma YF, et al. Sequencing and de novo assembly of a near complete indica rice genome. Nat Commun. 2017;8(1):15324.

  13. Du XM, Huang G, He SP, et al. Resequencing of 243 diploid cotton accessions based on an updated a genome identifies the genetic basis of key agronomic traits. Nat Genet. 2018;50(6):796–802.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  14. Durand C, Vicré-Gibouin M, Follet-Gueye ML, et al. The organization pattern of root border-like cells of Arabidopsis is dependent on cell wall homogalacturonan. Plant Physiol. 2009;150(3):1411–21.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  15. Fan SM. The relationship between the cellulose synthesis and cotton fiber quality. Kaifeng, China: Henan University; 2013.

  16. Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39(suppl_2):29–37.

    CAS  Article  Google Scholar 

  17. Godoy F, Bermúdez L, Lira BS, et al. Galacturonosyltransferase 4 silencing alters pectin composition and carbon partitioning in tomato. J Exp Bot. 2013;64(8):2449–66.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  18. Gou JY, Wang LJ, Chen SP, et al. Gene expression and metabolite profiles of cotton fiber during cell elongation and secondary cell wall synthesis. Cell Res. 2007;17(5):422–3.

  19. Haigler CH, Betancur L, Stiff MR, et al. Cotton fiber: a powerful single-cell model for cell wall and cellulose research. Front Plant Sci. 2012;3:104.

  20. Harholt J, Suttangkakul A, Scheller HV. Biosynthesis of pectin. Plant Physiol. 2010;153(2):384–95.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  21. Herrera-Vásquez A, Salinas P, Holuigue L. Salicylic acid and reactive oxygen species interplay in the transcriptional control of defense genes expression. Front Plant Sci. 2015;6:171.

    Article  PubMed  PubMed Central  Google Scholar 

  22. Higo K, Ugawa Y, Iwamoto M, et al. Plant cis-acting regulatory DNA elements (PLACE) database. Nucleic Acids Res. 1999;27(1):297–300.

  23. Hu B, Jin J, Guo AY, et al. GSDS 2.0: an upgraded gene feature visualization server. Bioinformatics. 2015;31(8):1296–7.

  24. Hyodo H, Terao A, Furukawa J, et al. Tissue specific localization of pectin–Ca2+ cross-linkages and pectin methyl-esterification during fruit ripening in tomato (Solanum lycopersicum). PLoS One. 2013;8(11):e78949.

  25. Ji ZB. Gene expression profile analyses of upland cotton (Gossypium hirsutum) fiber during secondary cell wall deposition. Wuhan, China: Huazhong Agricultural University; 2011.

  26. Kikuchi N, Kwon YD, Gotoh M, et al. Comparison of glycosyltransferase families using the profile hidden Markov model. Biochem Biophys Res Commun. 2003;310(2):574–9.

    CAS  Article  PubMed  Google Scholar 

  27. Kim D, Pertea G, Trapnell C, et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36.

  28. Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nat Methods. 2012;9(4):357–60.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  29. Lescot M, Déhais P, Thijs G, et al. PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res. 2002;30(1):325–7.

  30. Li FG, Fan GY, Lu CR, et al. Genome sequence of cultivated upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. Nat Biotechnol. 2015;33(5):524–30.

  31. Li WJ. Phylogeny analysis of PMEs gene family and function of GhJAZ in cotton. Kaifeng, China: Henan University; 2016.

  32. Li WJ, Shang HH, Ge Q, et al. Genome-wide identification, phylogeny, and expression analysis of pectin methylesterases reveal their major role in cotton fiber development. BMC Genomics. 2016;17(1):1000.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  33. Liu JX, Yu CM, Tang SW, et al. Cloning and expression of key enzyme gene GalAT in ramie pectin biosynthesis. Agric Sci China. 2009;8(6):664–70.

  34. Liu X, Zhao B, Chen XY, et al. Gossypium barbadense genome sequence provides insight into the evolution of extra-long staple fiber and specialized metabolites. Sci Rep. 2015;5(1):14139.

  35. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2-ΔΔCT method. Methods. 2001;25(4):402–8.

  36. Marchler BA, Lu S, Anderson JB, et al. CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acid Res. 2010;39(suppl_1):225–9.

  37. Mishra S, Shukla A, Upadhyay S, et al. Identification, occurrence, and validation of DRE and ABRE cis-regulatory motifs in the promoter regions of genes of Arabidopsis thaliana. J Integr Plant Biol. 2014;56(4):388–99.

  38. Mohnen D. Pectin structure and biosynthesis. Curr Opin Plant Biol. 2008;11(3):266–77.

    CAS  Article  PubMed  Google Scholar 

  39. Orfila C, Sørensen SO, Harholt J, et al. QUASIMODO1 is expressed in vascular tissue of Arabidopsis thaliana inflorescence stems, and affects homogalacturonan and xylan biosynthesis. Planta. 2005;222(4):613–22.

  40. Paterson AH, Wendel JF, Gundlach H, et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature. 2012;492(7429):423–7.

  41. Pfaffl MW. A new mathematical model for relative quantification in real-time RT–PCR. Nucleic Acids Res. 2001;29(9):e45.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  42. Porto MS, Pinheiro MP, Batista VG, et al. Plant promoters: an approach of structure and function. Mol Biotechnol. 2014;56(1):38–49.

    CAS  Article  PubMed  Google Scholar 

  43. Ridley BL, O'Neill MA, Mohnen D. Pectins: structure, biosynthesis, and oligogalacturonide-related signaling. Phytochemistry. 2001;57(6):929–67.

    CAS  Article  PubMed  Google Scholar 

  44. Scheller HV, Jensen JK, Sørensen SO, et al. Biosynthesis of pectin. Physiol Plant. 2007;129(2):283–95.

    CAS  Article  Google Scholar 

  45. Sterling JD, Atmodjo MA, Inwood SE, et al. Functional identification of an Arabidopsis pectin biosynthetic homogalacturonan galacturonosyltransferase. Proceed Natl Acad Sci. 2006;103(13):5236–41.

  46. Sun FD, Zhang JH, Wang SF, et al. QTL mapping for fiber quality traits across multiple generations and environments in upland cotton. Mol Breed. 2012;30(1):569–82.

  47. Tang H, Bowers JE, Wang X, et al. Synteny and collinearity in plant genomes. Science. 2008;320(5875):486–8.

  48. Thompson JD, Gibson TJ, Higgins DG. Multiple sequence alignment using Clustal W and Clustal X. Curr Protoc Bioinformatics. 2003;1(1):2–3.

    Article  Google Scholar 

  49. Trapnell C, Williams BA, Pertea G, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28(5):511–5.

  50. Tuttle JR, Nah G, Duke MV, et al. Metabolomic and transcriptomic insights into how cotton fiber transitions to secondary wall synthesis, represses lignification, and prolongs elongation. BMC Genomics. 2015;16(1):477.

  51. Vasco M, De Stefano R, Di Matteo A, et al. Candidate genes controlling fruit quality in a tomato introgression line tolerant to water deficit. In: Proceedings of the Joint Meeting AGI-SIBV-SIGA. Assisi, Italy; 2011. p. 19–22.

  52. Voorrips RE. MapChart: software for the graphical presentation of linkage maps and QTLs. J Hered. 2002;93(1):77–8.

  53. Wang KB, Wang ZW, Li FG, et al. The draft genome of a diploid cotton Gossypium raimondii. Nat Genet. 2012;44(10):1098–103.

  54. Wang L, Wang W, Wang YQ, et al. Arabidopsis galacturonosyltransferase (GAUT) 13 and GAUT14 have redundant functions in pollen tube growth. Mol Plant. 2013;6(4):1131–48.

  55. Wang SG. Construction of forward subtractive library and analysis of differentially expressed genes during secondary cell wall thickening stage of upland cotton (Gossypium hirsutum) fiber. Wuhan, China: Huazhong Agricultural University; 2012.

  56. Wang YB, Li CH, Shang HH, et al. Study of cotton matured fibre quality and the super-molecular structure in upland cotton RILs. Fibres Textiles Eastern Europe. 2014;5(107):28–33.

  57. Wei LY, Zhang DF, Xiang F, et al. Differentially expressed miRNAs potentially involved in the regulation of defense mechanism to drought stress in maize seedlings. Int J Plant Sci. 2009;170(8):979–89.

  58. Willats WG, Knox JP, Mikkelsen JD. Pectin: new insights into an old polymer are starting to gel. Trends Food Sci Technol. 2006;17(3):97–104.

  59. Yin Y, Chen H, Hahn MG, et al. Evolution and function of the plant cell wall synthesis-related glycosyltransferase family 8. Plant Physiol. 2010a;153(4):1729–46.

  60. Yin Y, Mohnen D, Xu Y, et al. Glycosyltransferases of the GT8 family. Ann Plant Rev Plant Polysaccharides Biosynthesis Bioeng. 2010b;41:167–211.

    CAS  Article  Google Scholar 

  61. Zhang TZ, Hu Y, Jiang W, et al. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat Biotechnol. 2015;33(5):531–7.

  62. Zhang Z, Ge Q, Liu AY, et al. Construction of a high-density genetic map and its application to QTL identification for fiber strength in upland cotton. Crop Sci. 2017;57(2):774–88.

  63. Zhao L, Ding Q, Zeng J, et al. Immunohistochemical study of cell wall pectin during cotton fiber development. J Chin Electron Microsc Soc. 2012;31(2):179–83 (in Chinese).

  64. Zou XY, Liu AY, Zhang Z, et al. Co-expression network analysis and hub gene selection for high-quality fiber in upland cotton (Gossypium hirsutum) using RNA sequencing analysis. Genes. 2019;10(2):119.

Download references


Not applicable.


This work was supported by the Major Research Plan of National Natural Science Foundation of China (NO.31690093); the National Agricultural Science and Technology Innovation project for CAAS (CAAS-ASTIP-2016-ICR), and the Central Level of the Scientific Research Institutes for Basic R & D Special Fund Business (Y2017PT51). The funder has no role in the design of the study and collection, analysis, or interpretation of data and writing of the manuscript.

Author information




Yuan YL and Shang HH conceived and managed the project. Fan SM coordinated the overall project and wrote the manuscript. Liu AY prepared plant materials and extracted RNA, and Zou XY completed the qRT-PCR assay. Zhang Z contributed to the multiple sequence alignments and the phylogenetic analysis. Ge Q helped analyze the data. Gong WK analyzed domains and predicted the promoters. Li JW and Gong JW wrote scripts. Shi YZ contributed to the chromosomal localization and gene structural analysis. Deng XY and Jia TT discussed the results and commented on the manuscript. All authors read, edited, and approved the final version of the manuscript.

Corresponding authors

Correspondence to YUAN Youlu or SHANG Haihong.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflict of interest.

Supplementary Information

Additional file 1 Fig. S1

: The protein sequences of 187 GAUT genes from eight species: Gossypium hirsutum, Gossypium arboreum, Gossypium barbadense, Gossypium raimondii, Arabidopsis thaliana, Oryza sativa, Carica papaya, and Theobroma cacao.

Additional file 2 Fig. S2:

Logos of 20 motifs for four cotton species according to the MEME suite.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

FAN, S., LIU, A., ZOU, X. et al. Evolution of pectin synthesis relevant galacturonosyltransferase gene family and its expression during cotton fiber development. J Cotton Res 4, 22 (2021).

Download citation


  • Cotton fiber development
  • Pectin
  • Galacturonosyltransferases
  • Evolution
  • Transcription patterns