Evolution of the pectin-synthesis-relevant galacturonosyltransferase gene family and its expression during cotton fiber development

Background: Pectin is a key substance involved in cell wall development, and the galacturonosyltransferase (GAUT) gene family is a critical participant in the pectin synthesis pathway. Systematic and comprehensive research on GAUTs has not been performed in cotton. Analysis of the evolution and expression patterns of the GAUT gene family in different cotton species is needed to increase knowledge of the function of pectin in cotton fiber development.
Results: In this study, we identified 131 GAUT genes in the genomes of four Gossypium species (G. raimondii, G. barbadense, G. hirsutum, and G. arboreum) and classified them as GAUT-A, GAUT-B, and GAUT-C. Among them, 15 GAUT genes encoded proteins (GAUT1 to GAUT15). All GAUT family members except GAUT7 contained a conserved Glyco_transf_8 domain (H-DN-A-SVV-S-V-H-T-F). The conserved sequence of GAUT7 was a PLN02769 domain, categorized as a probable galacturonosyltransferase. According to cis-element analysis, GAUT gene expression may be regulated by hormones such as JA, GA, SA, ABA, MeJA, and IAA. The evolution and expression patterns of the GAUT gene family in different cotton species and the expression levels in upland cotton materials having different fiber strengths were analyzed. Peak expression of GhGAUT genes was observed before 15 DPA; in the six materials with high fiber strength, expression was concentrated from 10 to 15 DPA, while the highest expression in low-fiber-strength materials was detected between 5 and 10 DPA. The results presented in this paper lay the foundation for future research on gene function during cotton fiber development.
Conclusions: The GAUT gene family may affect cotton fiber development, including fiber elongation and fiber thickening. In the low-strength-fiber lines, GAUTs mainly participate in fiber elongation, whereas their major effect on cotton with high-strength fiber is related to both elongation

and attached to the elongating polysaccharide chain to form pectin. The synthesis of pectin involves at least 53 different glycosyltransferases localized on the Golgi apparatus (Ridley et al. 2001).
Galacturonosyltransferases (GAUTs), which are partly responsible for pectin biosynthesis, are glycosyltransferases (Harholt et al. 2010). According to evolutionary analysis, they belong to glycosyltransferase family 8 (GT8). GT8 consists of three separate protein classes: classes I and II contain mostly eukaryotic proteins, while class III consists almost entirely of bacterial proteins (Yin et al. 2010). Plant cell-wall-related proteins, including GAUT and GAUT-like (GATL) proteins, are all located in class I (Sterling et al. 2006). The GAUT gene family was first identified by BLAST analysis of the Arabidopsis genome. On the basis of structural similarity, the 15 GAUT family members have been classified as follows: GAUT1-GAUT7 in GAUT-A, GAUT8-GAUT11 in GAUT-B, and GAUT12-GAUT15 in GAUT-C. All GATL family members cluster together and constitute a clade closely related to GAUT15. Experiments have shown that GAUT1 functions as a homogalacturonan (HG) galacturonosyltransferase (HG:GalAT) and provides the necessary molecular machinery for HG and pectin synthesis (Sterling et al. 2006). GAUT1 also belongs to GT family 8, and the GAUT1-related superfamily also contains the 10 GAUT-like genes (Cantarel et al. 2008). The gaut1 mutation can cause plant dwarfing, reduced cell adhesion, and a 25% reduction in the GalA content of leaves (Orfila et al. 2005). The GAUT1-GAUT7 core complex is held together by one or more covalent disulfide bonds and other noncovalent interactions. GAUT1 depends on GAUT7 to remain in the Golgi apparatus, where it is considered to be a component of the HG:GalAT complex that participates in pectin synthesis (Atmodjo et al. 2011). The gaut8 mutation can reduce the adhesion of epidermal cells of young leaves and of marginal cells of the roots, thereby leading to plant dwarfing (Durand et al. 2009). The gaut11 mutation can reduce the thickness of the seed mucilage (Caffall et al. 2009).
In the gaut13 gaut14 double mutant, the distribution of pectin in the pollen tube wall is altered, which results in serious defects in pollen tube shape and growth (Wang et al. 2013).
The GAUT family is large, and more than 67 members have been found in tomato (Solanum lycopersicum). The GAUT family member with the highest expression level in tomato is GAUT4 (Godoy et al. 2013). In the gaut4 mutant, pectin structure is changed significantly, and other fruit traits, such as starch content, fruit yield, and single fruit quality, are also altered, consistent with an observed increase in firmness. In addition, harvest index is significantly decreased because of a reduction in fruit weight and
To improve the quality of high-fiber cotton materials, our research team has been focused on cotton fiber development. As discussed above, the GAUT gene family has been studied in plants such as Arabidopsis and tomato, but no systematic investigation has been carried out in cotton. In the current study, we systematically and comprehensively analyzed the chromosomal locations, structures, and phylogenetic relationships of the GAUT gene family in different Gossypium species. We also focused on the expression of these genes during fiber development.

Identification of cotton GAUT genes
Detailed phylogenetic analyses have divided GT8 proteins into two distantly related clades: 1) the GAUT1 (galacturonosyltransferase 1)-related family, including the GAUT and GATL proteins, known as galacturonosyltransferase proteins; and 2) a group including plant glycogenin-like starch initiation proteins (PGSIPs) and galactinol synthases (GolSs) (Yin et al. 2018). According to the Pfam database and a bioinformatics analysis, all inferred proteins have a Glyco_transf_8-like domain (PF01501), which indicates that the corresponding genes belong to the GAUT gene family (Kikuchi et al. 2003). We identified 187 GAUT genes from eight Gossypium species (Fig S1), including 131 genes from the following four cotton species: G. hirsutum (41 genes), G. barbadense (42 genes), G. arboreum (25 genes), and G. raimondii (23 genes) (Additional file 3). The coding regions of the GAUT gene family ranged from 1,098 to 4,899 bp in length, and the encoded proteins comprised 365 to 1,632 amino acids. The length of 32 GAUT genes was less than 3,000 bp; 88 had lengths of 3,000 to 7,000 bp, while the remaining 11 were longer than 7,000 bp (Fig S2: Table S1).

Phylogenetic analysis and classification of the cotton GAUT gene family
Using published genome sequencing data for eight species, we determined the phylogenetic relationships of GAUT gene family members from multiple species. Following the classification of a previous study (Sterling et al. 2006), multiple sequence alignment of the 187 GAUT genes of these eight species with their Arabidopsis homologs resolved three types of gene sequences: GAUT-A, GAUT-B, and GAUT-C (Fig 1a). In the present study, we analyzed the GAUT genes of four cotton species, classifying 62 genes into GAUT-A, 38 into GAUT-B, and 31 into GAUT-C (Fig 1b). GAUT6 had the largest number of homologs (20) in Gossypium species, and Gh_A07G1907, identified in a previous transcriptome analysis (Zou et al. 2019), was assigned to GAUT6. We found 16 GAUT13 homologs, 14 homologs each of GAUT7, GAUT9, and GAUT11, and 11 homologs each of GAUT2 and GAUT12. The remaining genes had fewer than 10 homologs. No genes homologous to GAUT14 were detected in any of the four cotton species, and no GAUT5 homologs were identified in G. hirsutum. Only one homolog each of GAUT5 and GAUT10 was detected, namely, GrGAUT18 and GhGAUT03, respectively. Four homologs each of GAUT2, GAUT4, GAUT6, GAUT7, GAUT9, GAUT12, and GAUT13, three homologs each of GAUT8 and GAUT11, and two homologs each of the GAUT-A genes GAUT1 and GAUT3 were discovered in G. hirsutum.
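The gene counts reported above can be cross-checked with simple arithmetic. The following Python sketch is only a reading aid, not part of the authors' pipeline; the dictionary and function names are hypothetical. It tallies the per-species, per-group, and per-length-class counts quoted in the text and confirms that each breakdown sums to the same 131 genes.

```python
# Counts copied from the text; variable names are illustrative only.
per_species = {
    "G. hirsutum": 41,
    "G. barbadense": 42,
    "G. arboreum": 25,
    "G. raimondii": 23,
}
per_group = {"GAUT-A": 62, "GAUT-B": 38, "GAUT-C": 31}
per_length_class = {"<3,000 bp": 32, "3,000-7,000 bp": 88, ">7,000 bp": 11}


def totals_agree(*tallies):
    """Return True when every tally sums to the same total."""
    totals = {sum(tally.values()) for tally in tallies}
    return len(totals) == 1


total_genes = sum(per_species.values())
print(total_genes, totals_agree(per_species, per_group, per_length_class))
```

The same check is a convenient guard when counts are later updated for additional species or revised genome annotations.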

Analysis of conserved motifs and GAUT gene structures
The following motif is conserved in 15 Arabidopsis GAUT genes and their orthologues in cotton: H-DN-A-
Consistent with topological predictions, most GAUT genes encode a type II membrane protein containing a putative transmembrane domain in its hypervariable N-terminal region (Reithmeier et al. 1992). Among the GAUT proteins analyzed in our study, three GAUT proteins (GAUT3, GAUT4, and GAUT5) belonging to GAUT-A contained an N-terminal signal peptide rather than a transmembrane domain. The only GAUT gene family members predicted to have no N-terminal transmembrane domain or signal peptide in cotton were GAUT2 genes (Fig 2b). We also found some GAUT1, GAUT3, and GAUT11 genes with the above characteristics. Among 14 GAUT7 genes belonging to group GAUT-A in eight Gossypium species (Fig S2: Table S1), 10 contained the same conserved PLN02769 domain (Fig 2b), which was assigned to the category of "Probable Galacturonosyltransferase" (https://www.ncbi.nlm.nih.gov/proteinclusters/?term=PLN02769). The remaining GAUT family members contained a conserved Glyco_transf_8 domain. The predicted motifs of each member are shown in Fig 2c, and specific structural information is given in Fig S3.

Analysis of collinearity and repeating elements in the GAUT gene
According to the results of MCScan analysis, no tandem repeat elements were present in the GAUT gene family in Gossypium. In G. hirsutum, GhGAUT01 and GhGAUT19, GhGAUT15 and GhGAUT34, GhGAUT16 and GhGAUT36, GhGAUT17 and GhGAUT38, and GhGAUT18 and GhGAUT39 were homologous to GAUT3, GAUT11, GAUT12, and GAUT13, respectively, and were also segmental repeats. Genes in diploid Gossypium species corresponding to the above repeated genes are shown in Fig 3. According to their relative order in Gossypium, GAUT genes were categorized into five groups (1 to 5).
With the exception of group 4, which belonged to GAUT-A, all groups were members of GAUT-C.

Analysis of GAUT expression patterns
As determined by collinearity and repetitive element analyses of the above homologous genes in combination with transcriptome data from different fiber developmental stages of diploid species, GaGAUT02 had its highest expression at 15 DPA but almost no expression during other periods. The other members of group 1 (Fig 3) were also barely expressed in G. raimondii and tetraploid cotton, with fragments per kilobase of transcript per million fragments mapped (FPKM) values of less than 2.0. In group 2, GhGAUT15 and GhGAUT34 had their highest expression during the late stage (20 to 25 DPA) of fiber development (Fig 4). Group 3 members GhGAUT16, GhGAUT36, GrGAUT14, and GaGAUT22 were not expressed at all during the fiber developmental period. Among group 4 genes, GhGAUT17 and GhGAUT38 had their highest expression during fiber development from 5 to 10 DPA, and GrGAUT22 and GaGAUT24 had expression peaks at 0 and 15 DPA, respectively. All members of group 5 except for GaGAUT25 were expressed at 15 DPA, and the remaining genes had almost no expression during fiber development. In the four cotton species, six genes had peak FPKM values greater than 40, namely GaGAUT08, GaGAUT12, GaGAUT13, GrGAUT03, GrGAUT18, and GhGAUT25. The expression peaks of these six genes all occurred before 15 DPA, which suggests that the GAUT gene family plays a role in early cotton fiber development.
We selected a RIL population containing high-strength-fiber and low-strength-fiber lines for quantitative real-time polymerase chain reaction (qRT-PCR) analysis (Wang et al. 2014). For this analysis, we selected GhGAUT08 (Gh_A07G1907) (Zou et al. 2019) and GhGAUT25, both belonging to GAUT-A, as well as GhGAUT10, GhGAUT11, and GhGAUT29, belonging to GAUT-B and having FPKM values greater than 10 (Fig S4: Table S2). The qRT-PCR analysis revealed that the GAUT gene family has an important influence on fiber development before 15 DPA. From 5 to 30 DPA, the overall expression of GhGAUT08 and GhGAUT10 was higher in high-strength materials than in low-strength materials. At 5 DPA, the expression of GhGAUT11 and GhGAUT29 was higher in low-strength materials than in high-strength materials, with the opposite true from 10 to 30 DPA. GhGAUT25 expression was higher in low-strength materials than in high-strength materials from 5 to 10 DPA, with the reverse pattern observed after 15 DPA. In the six high-strength materials, peak GAUT expression occurred from 10 to 15 DPA (Fig 5), whereas the period of highest expression in the six low-strength materials was 5 to 10 DPA (Fig 5).
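As a reading aid for the expression values compared above, the two transformations named in the Methods can be sketched in a few lines of Python: log2(FPKM + 1) for the transcriptome heatmaps and 2^-ΔCt for the qRT-PCR data. This is a minimal sketch under stated assumptions, not the authors' scripts: the function names are hypothetical, ΔCt is assumed to be Ct(target) minus Ct(reference gene), and the reference gene and all Ct values are made up for illustration.

```python
import math


def log2_fpkm(fpkm):
    """Heatmap scale: log2(FPKM + 1), so that FPKM = 0 maps to 0."""
    return math.log2(fpkm + 1.0)


def relative_expression(ct_target, ct_reference):
    """2^-dCt, assuming dCt = Ct(target) - Ct(reference gene)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


def mean_relative_expression(ct_pairs):
    """Average 2^-dCt over replicates given as (ct_target, ct_reference)."""
    values = [relative_expression(t, r) for t, r in ct_pairs]
    return sum(values) / len(values)


# Hypothetical Ct values for one gene at one time point (three replicates).
replicates = [(22.0, 20.0), (21.0, 20.0), (22.0, 20.0)]
print(log2_fpkm(40.0), mean_relative_expression(replicates))
```

A lower Ct for the target relative to the reference gives a larger 2^-ΔCt, so an expression peak at 10 to 15 DPA corresponds to the smallest ΔCt values in that window.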
Because the GAUT gene family affects the synthesis of pectin, we measured the pectin content of the different materials. The results showed that the peak pectin content of high-strength-fiber materials appeared at 15 DPA, while that of the low-strength materials appeared at 10 DPA (Fig 6). This result was consistent with the gene expression patterns described above.

Prediction of cis-elements in GAUT gene promoter regions
To investigate the potential reasons for different expression patterns among GAUT genes, we analyzed the promoter elements of the GAUT gene family in upland cotton. This analysis was performed because cis-elements (Fig 7) can affect gene expression regulation (Higo et al. 1999). We analyzed 32 cis-elements in upland cotton genes, namely, elements responsive to anaerobic conditions, different  (Herrera et al. 2015), and the ABRE-motif (ACGTG) is associated with the abscisic acid (ABA) response (Mishra et al. 2014). The ARE-motif (AAACCA) is related to the anaerobic environment, while the LTR-motif (CCGAAA) participates in the response to low temperature (Chen et al. 2018). Finally, TC-rich repeats (ATTCTCTAAC) are response elements associated with stress (Wei et al. 2009). In future work, we plan to further verify the regulation of the GAUT gene family in upland cotton by the above hormones.
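To make the motif-based prediction concrete, the sketch below scans a promoter sequence for the four core motifs quoted above (ABRE, ARE, LTR, and TC-rich repeats) on both strands. It is an illustration only: the analysis in this study used the PlantCARE database, whereas this Python code performs plain exact-string matching, and the function names and the toy promoter sequence are hypothetical.

```python
# Core motif sequences as quoted in the text.
MOTIFS = {
    "ABRE": "ACGTG",
    "ARE": "AAACCA",
    "LTR": "CCGAAA",
    "TC-rich repeats": "ATTCTCTAAC",
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(seq, motif):
    """0-based start positions of every match, overlapping ones included."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def scan_promoter(promoter):
    """Map each motif name to its hits on the '+' and '-' strands.

    Minus-strand hits are reported as forward-strand coordinates of the
    reverse-complemented motif.
    """
    promoter = promoter.upper()
    return {
        name: {
            "+": find_all(promoter, motif),
            "-": find_all(promoter, reverse_complement(motif)),
        }
        for name, motif in MOTIFS.items()
    }


# Toy sequence (not a real cotton promoter).
toy_hits = scan_promoter("TTACGTGTTTGGTTTCCGAAA")
print(toy_hits)
```

Exact matching is stricter than a curated database search, which may also report degenerate or longer variants of an element, so hit counts from the two approaches are not expected to be identical.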

Expression analysis of prominent fiber-expressed genes under abiotic stress and phytohormone treatments
For a more in-depth study of GhGAUT expression levels induced by abiotic stress, the expression patterns of five GhGAUT genes after NaCl, PEG, abscisic acid (ABA), naphthylacetic acid (NAA), salicylic acid (SA), and methyl jasmonate (MeJA) treatment were analyzed by qRT-PCR (Fig 8). We examined the effects of the various hormones on the expression of the five GhGAUT genes. The relative expression levels of these five genes increased rapidly within 1 hour after all treatments and decreased after 24 hours. The peaks of the up-regulation response were between 3 hours and 12 hours, except for GhGAUT11, which did not respond to treatment with the three hormones ABA, SA, and MeJA. GhGAUT29 responded to all stress and hormone treatments, and its detected expression level was higher; GhGAUT08 responded to ABA and SA treatment with higher expression levels. GhGAUT25 also responded to the two stresses and four hormone treatments, but its expression levels were highest under PEG and ABA treatment, with response peaks at 6 h and 12 h, respectively. GhGAUT10 also responded to ABA, SA, and MeJA treatment, with peak expression levels at 6 h, 12 h, and 12 h, respectively; its response to NaCl, PEG, and NAA treatment was low, with slight responses at 3 h and 6 h, respectively. These results indicated that after different hormone treatments, different genes had different response times and response patterns, which were closely related to their hormone response elements and expression patterns.
The formation of bast fiber is an important determinant of ramie fiber quality, with the development of ramie fiber affecting the rate of hemp formation and ultimately the value of ramie directly (Chen et al. 2014). Analysis of the cDNA sequence of ramie GalAT, a key pectin biosynthetic gene homologous to GAUT4, has shown that most GAUT4 accumulates in roots, followed by leaves, phloem, and xylem (Liu et al. 2009).
Similar to the situation in hemp, the development of cotton fiber cells is the main factor affecting cotton fiber quality (Haigler et al. 2012). GbGAUT1, a homogalacturonan (HG) GAUT protein containing a conserved GAUT gene family domain, falls into group GAUT-A in the phylogenetic tree and is preferentially expressed during fiber secondary cell wall thickening, especially at 35 DPA. These results indicate that the GbGAUT1 gene may play an important role in fiber development (Chi et al. 2009).

Discussion
In our study, the peak expression of GhGAUT genes was concentrated before 15 DPA. In the six high-strength-fiber materials, the highest expression was from 10 to 15 DPA, when the periods of fiber secondary wall thickening and fiber elongation overlap (Ji 2011). In the low-strength lines, the expression of GhGAUT genes was concentrated between 5 and 10 DPA, which corresponds to the fiber elongation period (Fan 2013).

Relationship between pectin substances and cotton fiber quality
It is well known that cotton fiber cells develop from a single cell and that their main components are cellulosic materials and pectin substances (Wang 2012). Pectin synthesis and decomposition affect cotton fiber strength (Fan 2013). Pectin methylesterase (PME), a common enzyme in plants that is related to cell wall structure, participates in pectin decomposition and catalyzes pectin de-esterification to produce pectate and methanol (Fan 2013). According to previous studies, PME levels increase during fiber development, with the lowest enzyme activity found in high-strength-fiber strains (Fan 2013; Li 2016). By comparing multiple transcriptome datasets, Gou et al. identified two genes related to pectin esterase, which is involved in the hydrolysis of pectin into pectic acid, and the expression of these genes was sharply upregulated starting at 12 DPA (Gou et al. 2007). In addition, two enzymes involved in pectin synthesis, UDP-glucose 6-dehydrogenase and UDP-D-glucuronic acid 4-epimerase, were down-regulated during secondary wall synthesis. The authors found that the amount and molecular weight of pectin decreased in cells at the late stage of cotton fiber development (Gou et al. 2007). Research based on immunohistochemistry has revealed that unesterified homogalacturonan is sparse in epidermal cells that do not develop into fibers, whereas this compound is abundant in elongated cotton fiber cells (Zhao et al. 2012). The above observations suggest that pectin synthesis affects early fiber development and that the hydrolysis of pectin is related to the formation of fibers during the later stage.

Conclusions
In this study, we characterized the GAUT galacturonosyltransferase gene family associated with pectin synthesis by analyzing its phylogenetic relationships, conserved motifs, gene structures, promoter sequences, and expression in cotton lines having different fiber strengths. Comprehensive expression and bioinformatics analyses indicated that the peak expression of GhGAUT genes was concentrated before 15 DPA. Gene expression in the six materials with high-strength fiber was concentrated between 10 and 15 DPA, the beginning of the fiber secondary wall thickening period and also part of the fiber elongation and thickening phase. In contrast, GAUT gene expression in the six materials with low-strength fiber was highest from 5 to 10 DPA, which was the fiber elongation period. These results should lay the foundation for future research associated with pectin synthesis during cotton fiber development.

Identification of cotton GAUT family members
To identify all homologous GAUT gene family in Arabidopsis (Sterling et al. 2006

Phylogenetic tree construction, analysis of gene structure and localization
A neighbor-joining phylogenetic tree was constructed from the aligned sequences using MEGA v6.06 with 1,000 bootstrap replicates. To confirm GAUT gene structure in Gossypium species, information about GAUT gene exons and introns was retrieved from the GFF3 file, and the exon/intron structure was visualized using the Gene Structure Display Server 2.0 (Hu et al. 2014

Promoter region and collinearity analysis of four cotton species
A 2,000-bp sequence upstream of the start codon in the upland cotton genomic sequence was extracted. Analysis of cis-acting elements was carried out using the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html; Lescot et al. 2002). Repetitive elements in the GAUT gene family were identified by collinearity analysis using the complete BLAST results (e-value = 1e-5) in MCScan (Tang et al. 2008).
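The promoter extraction step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' script: the function name and the toy chromosome are hypothetical, coding-sequence coordinates are assumed to be 1-based and inclusive as in GFF3, and genes on the minus strand are handled by taking the flanking region on the forward strand and reverse-complementing it.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, cds_start, cds_end, strand, length=2000):
    """Return up to `length` bases upstream of the start codon.

    cds_start and cds_end are 1-based inclusive forward-strand coordinates
    of the coding sequence (GFF3 convention). The region is truncated at
    the chromosome ends and returned in the gene's own orientation.
    """
    if strand == "+":
        end = cds_start - 1              # slice stops just before the start codon
        begin = max(0, end - length)
        return chrom_seq[begin:end]
    if strand == "-":
        begin = cds_end                  # first base after the CDS on the forward strand
        end = min(len(chrom_seq), begin + length)
        return reverse_complement(chrom_seq[begin:end])
    raise ValueError("strand must be '+' or '-'")


# Toy chromosome with a coding sequence at positions 9-14 (ATGAAA).
toy_chrom = "GGGGCCCCATGAAATTGC"
print(upstream_region(toy_chrom, 9, 14, "+", length=4))
print(upstream_region(toy_chrom, 9, 14, "-", length=4))
```

The returned sequence is what would then be submitted to a cis-element search; with the default `length=2000` it corresponds to the 2,000-bp window used in this study.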

Plant materials
Plants were grown using standard field management practices in Anyang, China. Plant materials consisted of six high-fiber-strength lines and six low-fiber-strength lines from a RIL population (Sun et al. 2012). The date of flowering was recorded as 0 days post-anthesis (0 DPA). Plants were sampled every 5 days from 0 to 30 DPA. After collection, fibers were separated from the cotton bolls with a sterile knife, immediately frozen in liquid nitrogen, and stored at -80 °C. Fiber samples were dried at 45 °C for determination of pectin content. RNA was extracted from cotton fiber tissue and reverse-transcribed into cDNA by reverse-transcription PCR (RT-PCR). The resulting cDNA was stored at -20 °C for subsequent use in qRT-PCR experiments with three biological replicates (Tuttle et al. 2015).

Transcriptome analyses and qRT-PCR
The transcriptome data were downloaded from the Sequence Read Archive (SRA) of the NCBI database fr-unstranded parameters. The Cufflinks program was used to calculate the expression levels of the genes in the reference genome (Trapnell et al. 2010). Visualized gene expression levels (Fig S4: Table S2) were output using a homogenization method based on log2(FPKM + 1) in the pheatmap package (https://CRAN.R-project.org/package=pheatmap). qRT-PCR assays of selected genes were performed using specifically designed primers (Fig S5: Table S3) on a Roche 480 II PCR system. Gene expression levels were calculated using the 2^-ΔCt method, and the experimental design included three biological replicates and three technical replicates (Livak et al. 2001; Pfaffl et al. 2001

Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files.

Competing interests
The authors declare no conflict of interest.
Business (Y2017PT51). The funder has no role in the design of the study and collection, analysis, or interpretation of data and writing of the manuscript.

Authors' contributions
Yuan YL and Shang HH conceived and managed the project. Fan SM coordinated the overall project and wrote the manuscript. Liu AY prepared plant materials and extracted RNA, and Zou XY completed the qRT-PCR assay. Zhang Z contributed to the multiple sequence alignments and the phylogenetic analysis. Ge Q helped analyze the data. Gong WK analyzed domains and predicted the promoters. Li JW and Gong JW wrote scripts. Shi YZ contributed to the chromosomal localization and gene structural analysis. Deng XY and Jia TT discussed the results and commented on the manuscript. All authors read, edited, and approved the current version of the manuscript.