Global identification of genes associated with xylan biosynthesis in cotton fiber CURRENT STATUS:

Background: Mature cotton fiber secondary wall comprises largely of cellulose (>90%) and small amounts of xylan and lignin. Little is known about the cotton fiber xylan biosynthesis by far. Results: To comprehensively survey biosynthetic enzymes involved in xylan biosynthesis in cotton fiber, the combination of the phylogenetic analysis with expression profile analysis and co-expression analyses allowed us to identify five IRX9, five IRX10, one IRX14, six IRX15, two FRA8, one PARVUS, eight GUX, four GXM, two RWA, two AXY9, 13 TBL genes. In addition, we also identified two GT61 members, two GT47 members, and two DUF579 family members whose homologs in Arabidopsis were not functionally characterized. These 55 genes were regarded as the most probable genes to be involved in fiber xylan biosynthesis. Further experimental validation of one IRX10 like and two FRA8 related genes by complementation analysis indicated that these three genes are able to partially recover the irregular xylem phenotype conferred by the xylan deficiency in the respective Arabidopsis mutant. We presume that these genes are functional orthologs of respective genes that are implicated in GX biosynthesis. Conclusion: The list of 55 cotton genes presented here provides a solid basis to uncover the biosynthesis of xylan in cotton fiber, leading to optimization of the cell wall architecture for fiber improvement.


Background
Mature cotton fiber secondary wall is composed mainly of cellulose (> 90%) and small amounts of noncellulosic polysaccharides, such as xylan and lignin (Haigler et al. 2012;Han et al. 2013). Fiber cell wall composition not only defines fiber morphogenesis but also affects fiber quality and quantity (Haigler et al. 2012). For instance, the xyloglucan endo-transglycosylase/hydrolase (XTH) proteins are capable of degrading xyloglucan irreversibly or cleave and transfer chain ends between molecules.
Transgenic cotton plants overexpressing GhXTH1 exhibited about two-fold higher XET activity and 15 20% longer fiber compared with wild type cotton (Lee et al. 2010). In addition, Avci et al. (2013) found that 14 to 21 days post anthesis (dpa) fibers of Gossypium barbadense barely contain looselybound xyloglucan whereas the Gossypium hirsutum fibers contain lots of xyloglucan via glycome profiling (Avci et al. 2013). Li et al. (2013) even speculated that xyloglucan might affect fiber elongation negatively by comparing xyloglucan content in Gossypium barbadense and Gossypium hirsutum . Moreover, Han et al. (2013) found that overexpression of a LIM-domain protein, WLIM1a, in cotton upregulated expression of the PAL-box genes and resulted in higher amounts of lignin production in transgenic fiber cells, leading to a desirable improvement in both fiber fineness and strength. The authors also proposed that lignin is likely to be a key determinant of cotton fiber quality (Han et al. 2013). Manipulating genes involved in cell wall component biosynthesis can alter fiber quality and quantity and represents a valuable and efficient approach to genetically improve fiber quality, thus it is crucial to understand the structure of fiber cell walls and how different wall components are synthesized. Currently, a number of genes involved in fiber secondary cell wall (SCW) cellulose and lignin synthesis have been identified (Fan et al. 2009;Han et al. 2013;Li et al. 2013). Recent glycome profiling and immunohistochemistry techniques have exhibted the presence of xylans in 10 to 24 dpa cotton fibers in both Gossypium barbadense and Gossypium hirsutum (Avic et al. 2013). Moreover, a gel-state 2D-NMR method even revealed the structure of 4-Omethylglucuronoxylan from the cotton linters (Kim and Ralph, 2014). However, our knowledge of xylan biosynthesis in cotton fiber remains very limited. The function of xylan in fiber development is barely understood. Glucuronoxylan (GX) is a major hemicellulosic component in dicot secondary walls. GX is made of a linear chain of β-1,4-linked xylosyl residues, some of which are substituted with side chains, such as glucuronic acid (GlcA), methylglucuronic acid (MeGlcA), and also decorated with O-acetyl groups (Scheller and Ulvskov, 2010). In addition, dicot xylan has a tetrasaccharide unit β-d-Xyl-(1,3)-α-l-Rha-(1,2)-α-d-GalA-(1,4)-d-Xyl at their reducing ends named reducing end sequence (RES) (Scheller and Ulvskov, 2010). GX synthesis requires the coordinated action of a suite of various enzymes, including glycosyltransferases, acetyl transferases, and methyl transferases. Great progress has been made during the last decade in identifying genes that are involved in the GX biosynthesis mainly in the model plant Arabidopsis. So far 35 genes have been shown to participate in xylan synthesis. Among this set of genes, IRX9/ IRX9L, IRX10/IRX10L, IRX14/IRX14L and possibly IRX15/IRX15L were demonstrated to be required for backbone elongation (Peña et al. 2007;Brown et al. 2007;Brown et al. 2009;Wu et al. 2010;Brown et al. 2011;Jensen et al. 2011;Jensen et al. 2014;Urbanowicz et al. 2014); FRA8/F8H, IRX8 and PARVUS were shown to be responsible for reducing end synthesis (Zhong et al. 2005;Peña et al. 2007;Brown et al. 2007;Lee et al. 2007; Lee et al. 2009a). GUX1 to GUX5 were glucuronyltransferases that add GlcA to xylan (Mortimer et al. 2010;Rennie et al. 2012;Mortimer et al. 2015); GXM1 to GXM3 are methyltransferases catalyzing 4-O-methylation of GlcA side chains on xylan (Lee et al. 2012). Last but not least, ESK1 (TBL29), TBL3, TBL28, TBL30-35, AXY9, and RWA1 to RWA5 have been shown to be implicated in GX acetylation (Lee et al. 2011;Manabe et al. 2011;Urbanowicz et al. 2014;Schultink et al. 2015;Zhong et al. 2017).
A comprehensive understanding of xylan biosynthesis in cotton fiber will require identification of all the genes involved just like what have achieved in Arabidopsis. Recent studies have shown that many proteins responsible for the GX biosynthesis machinery are highly conserved between lower plant such as Physcomitrella and higher plant such as Arabidopsis and Populus (Hörnblad et al. 2013;Zhou et al. 2006;Zhou et al. 2007). We previously characterized two GT43 family members implicated in xylan synthesis in cotton fiber (Li et al. 2014). In this report, we applied a bioinformatics approach aimed at identifying candidate genes as many as possible putatively involved in fiber xylan biosynthesis. Fifty-five genes were shown to be strong candidates responsible for xylan synthesis by phylogenetic analysis and expression pattern analysis. Three of these genes were further demonstrated to function in xylan biosynthesis by complementation analysis. The study of xylan synthesis in cotton fiber with unusual cell wall composition will undoubtedly shed valuable light on manipulating plant cell walls with enhanced productivity and biotechnological potential.

Identification of candidate genes
The recently published G. hirsutum genome sequence databases offer unique opportunities to identify genes involved in xylan biosynthesis in upland cotton (Li et al. 2015;Zhang et al. 2015). The known 35 Arabidopsis proteins involved in xylan synthesis were used to blast search the G. hirsutum genome database [https://www.cottongen.org/, G.hirsutum NBI proteins (70478)]. In addition, some conserved protein domains (such as DUF579, DUF231, Casp1) were also used as a query to blast search the upland cotton genome database. The obtained sequences were sequentially passed through phylogenetic analysis, expression profile analysis and co-expression eanalysis.

Phylogenetic analysis
The multiple sequence alignments were performed using clustal W. A neighborjoining phylogenetic tree of the complete amino acid sequences of homologous proteins in a same GT famlily or containg the same domains was then constructed using MEGA7.0 with 1000 bootstraps. The phylogenetic tree was displayed using iTOL (http://itol.embl.de/) with default settings. When both cotton genes and Arabidopsis genes are within the same subclade of the phylogenetic tree, we suppose these cotton genes to be candidates for functioning in xylan formation, as closely related sequences might have similar catalytic activity.
Expression analyses of cotton fiber xylan synthesis associated genes using RNA-seq data To investigate the spatial and temporal expression profiles of the identified genes, we analyzed their transcript levels using RNA-seq data of 10 different tissues and 9 different ovule and fiber developmental stages, highthroughput RNA-seq data of 13 samples: leaves, roots and stems of 2week-old plants; petals, torus, pistils and stamens; ovules from -3, -1, 0, 1, 3, 5, 10, 20 and 25 dpa; fibers from 5, 10, 20 and 25 dpa were downloaded from http://www.ncbi.nlm.nih.gov/SRA: PRJNA248163 or CottonGen database (https://www.cottongen.org/). Expression level of each gene was calculated using FPKM, those expressions with an FPKM <1 were no longer considered. Because the expression level of the corresponding genes was too low to be relevant and probably are not essential for xylan biosynthesis. Heat maps and hierarchical clustering were generated with Omicshare (http://www.omicshare.com/tools/Home/Soft/heatmap).

Co-expression analysis
Co-expression analysis was performed using a database of co-expression networks with functional modules for diploid and polyploidy Gossypium (http://structuralbiology.cau.edu.cn/gossypium/).

Plant material and growth conditions
Arabidopsis seeds were surface-sterilized and planted as described previously ).

RT-PCR analysis
Total RNA was extracted from Arabidopsis thaliana tissues as previously described . A two-step RT-PCR procedure was performed according to the method described previously (Li et al. 2014)

Complementation of Arabidopsis fra8 and irx10 mutant
The open reading frame of GhGT47A1, GhGT47B1 and GhGT47B2 were amplified by PCR and cloned into pBI121 vector. The resulting constructs were transferred into irx10 and fra8 mutants respectively by floral dip method. Seeds were harvested and stored at 4 ºC. Positive first-generation (T1) transgenic plants were screened on MS medium containing hygromycin. Homozygous transgenic plants were selected for further analysis.

Sectioning of stems and roots and microscopy analysis
Freehand cross-sections of six-week-old stems and roots were conducted as previously described (Li et al. 2014). For lignin staining, sections were stained with phloroglucinol-HCl as described previously (Huang et al. 2016).

Identification of fiber xylan-synthesis related genes
In Arabidopsis, members of glycosyltransferase family 43 (GT43) and GT47 are responsible for synthesis of the xylan backbone: IRX9 and IRX14 from GT43 and IRX10 from GT47 (Brown et al. 2007). Each of them has a single partially redundant paralog: IRX9L, IRX14L and IRX10L, respectively (Brown et al. 2009;Wu et al. 2010). By far only IRX10 (L) have been demonstrated to have β-(1,4)xylosyltransferase activity in vitro (Urbanowicz et al. 2014;Jensen et al. 2014). While IRX9 (L) and/or IRX14 (L) were proposed to act as accessory proteins and play primarily structural roles in xylan synthase complex (XSC) (Zeng et al. 2016).
We first focus on GT43 family. The GT43 family contains 18 members in cotton genome (Fig. 1a). A phylogenetic analysis of GT43 family members showed that the tree was divided into four subclades, in which Gh_A09G1418, Gh_D09G1426 (GhGT43A, Li et al. 2014) were clustered with IRX9 and comprised a subclade, while Gh_D04G2324, Gh_A06G0232, Gh_D02G0864, Gh_A02G0815, Gh_D01G1084 and Gh_A01G1031 were most closely realted to IRX9L and constituted another subclade. IRX14 and IRX14L were grouped with Gh_A08G0365, Gh_D11G1966, Gh_A11G1802, Gh_D03G0229 and Gh_A02G1486 and comprised a subclade. The resting five genes comprised an additional subclade (Fig. 1a). Of the 13 cotton IRX9 and IRX14 genes, two IRX9 genes (Gh_A09G1418 and Gh_D09G1426) and one IRX9L (Gh_A06G0232) were specifically expressed in 20 dpa fibers, two IRX9L genes (Gh_D01G1084 and Gh_D06G2324) were predominantly expressed in 20 dpa fibers ( Fig. 2a, Additional file 1: Fig. S1). Co-expression analysis revealed that, the five 20 dpa fiber specifically or preferentially expressed genes were co-expressed with SCW-associated genes (Additional file 9: Table S1). Thus they represent strong candidates. One (Gh_D03G0229) of five IRX14/IXR14L genes was preferentially expressed in ovules, but during the fiber developmental stages, it was predominantly expressed in 20 dpa fibers and also co-expressed with SCW genes  Table S1).These analyses demonstrated that cotton fiber may also employ IRX9 and IRX14-like genes to synthesize xylan backbone.
Next, we focused on the GT47 family. As shown in Fig. 1b, IRX10 subclade with 5 cotton genes and FRA8 subclade containing 4 cotton genes constitute a clade. Though IRX10 and FRA8 are closely related to each other, they perform completely different fucntions, as IRX10 is involved in xylan backbone synthesis while FRA8 participates in RES biosynthesis in Arabidopsis (Zhong et al. 2005;Brown et al. 2007). These two distinct enzyme activities within the same phylogenetic clade highlight the need for detailed investigation of specific function of every member in the same GT47 subgroup.
Two (Gh_D02G1793 and Gh_A03G1353, thereafter named as GhGT47A1 and GhGT47A2 respectively) of the five IRX10L genes were specifically and most strongly expressed in 20 dpa fibers and another IRX10L gene (Gh_D12G1919) was predominantly transcribed in 20 dpa fibers (Fig. 2a, Additional file 2: Fig. S2). In addition, two IRX10L genes (Gh_A12G1292 and Gh_D12G1414) were preferentially expressed in leaves, but during the fiber developmental stages, they were dominantly expressed in 20 dpa fibers (Fig. 2b, Additional file 2: Fig. S2). Moreover, all of the five IRX10 genes were correlated with SCW associated genes (Fig. 3a, Additional file 9: Table S1). Thus, they represent strong candidates for fiber xylan biosynthesis.
Though the FRA8 subclade contains 4 cotton homologs (Fig. 1b), only two (Gh_A13G2031 and Gh_D13G2434, hereafter named as GhGT47B1 and GhGT47B2 respectively) of them were specifically expressed in 20 dpa fiber and co-expressed with SCW genes (Fig. 2a, Fig. 3b, Additional file 2: Fig. S2, Additional file 9: Table S1). In Arabidopsis, except for IRX10(L) and FRA8(L) in GT47 family, it appears that no other GT47 members function in xylan synthesis. However, we found that Gh_A13G1640 and Gh_D13G2004 located in a different clade within the GT47 tree were not only expressed highly in 20 dpa fibers ( Fig. 2a and 2b, Additional file 2: Fig. S2) but also co-expressed with SCW genes (Additional file 9: Table S1). Due to the lack of functional validation, by now we can't determine whether these two proteins participate in backbone synthesis or RES synthesis. They may represent novel candidates. This finding might reflect the specifity of xylan synthesis in cotton fiber, as compared with Arabidopsis stem.
As for the RES synthesis, apart from FRA8 (L) from GT47, two other members from GT8, IRX8 and PARVUS, were reported to be involved (Lee et al. 2007;Peña et al. 2007). In cotton genome, 115 genes were annotated as GT8 family members (Fig. 1f). A phylogenetic analysis displayed that the tree was divided into three clades. IRX8 together with 13 cotton homologs were in one clade. PARVUS and 15 cotton homologs were in another clade (Fig. 1f). Of the 13 cotton IRX8 genes, none was expressed highly in 20 dpa fibers, thus they are not likely to participate in xylan synthesis in fiber cell  Table S1) and thus regarded as strong candidate. It seems that cotton fiber employs different genes to synthesize the RES or it contains a different RES structure in relative to that of Arabidopsis stem.
In addition, xylan can be modified by the addition of GlcA and MeGlcA side chains by xylan glucuronosyltransferases of the GlcA substitution of xylan (GUX) family which belong to the GT8 family (Mortimer et al. 2010;Rennie et al. 2012;Mortimer et al. 2015). In Arabidopsis, there exist five GUX genes. Although GUX3 and GUX5 are closely related to GUX1, GUX2 and GUX4, only GUX1, GUX2 and GUX4 have xylan α-glucuronosyltransferase activity, while GUX3 is involved in primary cell wall polymer synthesis (Rennie et al. 2012;Mortimer et al. 2015). Within the GT8 phylogenetic tree, the five Arabidopsis GUX proteins and 15 cotton homologs constituting one subclade were in the third clade (Fig. 1f). The GUX subclade was further divided into four subgroups, that is Gh_A10G0734, Gh_D10G1036, Gh_D11G0603, Gh_A11G0519 were clustered with GUX1 and comprised one subclade; Gh_D05G1935, Gh_A05G1740, Gh_A06G1849, Gh_D06G0142 were grouped with GUX2 and constituted another subclade (Fig. 1f). GUX3 subclade contains three cotton proteins, while GUX4/GUX5 subclade contains four cotton protiens (Fig. 1f). It is very interesting to find that genes grouped with GUX1 and GUX2 were all predominantly or highly expressed during fiber secondary thickening stage ( Fig. 2a and 2b, Additional file 6: Fig. S6). Moreover, the eight cotton GUX1/2 related genes all co-expressed with SCW biosynthesis-associated genes (Additional file 9: Table S1). These results seem consistent with what has been shown in Arabidopsis, where GUX1 and GUX2 are SCW specific (Rennie et al. 2012). It should be noted that two cotton GUX3-like genes ( Fig. 1f), Gh_D03G0590 and Gh_D05G2574, also showed very high expression during fiber secondary cell wall thickening stage (Additional file 7: Fig. S7). In addition, a pair of homologous genes (Gh_A13G1420 and Gh_D13G1789) in another subclade next to the GUX subclade also exhibited high expression in 20 dpa fibers (Fig. 1f, Additional file 7: Fig. S7). Though these genes were not co-expressed with SCW genes, they might be plausible candidates of xylan synthesis. This finding is in contrast with GUX3, as it has been shown to participate in primary cell wall xylan biosynthesis (Mortimer et al. 2015).
Mutation of both GUX1 and GUX2 would lead to only acetylated xylan with scarely detectable GlcA modification, indicating that xylan GlcA modification in Arabidopsis was primarily catalyzed by GUX1 and GUX2. However, in cotton fiber, though GUX1 and GUX2-like genes are highly or preferentially expressed in 20 dpa fibers and co-expressed with SCW genes, genes in the GUX3 subclade also showed fairly high expression in 20 dpa fibers suggesting that they may play some roles in xylan modification despite functional verification is needed (Additional file 6: Fig. S6). These results suggest cotton fiber may employ different genes for xylan GlcA modification.
Moreover, the GlcA substituents of xylan are often methylated by glucuronoxylan methyltransferases (GXMs) belonging to the DUF579 family (Lee et al. 2012). This type of methylation occurs exclusively to SCW xylan. The DUF579 family includes 32 members in cotton (Fig. 1c). Phylogenetic analysis showed that Arabidopsis GXMs were closely related to seven cotton homologs and comprised one subclade (Fig. 1c). Four (Gh_D13G1237, Gh_A13G0989, Gh_A01G1163 and Gh_D01G1291) of the seven GXM genes displayed predominant or high expression in 20 dpa fibers and were correlated with SCW genes (Fig. 2a and 2b, Additional file 5: Fig. S5, Additional file 9: Table S1), thus they are likely methyltransferases and their enzymatic activities await further studies.
IRX15 and IRX15L also belong to the DUF579 family. However, it is apparent that they perform different function from that of GXMs. As mutation of GXM genes led to a specific defect in GlcA methylation on xylan while simultaneous mutations of IRX15 and IRX15L caused reduced xylan level and chain length, decreased xylosyltransferase activity without affecting the GlcA methylation of xylan (Brown et al. 2011;Jensen et al. 2011). IRX15/IRX15L and nine cotton homologs constitute another sublcade in the DUF579 family (Fig. 1c). Of the nine cotton IRX15 genes, two (Gh_D12G0677 and Gh_D11G1962) were preferentially and four (Gh_D11G0506, Gh_A11G0434, Gh_A11G1800 and Gh_A12G0675) were highly expressed in 20 dpa fibers and the six genes were all co-expressed with SCW genes ( Fig. 2a and 2b, Additional file 5: Fig. S5, Additional file 9: Table S1). Therefore they represent good candidates for xylan synthesis. Very interestingly, two additional genes (Gh_A03G0453 and Gh_D03G1085) located in a separate subclade were also highly expressed in 20 dpa fibers and co-expressed with SCW-associated genes (Fig. 1c, Fig. 2b, Additional file 5: Fig. S5, additional file 9: Table S1). This implies that Gh_A03G0453 and Gh_D03G1085 may be novel fiber SCW xylan biosynthesis related genes. Whether they have a function similar to GXM or IRX15 awaits further studies.
In addition to the sugar side groups, xylans can have high degrees of acetylation (Scheller and Ulvskov, 2010). So far three different protein families have been shown to be involved in xylan Oacetylation. One is the TRICHOME-BIREFRINGENCE-LIKE ( (Zhong et al. 2017). In rice, 66 TBL genes were identified (Gao et al. 2017), 14 of which show xylan 2-O-and 3-O-acetyltransferase activity. In poplar 64 TBLs have been identified, 12 of which were shown to Oacetylate xylan. In Arabidopsis ESK1/TBL29 appers to be the principle TBL for xylan acetylation in Arabidopsis (Zhong et al. 2017). However, in cotton two ESK1-like genes showed the greatest expression in stems whereas moderate expression in 20 dpa fibers suggesting they might play minor roles in fibers (Fig. 1e, Fig. 2b, Additional file 3: Fig. S3). Additionally, four TBL3-like genes (Gh_D13G0974, Gh_A04G0787, Gh_D04G1278 and Gh_A13G2180) four TBL33 related genes (Gh_D13G0151, Gh_A13G0130, Gh_A06G1419 and Gh_D06G1766 ) and one TBL32-like gene (Gh_A10G2143) were most probable candidates for fiber xylan acetylation as they were not only expressed strongly in 20 dap fibers but also co-expressed with SCW biosynthesis genes (Fig. 1e,   Fig. 2a and 2b, Additional file 3: Fig. S3, Additional file 9: Table S1). Genes in TBL30, TBL31, TBL34 and TBL35 subclades were expressed extremenly lowly in 20 dpa fibers and excluded as candidates (Additional file 3: Fig. S3). It should be noted that the four TBL33 subclade genes showed the highest and specific expression in 20 dpa fibers, suggesting they might be the major enzymes for xylan acetylation (Fig. 1e, Fig. 2a, Additional file 3: Fig. S3). It is also interesting to find that two genes (Gh_A11G2606 and Gh_D11G3388) located outside the TBL subclade were predominantly transcribed in 20 dpa fibers and co-expressed with SCW genes (Fig. 1e, Fig. 2a, Additional file 3: Fig. S3), indicating that they may be new players in SCW xylan-modification. These findings suggest fiber xylan acetylation pattern might be different to that of Arabidopsis stem as the involved genes are different.
It still needs to be investigated whether other TBLs are required for this acetylation process. In summary, in cotton genome we identified 100 TBL genes and found 13 of them were strong candidates of xylan acetyltransferase (Additional file 3: Fig. S3). Although enzymatic activity and specificity of their encoded enzymes are currently lacking.
A second protein family involved in xylan O-acetylation is represented by ALTERED XYLOGLUCAN 9 (AXY9). Contrary to the substrate specificity of the nine TBL proteins, AXY9 seems to be non-specific in polysaccharide O-acetylation as axy9 mutant plants display reductions in the acetylation of both xylan and xyloglucan (Schultink et al. 2015). In addition, biochemical studies showed that recombinant AXY9 did not exhibit any acetyltransferase activity, so it was suggested that AXY9 might act as an intermediate transferring acetyl groups used later by TBL proteins. We identified 6 AXY9 like genes in cotton, only two (Gh_D07G0742 and Gh_A07G0663) of them which displayed 20dpa fiber preferential expression were correlated with SCW genes (Fig. 1e, Fig. 2a). REDUCED WALL O-ACETYLATION (RWA) is the third protein family involved in xylan O-acetylation (Manabe et al. 2011).
There are four RWA genes in Arabidopsis and simultaneous mutations of the four genes result in an overall reduction in the acetylation of both xylan and xyloglucan (Lee et al. 2011). Similarly, RNAisilencing of the four RWA orthologs in hybrid aspen leads to reduced wood xylan and xyloglucan Oacetylation, suggesting the conservative functions of RWA proteins among plant species (Pawar et al. 2017). We identified 8 RWA related genes in cotton (Fig. 1e). Gh_A10G1553, Gh_D10G1803 and Gh_D03G0656 were most closely related to RWA2 whereas RWA1/3/4 together with Gh_D06G2122, Gh_A06G2002, Gh_A05G1263 and Gh_D05G1431 were grouped together (Fig. 1e). Two (Gh_A10G1553 and Gh_D10G1803) of three cotton RWA2 genes were co-expressed with primary cell wall genes, however, they were highly expressed in 20 dpa fibers, thus they could be fiber SCW xylanmodification genes (Fig. 1e, Additional file 4: Fig. S4, Additional file 8: Fig. S8, Additional file 9: Table   S9). Two of the four cotton RWA1/3/4 genes (Gh_A06G2002 and Gh_D06G2122) were predominantly expressed during fiber secondary cell wall and co-expressed with SCW genes (Fig. 1e, Fig. 2a, Additional file 4: Fig. S4), thus they could represent the predominant RWA protein that involved in fiber xylan acetylation. The other two cotton RWA1/3/4 genes (Gh_A05G1263 and Gh_D05G1431) were highly transcribed in 20 dpa fibers but not co-expressed with SCW genes (Fig. 1e, Additional file 4: Fig. S4, Additional file 7: Fig.S7). They could be involved in xylan acetylation. It should be noted that RWAs were proposed to be responsible for the translocation of acetyl-groups across the membrane to supply the substrate for AXY9 and TBL protein families, although no experimental activity has been provided yet (Lee et al. 2011;Manabe et al. 2011;Schultink et al. 2015).
Very interestingly, we identified one gene (Gh_D12G2537) from GT61 famlily that are not only preferentially expressed in 20 dpa fibers but also co-expressed with xylan backbone synthesis genes IRX9 and IRX9L (Fig. 1d, Fig. 2a, Additional file 9: Table S1), thus it may represent a strong candidate involved in xylan modification. Gh_A12G2720 and Gh_D12G2537 is a homoeologous pair in allotetraploid cotton. Gh_A12G2720 was preferentially expressed in stems, but during the fiber developmental stages, it was predominantly expressed in 20 dpa fibers (Fig. 2b). Moreover, it was also co-expressed with SCW related genes (Additional file 9: Table S1), thus it can be considered as a candidate. Gh_D12G2537 was most closely related to OsXAT3, an arabinosyltransferases (XAT) from the GT61 family, responsible for the addition of arabinose residues to the xylan backbone to produce arabinoxylan in rice (Anders et al. 2012). This implies that fiber SCW xylan may be substituted with arabinose, which is in contrast with Arabidopsis that does not contain Arabinoxylan.
Combining all of the evidence from phylogenetic, expression profiling and co-expression analyses, in total we identified 55 genes as most probable fiber xylan biosynthesis genes (Table 1). The Arabidopsis GT mutant complementation approach has been successfully used to explore the functions of several corresponding GT homologs in xylan synthesis in poplar (Zhou et al. 2006;Lee et al. 2009b). We also employed this method to study the functions of GhGT43A and GhGT43C previously (Li et al. 2014). Accordingly, in this study we applied this approach to establish function of candidate genes. We first selected one fiber-preferential gene GhGT47A1 and used the complementation analysis to investigate if GhGT47A has simlar function as its Arabidopsis ortholog IRX10.
We transformed the full-length GhGT47A1 cDNA into the Arabidopsis irx10 mutant to see whether GhGT47A1 can restore the irx10 phenotypes. Transgenic lines were tested for the presence of the GhGT47A1 transgene in a homozygous irx10 background. GhGT47A1 gene-specific primers were used to check the presence of GhGT47A1 mRNA in the transgenic Arabidopsis lines (Fig. 4c). The absence of IRX10 transcript in irx10 mutant and the transgenic line was also confirmed (Fig. 4c). Homozygous irx10 mutant displayed dark-grown leaves, slightly shorter statue. GhGT47A1 expression in irx10 mutant recovered the leaf color and the plant height almost identical to the wild type after six-week growth (Fig. 4b). However, we observed that after transplantation into soil for 3 weeks from one-week old plate-grown seedlings, the inflorescence stem height of irx10 mutant is significant shorter than that of the wild type, whereas the irx10-GhGT47A1 complemented plant show a intermediary height between irx10 mutant and the wild type (Fig. 4a). Most obviously, more than 60% of siliques in irx10 mutant are shrunken and sterile while only 20% of the complemented siliques are sterile (Fig. 4d). It should be noted that the number of siliques was not significantly different.
Cross sections of basal stems from irx10 mutant exhibit a moderate irregular xylem phenotype (Additional file 8: Fig.S8b and 8e). This phenotype was reported previously (Brown et al. 2009). We did not find any apparent difference between the complemented plants and the irx10 mutant, as stems of the GhGT47A1 complemented plants also exhibit a mild irregular xylem phenotype (Additional file 8: Fig.S8c and 8f). However, in sharp contrast with the stem phenotype, the irx10 roots display an obvious irregular xylem phenotype in which most of the xylem vessels are almost completely collapsed and distorted relative to the wild type ( Fig. 4e and 4f). Compared to the irx10 mutant, the shape of root xylem vessels from GhGT47A1 complemented plants is very close to the wild type (Fig. 4e, 4f and 4 g). Meanwhile, compared with the wild type, irx10 showed much weaker red phloroglucinol staining, whereas the GhGT47A1-complemented irx10 showed the intermediary degree of staining, which is weaker than the wild type but stronger than irx10 (Fig. 4h, 4i and 4j).
Collectively, GhGT47A1 expression in Arabidopsis irx10 mutant is able to recover the root irregular xylem phenotype. These results demonstrate that GhGT47A1 can largely complement the xylan deficiency of irx10 mutant which suggest that GhGT47A1 is functionally conserved with IRX10 and performs a similar biochemical function as IRX10. Taken together, our results demonstrate that GhGT47A1 most likely performs a similar catalytic function as IRX10 which has been shown to be involved in the synthesis of GX backbone in Arabidopsis (Brown et al. 2009). So, it is plausible that GhGT47A1 is also involved in GX biosynthesis in cotton fiber.
Both GhGT47B1 (Gh_A13G2031) and GhGT47B2 (Gh_D13G2434) have the ability to partially rescue the fra8 mutant phenotypes Gh_A13G2031 and Gh_D13G2434 is a homoeologous pair in allotetraploid cotton, in which the former located on chromosome A13, and the latter located on its corresponding homoeologous chromosome D13. These two genes were designated as GhGT47B1 and GhGT47B2 respectively hereafter. Their deduced proteins were most closely related to AtFRA8 which has been demonstrated to be involved in RES synthesis (Zhong et al. 2005;Brown et al. 2007). To test whether GhGT47B1 and GhGT47B2 are functionally equivalent to FRA8, we expressed GhGT47B1 and GhGT47B2 individually in the fra8 mutant plants. The fra8 mutant showed a thin and much shorter stature as compared with the wild type (Fig. 5a). It seems that expression of both GhGT47B1 and GhGT47B2 individually in fra8 partially restore the plant size ( Fig. 5a and 5b). Further cross section staining of stems showed that fra8 mutation caused deformation of both xylem vessels and interfascicular fibers in stems ( Fig. 5c and   5d). Though we did not find that the xylem vessels of the stems from these GhGT47B1 or GhGT47B2 complemented fra8 lines became round and open shape, it was found that interfascicular fibers in these complemented lines showed normal shape as those of the wild type ( Fig. 5c-5f). These results showed that both GhGT47B1 and GhGT47B2 are able to partially rescue the fra8 phenotypes conferred by xylan deficiency. These results indicating that both GhGT47B1 and GhGT47B2 are functional orthologs of FRA8 and probably involved in fiber SCW xylan synthesis, although biochemical activity assay is required to determine their exact roles.

Discussion
Mature cotton fiber is composed mainly of cellulose (> 90%) with a small amount of hemicellulose and lignin (Haigler et al. 2012;Kim and Ralph, 2014). It is currently accepted that, when xylan arrives at the cell wall, it coats and crosslinks the cellulose microfibrils, also it makes hydrophobic pockets that facilitate the monolignols polymerization (Wierzbicki et al. 2019). These features of xylan render xylan a critical component of cell wall structure and composition. Xylan has been shown to be required for the normal assembly and mechanical strength of secondary walls. As researches performed in Arabidopsis showed that reduction in xylan content often causes a severe decrease in cellulose deposition, secondary wall thickening and mechanical strength, which suggests that xylan is highly correlated with cellulose synthesis (Brown et al. 2007). However, xylan biosynthesis, a largely unexplored area in cotton fiber, little is known about the genes involved in, except that we previously characterized two GT43 family members which were shown to be functional orthologs of IRX9 and IRX14 respectively (Li et al. 2014). Hence, a full understanding of the genes required for xylan synthesis will shed light on fiber secondary wall synthesis and enable us to genetically improve fibers with altered xylan.
Extensive studies have shown that xylan is synthesized by evolutionarily conserved enzyme machinery in monocots and dicots (Hörnblad et al. 2013). Several studies have identified a number of genes involved in cell wall synthesis from different plant species via a bioinformatics method. For instances, 27 Arabidopsis sequences showing similarity to either the GT-A or the GT-B fold that have not yet been classified in the CZAY database were obtained by adopting a simple bioinformatics approach (Egelund et al. 2004). In addition, all the poplar woodassociated GTs have close homologs in Arabidopsis (Aspeborg et al. 2005). Sequences of annotated cell wallrelated genes of Arabidopsis were used in a BLAST search to find the putative orthologs sequences in rice and maize (Sado et al. 2009). Also, Voxeur et al. (2012) identified 26 genes potentially encoding GT candidate for RG-II synthesis based on coexpression analysis and homologies with GTs of known functions (Voxeur et al. 2012). In this study, in order to gain a full overview of xylan-synthesis related genes in cotton fiber, we applied a similar bioinformatics approach which has already been successfully used to identify GhGT43A and GhGT43C in cotton, and found that the cotton genome nearly contains homologs of all reported genes involved in biosynthesis of GX backbone, side chains and reducing end and GX modification including methylation and acetylation in Arabidopsis. We provide further experimental evidence that GhGT47A1 and GhGT47B1/B2 are able to complement the respective Arabidopsis irx10 and fra8 mutants showing that they might have similar function as their Arabidopsisi counterparts. Future work will be the experimental verification to provide direct evidence of their participation in fiber xylan formation.
Our results indicate that cotton fiber possesses a suit of biosynthetic genes similar to that of Arabidopsis which suggests that molecular machinery underlying xylan structure is largely conserved between cotton fiber and Arabidopsis stem, except that cotton fiber does not necessarily use homologs of IRX8, GUX3/4/5 and TBL28/30/34/35, but prefer to use orthologs of TBL41/43 with unknown function, and novel GT47 and DUF579-domain protein members. Furthermore, two GT61 family members encoding arabinosyltransferase seems to be involved in fiber xylan synthesis. These differences may reflect the specificity of xylan synthesis in cotton fiber.
It is known that proteins involved in xylan backbone elongation physically interact with one another, creating the xylan synthase complex. XSC composition differs among plant kingdom and affects xylan modification (Wierzbicki et al. 2019). Whether cotton fiber employs similar proteins for xylan backbone synthesis is unknown. Whether proteins involved in reducing end synthesis or xylan modification also interact with each other or are part of the XSC still remains elusive. Our findings imply that in addition to a common biosynthetic mechanism of xylan synthesis, cotton fiber may also use specific groups of genes in secondary wall xylan formation. Since cotton fiber has specific cell wall components, it is rational to propose that some GTs or proteins are specifically involved in the generation of fiber-specific cell wall components. Our results also showed that in a specific gene family, some members may be the functional orthologs of xylan-synthesis related enzymes, some members may have divergent functions though they are phylogenetically close to each other. This functional conservation and divergence in xylan biosynthesis also existed in rice family GT43 members since complementation experiments revealed that some rice GT43 family members are able to rescue the irx9 or irx14 mutant phenotype but some are not. One possibility is that this group of members might have evolved to perform different catalytic activities.
Xylan biosynthetic genes can be separated into major and minor functionally redundant pairs, in which pair one gene is functionally dominant in Arabidopsis. IRX9, IRX10, IRX14, and FRA8 make up the major function genes whereas their corresponding paralogs IRX9L, IRX10L, IRX14L, and F8H comprise minor function genes (Brown et al. 2007;Peña et al. 2007;Brown et al. 2009). Generally, the minor function genes were universally expressed in most tissues at a lower level, whereas the major function homologs were predominantly expressed in the stems and hypocotyls at a much higher level. Higher expression levels of major function genes might account for their functional dominance. The differences in their expression patterns might contribute to their differential involvement in GX biosynthesis in different secondary wall-forming cell types. Gene redundancy seems very common in GX biosynthesis in Arabidopsis which is likely to ensure the correct making of xylan. However, IRX9, IRX10, IRX14 and FRA8 play major roles in GX synthesis, functional dominance of these genes may be due to their higher levels of expression or their higher activities of enzyme. In consistent with these previous results, our list of 36 strong candidates in Fig. 2a may play major roles while the the list of 19 genes in Fig. 2b may play minor roles in xylan synthesis. The difference in expression pattern may account for their differential roles in xylan formation. qua1, parvus-3/gatl1 and irx8/gaut12 mutants all display alterations in biosynthesis of both xylan and pectin (Orfila et al. 2005;Brown et al. 2007;Lee et al. 2007). This complex pleiotropic phenotype makes it difficult to define the primary functions of the three genes. Lignin content may be directly or indirectly affected when xylan biosynthesis is disturbed. It has been reported that irx7, irx8 (GAUT12) and irx9 stems which are deficient in xylan synthesis may also have lignin defects (Zhong et al. 2005;Brown et al. 2007;Peña et al. 2007). Our works together with Brown et al (2007) and Brown et al (2009)'s results revealed that irx10 stems had decreased xylan content as well as reduced amounts of lignin, which further supported that xylan and lignin synthesis are associated in some cell types and tissues. Acetyl-and methylglucuronic acid decorations of xylan affects its interaction with cellulose and lignin (Pauly and Ramirez, 2018). The xylan-modification patterns required for the interaction with cellulose is different from the patterns required for the interaction with lignin. Co-expression analysis revealed that there possibly exist specific combinations of xylan-modification genes to establish the patterns required for each interaction. For instance, GXM1 (Gh_A01G1163, Gh_D13G1237), GUX1 (Gh_D11G0603, Gh_A11G0519) and TBL3 (Gh_D13G0974, Gh_D04G1278 and Gh_A04G0787) and a new TBL-like gene (Gh_D11G3388) are likely to set up a xylan-modification pattern suitable for lignin interaction, as these genes were co-expressed with lignin biosynthesis genes (F5H (Gh_D11G1805), laccase (Gh_D04G1243); Additional file 9: Table S1).
Complementation of Arabidopsis GT mutants with corresponding cotton GTs could provide an alternative method to unravel the functional roles of cotton GTs (Zhou et al. 2006;Lee et al. 2009b). Since cotton transformation is a tedious and long-time consuming, we took advantage of the Arabidopsis mutants deficient in xylan formation to clarify the functions of cotton homologs. Whether mutation of IRX10 or FRA8, generally mutants in these genes have reduced xylan content often leading to a typical irx (irregular collapsed xylem vessels) phenotype (Zhong et al. 2005;Brown et al. 2009). Our reults showed that three fiber-specific genes (GhGT47A1, GhGT47B1 and GhGT47B2) were capable of partially rescuing the irx phenotype of respective mutant. Together with our previous results about GhGT43A and GhGT43C (Li et al. 2014), it is conceivable that other genes in the list might indeed function in fiber xylan biosynthesis. Enormous advances have been made in identifying individual genes involved in xylan synthesis. However, many questions remain about how individual proteins work together to make a functional xylan. Characterization of genes involved in xylan synthesis will help understand the complex process of fiber secondary wall formation. Our primary objective is to identify a few promising candidate genes involved in fiber xylan biosynthesis and gain a better understanding of molecular basis of fiber secondary wall synthesis. Our list of candidate genes is apparently far from being complete. We can not rule out the possibility that we have omitted some important genes involved in xylan biosynthesis. Identification of only the closest Arabidopsis homologs can not define the specific genes that are responsible for the differences in the xylan structures between stem and fiber. Our increasing knowledge of xylan biosynthetic mechanisms will provide new opportunities to manipulate the fine structures of xylan, their relative abundance and their interactions with cellulose and lignin within the wall. Understanding gene function in xylan biosynthesis will undoubtedly create new tools for the breeding of cotton with enhanced productivity, quality and biotechnological potential.

Conclusion
The results in this study indicate that a total of fifty-five genes probablely involved in the xylan biosynthesis in cotton fiber have been identified. Further complementation of Arabidopsis irx10 and fra8 mutants with corresponding cotton GTs verified that GhGT47A1, GhGT47B1 and GhGT47B2 indeed function in xylan biosynthesis. These findings provide a solid basis to uncover the biosynthesis of xylan in cotton fiber.

Supplementary Information
Additional file 1: Figure S1 Hierarchical clustering of expression profiles of 18 cotton GT43 family members in different cotton tissues.
Additional file 2: Figure S2 Hierarchical clustering of expression profiles of 60 cotton GT47 family members in different cotton tissues.
Additional file 3: Figure S3 Hierarchical clustering of expression profiles of 100 cotton TBL members in different cotton tissues.
Additional file 4: Figure S4 Hierarchical clustering of expression profiles of 9 cotton RWA genes in different cotton tissues.
Additional file 5: Figure S5 Hierarchical clustering of expression profiles of 32 cotton DUF579 family genes in different cotton tissues.
Additional file 6: Figure S6 Hierarchical clustering of expression profiles of 115 cotton GT8 family genes in different cotton tissues.
Additional file 7: Figure S7 Expression profiles of 13 putative fiber xylan synthesis-associated candidates in different cotton tissues.
Additional file 8: Figure S8 Cross sections of stems from wild type, irx10, and GhGT47A1 complemented irx10 plants.
Additional file 9: Table S1 Annotation of co-expression genes