Identification and expression analysis of Tubulin gene family in upland cotton

Cotton fibers are single-celled extensions of the seed epidermis, a model tissue for studying cytoskeleton. Tubulin genes play a critical role in synthesizing the microtubules (MT) as a core element of the cytoskeleton. However, there is a lack of studies concerning the systematic characterization of the tubulin gene family in cotton. Therefore, the identification and portrayal of G. hirsutum tubulin genes can provide key targets for molecular manipulation in cotton breeding. In this study, we investigated all tubulin genes from different plant species and identified 98 tubulin genes in G. hirsutum. Phylogenetic analysis showed that tubulin family genes were classified into three subfamilies. The protein motifs and gene structure of α-, β-tubulin genes are more conserved compared with γ-tubulin genes. Most tubulin genes are located at the proximate ends of the chromosomes. Spatiotemporal expression pattern by transcriptome and qRT-PCR analysis revealed that 12 α-tubulin and 7 β-tubulin genes are specifically expressed during different fiber development stages. However, Gh.A03G027200, Gh.D03G169300, and Gh.A11G258900 had differential expression patterns at distinct stages of fiber development in varieties J02508 and ZRI015. In this study, the evolutionary analysis showed that the tubulin genes were divided into three clades. The genetic structures and molecular functions were highly conserved in different plants. Three candidate genes, Gh.A03G027200, Gh.D03G169300, and Gh.A11G258900 may play a key role during fiber development complementing fiber length and strength.


Introduction
The production of upland cotton (G. hirsutum) accounts for a significant proportion of global cotton production. Although upland cotton is a highly yielding fiber source, it is pertinent to improve the fiber quality to meet the industrial demand (Su et al. 2018). Therefore, current cotton breeding programs are also focused on fiber quality improvement in upland cotton. Wild progenitors are sources carrying excellent genetic characteristics for developing desirable variation in plants (Nazir et al. 2020). It is pertinent to explore genetic variation among different species of Gossypium and further to utilize in breeding programs.
Fiber length and strength are essential indicators in assessing fiber quality (Gao et al. 2019). The development of cotton fiber can be delineated into four distinct overlapping stages: fiber initiation, elongation, secondary wall synthesis, and maturation. It is a highly revised, basic biological process (Gao et al. 2007;Wang et al. 2017). Microtubules (MTs) are the cytoskeleton's core element and act a pivotal function in cellular migration, mitosis, mechanical stress, cell polarity, intracellular transport, cell division, and cell morphogenesis (Nieuwenhuis and Brummelkamp 2019). Many studies have shown that the actin and tubulin genes play a critical role in the cytoskeleton synthesis and cotton fiber elongation (Pydiura et al. 2019;Li et al. 2005). Tubulin genes which are highly conserved in structure and function from alpha, beta, and gamma-tubulin families were usually used as reference genes for qRT-PCR (Findeisen et al. 2014;Zhao et al. 2014;Jayaswal et al. 2019;Ray and Johnson 2014). Most tubulin genes have different configurations merely due to few amino acid substitutions, which have two conserved domains Tubulin and Tubulin_C (Mubeen et al. 2016). Superfamilies or subfamilies of Tubulin genes were already reported in some species, such as flax (Linum usitatissimum) (Gavazzi et al. 2017;Pydiura et al. 2019), fungal (Zhao et al. 2014), Salix arbutifolia (Rao et al. 2016).
Microtubules are chiefly built-up heterodimers of αtubulin and β-tubulin, which are closely related to cellulose and microfiber deposition and also play a vital role in the secondary cell wall development in all plants (Seaguii 1992). At the same time, γ-tubulin may only mediate microtubule nucleation. The reduction of α-tubulin can disrupt the microtubule structure and further result in deviant cell expansion (Rao et al. 2016). The tubulin proteins can be easily purified from cell suspension cultures of tobacco and Arabidopsis (Hotta et al. 2016), which may intuitively study its structure and function for laying a good foundation for in vitro research. Individual tubulin family members showed organizational differences in various organisms (Rao et al. 2016). The γ-tubulin gene GCP6 is a key for spindle morphogenesis but not necessary for microtubule reorganization in Arabidopsis (Miao et al. 2019). TUA1 had specific expression in leaves in poplars but abnormally low in xylem, which indicated that high expression levels of tubulin protein were not tolerated in wood-forming tissues. As in leaves, the relative transcriptional abundance of the TUB is lower than that of the TUA in transgenic poplars (Swamy et al. 2015). Interfering with tubulin proteins affects behavioral changes in guard cells, to delay stomatal closure under stress and light-induced leaf stomatal opening (Swamy et al. 2015), which emphasized tubulin involvement in non-cellulosic polysaccharides assembly during cell wall biogenesis (Swamy et al. 2015).
Although the significant role of the tubulin gene family has been reported previously in other species, it is less understood in cotton. Our study systematically explored the tubulin gene family and further characterized phylogeny relationships, gene structures, chromosomal locations, expression patterns in cotton. The yielded information can be further utilized as a reference for fiber length and strength.

Plant materials and nucleic acid extraction
Two upland cotton varieties (J02508 and ZRI015), with significant differences in cotton fiber length and strength, were grown in the field station of Institute of Cotton Research (CAAS), Anyang, China. Cotton fibers at 0, 1, 3, 5, 10, 15, 20, 25 days to post-anthesis (DPA), and root, stem, and leaves were collected to extract RNA with three biological replicates. All samples were kept in liquid nitrogen for transportation and stored at − 80°C. Total RNA was extracted using the RNAprep Pure Plant Plus Kit (TIAN GEN). Electrophoresis and spectrophotometric detection were adopted to detect the quality and quantity of the nucleic acids. Then, using the RNA as the template, every sample contained 1 μg RNA. The cDNA was then reversetranscribed using a First Strand cDNA Synthesis Kit (TAKARA).
Sequence retrieval and phylogenetic analysis G. hirsutum genome sequence was obtained from the CottonGen (https://www.cottongen.org). To identify tubulin genes in the genome of cotton, Arabidopsis, flax, and rice tubulin gene families were identified through a genomewide sequence and downloaded from Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html) (Swarbreck et al. 2008;Pydiura et al. 2019). BLASTP with default parameters was used to identify the tubulin proteins in cotton with Arabidopsis and flax tubulin sequences as the queries based on homology search. The selected cotton tubulin proteins were used for further identification by searching the cotton database again. These identified sequence domains were performed to make clear whether all protein sequences contain tubulin-conserved domains using InterProScan (http://www.ebi.ac.uk/Tools/pfa/ iprscan). Subsequently, HMMER software with default parameters and conserved tubulin domains was used to search for the above sequences. Using the same method, we obtained other three predicted cotton tubulin genes from G. barbadense, G. arboreum, and G. raimondii. Multiple sequence alignment of all identified tubulin proteins was performed using ClustalX with default parameters (Thompson et al. 1997). Phylogenetic trees were constructed using bootstrap Neighbor-Joining algorithm with bootstrap analysis of 1 000 replicates in MEGA7 v7.0.26 (Kumar et al. 2016).

Gene structure, chromosomal mapping and collinearity analysis
The Gene Structure Display Server Program (http://gsds. cbi.pku.edu.cn/) was employed to derive exon-intron structure representations based on basic coding information as described above (Ali et al. 2019). All motifs were identified by MEME software (http://meme-suite.org/). Relational gene chromosomal position information for all G. hirsutum tubulin sequences was obtained from annotation files downloaded from the CottonGen website and visualized by tool CIRCOS to illustrate the chromosomal distribution. CIRCOS was used to draw the distribution (Zou et al. 2013). The collinearity pairs of the tubulin family were mapped adopting CIRCOS software (Krzywinski et al. 2009).

Gene expression analysis
The RNA-seq data of five distinct tissues from two upland cotton varieties J02508 and ZRI015 were already uploaded to the NCBI Gene Expression repository (PRJNA634606). The reference genome sequences for G. hirsutum were downloaded (Yang et al. 2019). The FPKM values of tubulin genes were calculated using the Cufflinks program. The gene expression data (FPKM) were divided by the mean of all values, then were normalized with the log 2 (FPKM+ 1) method to calculate the expression levels. Gene expression patterns between different fiber developments were visualized with heat maps using the TBtools software package (Chen et al. 2020).
The gene-specific primers used for qRT-PCR were designed with a primer database (http://biodb.swu.edu.cn/ qprimerdb). All primers are listed in Table S1. We used the GhUBQ (GhA10G005800) gene as an internal control, and a total volume of 20 μL that contained 0.4 μL of each primer (10 μmol·L − 1 ), 2 μL cDNA, 10 μL SYBR Premix Ex Taq (2×), and ddH 2 O to make up the volume used to perform qRT-PCR with three biological and technological replicates on ABI 7500 fast real-time PCR system using SYBR Premix Ex Taq (TAKARA). The gene relative expression levels were calculated using the 2 −△△C T method.

Result
Phylogenetic analysis of the tubulin protein family To investigate the evolutionary relationships of tubulin genes, the tubulin protein sequences from O. sativa, Arabidopsis, L. usitatissimum, G. arboreum, G. barbadense, G. raimondii, and G. hirsutum were used to generate an unrooted phylogenetic tree using various sequence alignments (Fig. 1). The tubulin proteins were mainly classified into three groups, viz., α-, β-, and γtubulins, which revealed similar results to previous studies concerning different plants (Jayaswal et al. 2019).
Most of the orthologous genes between the allotetraploid and the corresponding diploid cotton were clustered into the same clade based on phylogenetic analysis with tubulin protein sequences, which indicated that the tubulin genes had been differentiated into three different subfamilies. In each group, most tubulin proteins from the four cotton species were relatively farther than tubulin proteins from O. sativa, Arabidopsis, L. usitatissimum, but a few had a particular case, including some genes in O. sativa, which were closely related to cotton species. γ-tubulins were abundant in O. sativa, which may play a vital role in the development of rice. However, γ-tubulins were scarce in Arabidopsis, and only two genes, viz., AT.5G05620 and AT.3G61650 were from this group. Interestingly, the αand β-tubulins subfamilies are expended, but the γ-tubulin subfamily is relatively minor in cotton. The above results depicted that tubulin genes are either lost or expanded during evolution, according to plant growth needs.

Gene structure and conserved motifs analysis
Exon-intron distribution is generally associated with biological functions and is helpful to understand the evolutionary relationship among different gene families. Therefore, we analyzed the conserved motif and structural characterizations to investigate the phylogenetic relationships among different members of the tubulin family of G. hirsutum (Fig. 2). Members of α-, βtubulin genes are quite conserved. However, γ-tubulin genes were not retained in protein motifs and gene structure analysis. The exon/intron structural variation of the tubulin gene family was then compared, yielding a comprehensive illustration of their relative lengths. We found a variable structural pattern of exon-intron in tubulin family genes. Closely related genes (sharing the same phylogenetic branch) depicted similar structural patterns; however, a considerable distinction was identified among genes from different branches. The majority of α-tubulin subfamily genes were identified with four introns, while the β-tubulin subfamily consists of three introns. In contrast, five to eleven introns were found in the γ-tubulin subfamily. Further, we identified ten conserved motifs utilizing MEME program. There are almost all conserved sites in α-, β-tubulin subfamilies. However, the genes Gh.D05G257500, Gh.D12G088700 and Gh.D04G181300 may have lost one or two motifs 4/5/8 in α-, β-tubulin subfamily. Most γ-tubulin genes are poorly conserved compared with the members of α-, β-tubulin genes. Surprisingly, Gh.A11G102000, Gh.D11G102600, and Gh.A12G127300 contain ten conserved motifs in the γtubulin subfamily. The above results showed that most members from the same subfamily have similar motif features and exon-intron structure, supporting close evolutionary relationships. Fig. 2 Phylogenetic relationship, conserved motif, and gene structural characterizations analysis of tubulin genes in G. hirsutum. A neighborjoining phylogenetic tree was created using the MEGA7 program, conversed protein motifs, and gene structure analysis As depicted in Table S2 and Fig. S1, the results showed that the most α-, β-tubulin proteins have conserved domains with Tubulin and Tubulin_C; a theoretical isoelectric point of the most tubulin proteins was weakly acidic; however, when the proteins lose Tubulin_ C, the pIs of proteins were alkaline. Conserved site analysis of tubulin gene family suggested that some amino acids are very conserved (Fig. S2). The BaCelLo (Pierleoni et al. 2006), EpiLoc (http://epiloc.cs.queensu.ca/) and Plant-mPLoc (Chou and Shen 2010) predicted that the subcellular localization of all tubulin proteins is Cytoplasmic/Chloroplast. TMHMM2.0 (http://www.cbs. dtu.dk/services/TMHMM/) and TOPCONS (Tsirigos et al. 2015) predicted that all tubulin proteins are located outside the membrane. These results indicated that the tubulin genes might synthesize the cytoskeleton in the plant cell cytoplasm.

Chromosomal location, gene duplication, and collinearity relationships
We constructed chromosomal distribution maps of tubulin genes to understand the genomic distribution and structural variation of the tubulin gene family in Gossypium species. An uneven distribution pattern of these genes, on sub-genomic (A sub-genome and D subgenome) and chromosomal level (24 chromosomes) was observed, mainly distributed towards proximate ends of the chromosomes except for At10 and Dt10, which showed nonexistence of tubulin genes (Fig. 3). The highest number of tubulin genes (7) were present on At03 and Dt05. In contrast, At02, At04, At06, Dt04, Dt07, and Dt12 of G. hirsutum included only one or two tubulin genes. Based on the uneven distribution pattern on subgenomes, we speculated that this variation might have resulted from the evolutionary process.
We further analyzed the collinearity relationships for all tubulin genes on the chromosomes At and Dt of G. hirsutum with other Gossypium species. Among 42 tubulin genes on the chromosomes At of G. hirsutum, 33 had intergenomic homologous genes on chromosomes Dt of G. hirsutum (Fig. 4); 40 tubulin homologous genes in G. arboretum (Fig. S3); all genes on the chromosomes Dt of G. hirsutum had intergenomic homologous genes in G. raimondii (Fig. S4). Among all the 91 tubulin genes of G. hirsutum, 77 genes tubulin had intergenomic homologous genes in G. barbadense (Fig. S5), 91 tubulin homologous genes in G. darwinii and G. tomentosum (Fig. S6/7), respectively. The above results emphasized an independent evolutionary process for tubulin gene quantity in At subgroup and Dt subgroup after the formation of allotetraploid, while afterward same directional evolution for different allotetraploid cotton species.

Expression patterns of tubulin genes in different upland cotton
Although most tubulin superfamily genes have redundant functionality, they have similar functions to promote cell elongation (Segami et al. 2012;Blume et al. 2013). The expression patterns analyses were performed to understand the role of tubulin genes during fiber elongation and development. Firstly, RNA-sequencing data were used to detect the expression patterns of 98 tubulin genes in root, stem, leaf, ovule, and fiber tissues of G. hirsutum (J02-508 and ZRI-015) using a heatmap. The results showed that most α-, β-tubulin genes had higher expression levels relative to γ-tubulin genes in all tissues, and the expression levels of most α-, β-tubulin gene subfamilies were very obvious tissue specificity of fiber development stages (Fig. 5). Secondly, the expression levels of tubulin genes in the fiber development stage had higher levels than other tissues, e.g. Gh.A08G206600, Gh.D08G202200, Gh.A11G351700, Gh.D11G355300, Gh.A08G207900, Gh.A03G027200, Gh.D03G169300, Gh.A11G082000, Gh.D11G082500, Gh.A13G236900, Gh.A11G258900, Gh.A05G199500, Gh.D05G216200, Gh.D02G226100, Gh.A03G209100, Gh.A04G142400, Gh.D04G181300, Gh.A02G095500 and Gh.D02G097200, which may contribute to fiber development.
Among them, only Gh.A03G027200, Gh.D03G169300, and Gh.A11G258900 had differential expression patterns at distinct stages of fiber development in the two varieties, and they were significantly different at 15 DPA, which is a critical time for fiber length and strength in J02-508 compared with ZRI-015. The above results indicated the three genes might inhibit fiber length and strength.
The expression levels of Gh.A03G027200, Gh.D03 G169300, and Gh.A11G258900 in different fiber development stages for J02508 and ZRI015 were detected using qRT-PCR. As shown in Fig. 6, the relative expression data revealed that the expression levels of Gh.A03G027200 and Gh.D03G169300 were higher from 10 to 25 DPA in J02508 and ZRI015, and the expression levels of Gh.A11G258900 from 3 to 15 DPA in ZRI015 was twice as much as that of J02508.

Discussion
Cotton fibers are single-celled trichomes, which are the essential textile raw material. The tubulin proteins belong to microtubule proteins, which play an important role in synthesizing the cytoskeleton (He et al. 2008). Microtubules of all eukaryotic cells are formed by αand β-tubulin heterodimers (Schwarzerová et al. 2019). Microtubules are the core element of the cytoskeleton and act a pivotal function in cellular migration, mitosis, mechanical stress, cell polarity, intracellular transport, cell division, and cell morphogenesis (He et al. 2008;Nieuwenhuis and Brummelkamp 2019;Meiring et al. 2020). Based on the genome-wide characterization of the tubulin gene family, this study determined 40 tubulin genes in A genome, 43 tubulin genes in D genome, 84 tubulin genes in AD 2 , and 98 tubulin genes in AD 1 genome. The results showed that the loss of tubulin genes did not happen in the allotetraploid G. hirsutum and G. barbadense, and some tubulin genes had been extended in G. hirsutum. The number of genes in the allotetraploid cotton is twice as high as that of diploid cotton, contrasting with high rates of gene loss in Fig. 4 The chromosome distribution and the synteny of tubulin genes in G. hirsutum the allotetraploid plants (Zhang et al. 2015). Based on the previous report about tubulin families in different plants (Pydiura et al. 2019), tubulin genes had been classified into 3 subfamilies, viz., α-, βand γ-tubulins, our results were consistent with this report (Pydiura et al. 2019). The previous reports suggested that the structure and function of α-, β-tubulins were quite similar but the γ-tubulins had significant differences (Pydiura et al. 2019;Rao et al. 2016). In flax, expression analysis of tubulin genes showed that specific β-tubulin members could uphold various microtubule functions such as cell elongation, pollen tube development, and cell wall thickening (Gavazzi et al. 2017;Rao et al. 2016). The amino acid sequences of eukaryotic tubulins are well conserved but their function still display considerable differences (Hotta et al. 2016). The difference in the number of conserved motifs and exons of genes showed that the functional diversity of tubulin might closely related to evolution. Furthermore, we identified tubulin gene family members that play critical roles in fiber development.
To date, few tubulin genes have been functionally characterized in plants, especially in fiber-rich cotton. Combining the transcriptome expression of tubulin genes in J02508 and ZRI015, we find that some α-, β-tubulins were explicitly expressed in fiber development stages, while the γtubulin genes showed low expression levels in all tissues. Previous researches showed that some TUB genes were highly expressed at 10-DPA in cotton (Li et al. 2002;He Fig. 5 Expression analysis of tubulin family genes in different tissues. A neighbor-joining phylogenetic tree was created using the MEGA7 program and expression level in different tissues of J02-505 and ZRI-015 et al. 2008). Some α-tubulins had a specific expression model for developing cotton fibers (Whittaker and Triplett 1999). Interestingly, Gh.A03G027200 and Gh.D03G169300 (both belong to β-tubulins), Gh.A11G258900 (belongs to αtubulins) had significant variable expression profiles during fiber development stages in J02508 and ZRI015. Significant differential expressions were observed at 15-DPA, a critical stage for fiber development (Liu et al. 2006). We further speculated that the expression of these genes might inhibit fiber length and strength during fiber development. Both Gh.A03G027200 and Gh.D03G169300 are homologous genes located in At and Dt of G. hirsutum, respectively, which showed that they perform the same function together. However, the expression level of the homologous gene of Gh.A11G258900 was very low in all tissues of J02508 and ZRI015.

Conclusion
Tubulin genes family plays a key role in cotton fiber development. Therefore, following a comprehensive approach including the phylogenetic tree, chromosomal location, collinearity, gene structure, and expression profile, genome-wide characterization of the tubulin gene family were performed. The tubulin family genes were classified into three clades in the phylogenetic tree. The α-, β-tubulin genes are highly conserved; however, γtubulin genes are less conservative among cotton and other plant species. The high expression levels of some αand β-tubulin genes emphasized that the tubulin family plays a significant role in the cotton fiber development stage. The Gh.A03G027200, Gh.D03G169300, and Gh.A11G258900 had significant variation at a critical time for fiber length and strength in J02-508 compared with ZRI-015, indicating that three genes may inhibit fiber length and strength during fiber development. Our study's results built the foundation for excavating important functional genes and further studying the fiber length and strength mechanism of cotton.