Genome-wide identification of IDD genes
We identified total of 162 genes in 8 investigated plant species including monocots (O. sativa), dicots (A. thaliana, G. hirsutum, G. arboreum, G. raimondii, and T. cacao), ferns and moss. However, no IDD gene family member was identified in algae. Among these, 65 IDD genes were confirmed in G. hirsutum, 22 in G. arboreum, 23 in G. raimondii, 15 in T. cacao, 12 in O. sativa, 7 in moss, and 2 in fern. Higher number of IDD genes was identified in G. hirsutum than that in G. arboreum, G. raimondii, T. cacao, rice, moss, fern and Arabidopsis indicating polyploidization and duplication effect on GhIDD genes in G. hirsutum.
Phylogenetic analysis of IDD gene family
To determine the phylogenetic relationships among IDDs and explore both conserved and diversified functions of this TF family, a phylogenetic tree by ML method using MEGA 7.0 software was constructed among 162 IDD genes. To indicate the IDD genes from A. thaliana, G. arboreum, G. hirsutum, G. raimondii, O. sativa, S. moellendorffii, P. patens and T. cacao, the prefixes At, Ga, Gh, Gr, Os, Sm, Pp, and Tc were used, respectively. The phylogenetic analysis divided the 162 IDD genes into seven well distinct groups (Fig. 1). Group IDD-A contained the maximum number of IDD genes (31 genes) from all species while group IDD-B have the minimum number of IDD genes (15 genes). Groups IDD-A, IDD-B, IDD-C, IDD-D, IDD-E, and IDD-F contained IDD genes from monocot and dicot but not from moss and fern, indicating that these groups might be evolved after separation of ferns and moss from monocot and dicot plant species. Group IDD-F contained IDD genes from monocot, dicot and fern but lack moss IDD genes illustrating the divergence of these IDD genes after the division of moss from monocots, dicots and ferns. However nine IDD genes (OsIDD2, OsIDD8, OsIDD9, OsIDD11, SmIDD1, PpIDD1, PpIDD2, PpIDD3, and PpIDD4) from O. sativa, S. moellendorffii and P. patens did not fall in any group, indicating their potential special functions in the associated species evolution and development. S. moellendorffii and P. patens are resurrection plants which can tolerate extreme dehydration. O. sativa is a kind of semi-aquatic crop. All these indicated that the nine ungrouped genes may play some especial roles in the evolution from aquatic to terrestrial organisms.
Moreover, the phylogenetic tree results depicted the close relationship among cotton and cacao IDD genes, as the genes from these two species were found to be closely clustered to each other in different groups and subgroups of phylogenetic tree (Fig. 1). However, the number and distribution of IDD genes in cacao and cotton were different in all groups. For instance, in group IDD-G, 14 GhIDD genes showed a close relationship with two cacao IDD genes (TcIDD8 and TcIDD14), also supporting the hypothesis that cacao and cotton were closely related and probably derived from the same ancestors (Li et al. 2014).
To further investigate the evolutionary relationship of cotton IDD genes from G. hirsutum, G. arboreum, and G. raimondii, a phylogenetic tree within three cotton species using NJ method was generated (Fig. 2). The phylogenetic tree divided all IDD genes of three cotton species into four groups. Group IDD-b contained more IDD genes (38) while group IDD-d depicted less IDD gene family members (14). In group IDD-a, IDD-b and IDD-c, all paralogous and orthologous genes from the allotetraploid and corresponding diploid cotton clustered together. Group IDD-d exhibited 14 IDD genes only from G. hirsutum, which showed that it is far away from its two ancestor species (G. arboreum and G. raimondii) and may come from the new gene duplication and genome polyploidization, reconfirming the results that these GhIDD genes might be evolved after divergence from the common ancestors of cotton and cacao (Fig. 2).
Furthermore, to explore the evolutionary relationship and potential function catalogue among G. hirsutum IDD genes, another phylogenetic tree was constructed by NJ method. Total of 65 GhIDD genes were divided into five (IDD-a, IDD-b, IDD-c, IDD-d, and IDD-e) groups (Additional file 2: Figure S1). Group IDD-a was the biggest group with 21 GhIDD genes, however group IDD-b was the smallest with 6 GhIDD genes in it. Group IDD-c and IDD-d contained 16 and 8 genes, respectively. In group IDD-e, all (14) GhIDD genes are same with that in IDD-d of Fig. 2, which showed consistency in our analysis and strengthened the hypothesis that these IDD genes might originate from common ancestors of cotton and cacao.
Biophysical characteristics of GhIDD genes
We predicted the biophysical characteristics of all the members of GhIDD gene family in G. hirsutum. The details of biophysical properties including chromosomal position (start and end points), coding sequence (CDS), number of amino acids (protein length), molecular weight (MW), isoelectric point (pI), and grand average of hydropathicity (GRAVY) of GhIDD genes are provided in Additional file 1: Table S3.
The results indicated that GhIDD coding sequence ranged from 1 140 bp to 2 418 bp for GhIDD37 and GhIDD42, respectively. Similarly, the numbers of amino acids in the predicted protein sequences of GhIDD genes ranged from 379 to 805 for same genes. Molecular weights ranged from 41 310.77 to 89 465.69 kDa for GhIDD42 and GhIDD13, respectively. Isoelectric point of GhIDD41 was the highest (9.68) and that of GhIDD60 was the lowest of 8.37. The grand averages of hydropathicity values of all GhIDD genes were less than 0, ranging from − 0.843 to − 0.62 for GhIDD64 and GhIDD18, respectively. In addition, the predicted subcellular localization of the G. hirsutum IDD proteins were all in nuclear (Additional file 1: Table S3).
Gene structure and conserved motif analysis
To deeply understand the phylogenetic relationships, gene variation and potential protein function among G. hirsutum IDD genes, the intron–exon structure and conserved motifs analysis were performed (Fig. 3). It was observed that GhIDD genes showing similar intron–exon numbers and distribution patterns were clustered into the same group. The numbers of introns in GhIDD genes ranged from one to eight. Here, the genes with one intron accounted for 12% of the total GhIDD genes whereas only one gene (GhIDD42) had eight introns (Fig. 3a). To investigate the conserved motif distribution pattern of G. hirsutum IDD genes, another unrooted tree was constructed coupled with MEME program. The results illustrated that most of the GhIDD proteins displayed similar motifs distribution pattern, as motif 1, 2, and 3 were present in almost all proteins (Fig. 3b). Motif 6 and 10 were only present in the 14 proteins of group GhIDD-e of Additional file 2: Figure S1, however, in which motif 5, 7, 8 were absent. Moreover, motif 4 was not identified in GhIDD7, GhIDD30, GhIDD36, GhIDD38, GhIDD44, GhIDD50, and GhIDD62. Motif 9 was present in all GhIDD genes except in seven (GhIDD1, GhIDD8, GhIDD33, GhIDD50, GhIDD55, GhIDD61, and GhIDD62) proteins. In general, GhIDD genes with similar motif distribution pattern occupied the position in same group or subgroup of phylogenetic tree.
Chromosomal distribution, gene duplication and synteny analysis
The chromosomal distribution of GhIDD genes on their corresponding chromosomes (At and Dt sub-genome chromosomes of G. hirsutum) were employed. The 65 GhIDD genes were unevenly distributed on 21 chromosomes, including 30 genes on At sub-genome chromosomes, 33 genes on Dt sub-genome chromosomes while 2 genes were allotted on scaffolds (Fig. 4). The maximum numbers of genes (six genes on each) were found to be located on A12 and its orthologous chromosome D12 of At and Dt sub-genomes, respectively. We found that the distribution of genes was uneven within each chromosome, and most of the orthologues from the At and Dt sub-genomes were located on homologous chromosomes, however two orthologous genes were found on heterozygous chromosomes from the At and Dt sub-genomes. Four chromosomes out of 21 contained one GhIDD gene; seven chromosomes contained two genes; and two chromosomes contained three and six genes and three chromosomes contained four and five genes (Fig. 4). We did not found any gene on chromosome one and seven of At sub-genome as well as chromosome seven of Dt sub-genome, which showed that the gene duplications diversified from the diploid cotton species to the allotetroploid species, and these variety also result in the favorable economic characters in G. hisutum.
To study the locus relationship of orthologs/paralogous gene pairs between the At and Dt sub-genomes, we investigated the gene locus on chromosome and performed synteny analysis. The synteny analysis revealed that most of the IDD loci were highly conserved between the At and Dt sub-genomes (Fig. 5). Tandem duplication, segmental duplication, and whole-genome duplication played an important role for gene family expansion (Xu et al. 2012). To understand the expansion of GhIDD gene family in cotton genome, we performed the gene duplication analysis within and between At and Dt sub-genomes of G. hirsutum (Additional file 1: Table S4). A total of 142 duplicated gene pairs were investigated, and among them 84 orthologous gene pairs were observed as a result of whole genome duplication, whereas 54 paralogous gene pairs contributed by segmental duplication and four duplicated gene pairs depicting tandem duplication event (two each sub-genome) were observed.
According to the Darwinian theory of natural selection, we investigated the non-synonymous divergence levels (Ka) versus synonymous divergence levels (Ks) for 142 duplicated gene pairs. It is found that 125 duplicated gene pairs showed Ka/Ks value less than 0.5, while 15 duplicated gene pairs Ka/Ks value was between 0.5 and 1 (Additional file 1: Table S4). However, only two duplicated gene pairs (GhIDD15-GhIDD48 and GhIDD23-GhIDD56) showed Ka/Ks value greater than 1. From above, the Ka/Ks values of most of duplicated gene pairs were less than 1 indicating that the upland cotton IDD gene family underwent a strong purifying selection pressure with limited functional divergence. That might be occurred after segmental and whole genome duplication (WGD) event during polyploidization followed by hybridization in the evolutionary history.
Conserved amino acid residues
IDD gene family is characterized by the presence of three conserved zinc finger C2H2 domains in their protein sequence. Protein sequence alignment of Arabidopsis, rice, and upland cotton (G. hirsutum) was performed to generate sequence logos of the zinc finger C2H2 domains, so as to investigate the homologous domain sequences and the conservation of each residue in the zinc finger C2H2 domains (Fig. 6). Results illustrated that conserved amino acid residues such as C [3], C [6], H [19], H [23], C [39], C [44], H [46], H [47], C [74], C [77], H [90] and H [116] were sequentially distributed throughout the conserved domain. However, among three C2H2 domains in this conserved region, two C2H2 domains occupied their positions in N terminal while one C2H2 domain was present in C terminal of that in all observed plant species, which showed the enrichment of C2H2 domain in N terminal of conserved domain across monocots and dicots plant species. From the results of conserved amino acid residues analysis, we deduced that the amino acid residues distribution in the IDD domain was highly conserved among dicot and monocot plant species. The results also indicated that most of the IDD proteins may bear similar biochemical function and target similar elements in the downstream gene regulation.
Spatial and temporal expression pattern of GhIDD genes
Plant IDD gene family has an important role in plant growth and development such as root development (Ingkasuwan et al. 2012; Yoshida et al. 2014), sugar and starch metabolism during flower transition in maize, rice and Arabidopsis (Colasanti et al. 2006; Ingkasuwan et al. 2012; Matsubara et al. 2008; Park et al. 2008; Wong and Colasanti 2007; Wu et al. 2008). Spatiotemporal expression of transcript is tightly correlated with the biological function of a specific gene. To investigate the tissue specific expression patterns of different IDD genes, RNA-seq data were downloaded from NCBI to generate heat map. We noted that all the genes were clustered according to their expression patterns in the vegetative organs (root, stem, and leaf), reproductive organs (torus, petal, stamen, pistil, and calycle), ovule (− 3, − 1, 0, 1, 3, 5, 10, 20, 25 and 35 DPA) and fiber (5, 10, 20, and 25 DPA) (Additional file 3: Figure S2). Heat map displayed that most GhIDD genes showed a ubiquitous expression pattern in different observed tissues and minority showed much lower expression level. Only GhIDD7 and GhIDD38 displayed the specific expression in stamen.
Afterwards, qRT-PCR was performed by using root, stem, leaf, flower, 1, 3, 5, 7, 10, 15, and 20 DPA ovule tissues as well as 7, 10, 15, and 20 DPA fiber tissues to confirm the expression pattern obtained from the microarray data (Fig. 7). We selected the 12 segmentally duplicated GhIDD genes (selecting one gene from each pair of segmentally duplicated genes) exhibiting higher expression pattern in different tissues and proceeded for qRT-PCR analysis. qRT-PCR results indicated that eight genes (GhIDD2, GhIDD7, GhIDD9, GhIDD11, GhIDD15, GhIDD21, GhIDD39 and GhIDD42) represented clearly increased transcript levels in tissues of 7 DPA ovules indicating their potential roles in earlier development process of seed or fiber elongation. While GhIDD4 and GhIDD32 depicted peak transcript levels in stem tissues. GhIDD48 showed significantly increased transcript level in flowers whereas GhIDD33 had preferable expression in roots (Fig. 7). Moreover, four GhIDD genes (GhIDD9, GhIDD15, GhIDD21, and GhIDD32) displayed notable up-regulation only in vegetation and ovule tissues but not in fiber tissues, indicating that these GhIDD genes are important for vegetative as well as seed development. Overall, GhIDD2 was distinctly expressed at almost all vegetative and reproductive organs, ovule and fiber tissues, suggested that GhIDD2 may play multiple functions in growth and development. In a word, above results revealed that the GhIDD genes in cotton have experienced functional deviation, because the segmentally duplicated genes showed different expression patterns in different tissues.
Responses of GhIDD genes under various abiotic stresses
Plant often faces the various stresses such as heat, cold, drought, and high salinity which influence the plant growth and productivity. These stresses induce or repress the expression of various genes ‘effect on’ genes functions related to plant growth and development. To investigate the responses of GhIDD genes under different abiotic stresses, RNA-seq data were downloaded from NCBI and a heat map depicting different responses was constructed (Additional file 4: Figure S3). RNA-seq data revealed that all the genes were clustered according to their different responses under specific abiotic stresses, which indicated the positive and negative regulating roles of GhIDD genes under different abiotic stresses.
To verify the results of RNA-seq data, qRT-PCR analysis was performed by treating the plants with different abiotic stresses such as cold, heat, salt (NaCl) and drought (PEG). We found that the GhIDD15, GhIDD21, GhIDD32, GhIDD33, GhIDD42, and GhIDD48 were up-regulated in response to all stresses indicating that these genes might play an important positive role in abiotic stress response, while GhIDD2 might play negative role with down-regulated under all abiotic stress treatments (Fig. 8). Further, the expression levels of GhIDD4, GhIDD7, GhIDD11, and GhIDD21 were upregulated significantly in response to 6 h PEG treatment, however GhIDD39 was clearly upregulated after 1 h treatment of PEG.