Identification and physiochemical traits of MAPK gene family
A total of 74 MAPK genes were detected in cotton; out of them, 41 were in G. hirsutum, 19 in G. raimondii, whereas 14 were in G. arboreum. These genes' proteins were studied further to evaluate their physicochemical characteristics as well as other features. These genes were reported to express proteins ranging in length between 87 and 628 amino acids, with molecular mass between 9.988 and 71.503 KDa and isoelectric points (pI) between 5.672 and 10.23. All identified MAPK genes have grand average hydropathy > 0, implying that all MAPK genes in cotton were hydrophilic. Based on the WOLF PSORT (https://wolfpsort.hgc.jp) analysis result, GHMAPK proteins are localized across different parts of the cell including the nucleus, cytoplasm, mitochondria, and chloroplast. The proteins were predominantly located on the cytoplasm (Additional file 1: Table S1).
Phylogenetic analysis of MAPK proteins
A phylogenetic tree was constructed based on multiple sequence alignment of 41 GHMAPK, 14 GaMAPK protein sequences, 19 GrMAPK protein sequences, and 18 Arabidopsis MAPK protein sequences revealing the evolutionary relationship of cotton MAPKs. The cotton MAPK family is categorized into A, B, C, D, and E clades based on results from the phylogenetic tree (Fig. 1). Clade A comprises of all the genes with TDY phosphorylation site, clade B members contain TEY motif, Clade C members contain TDY motif, clade D members contain more TDY motif and few TEY motifs, and Clade E members contain TEY motif. There are 26 GHMAPKs, 9 GrMAPKS, 9 GaMAPKs, and 3 AtMAPK that contain the TDY motif, while 15 GhMAPK, 8 GrMAPK, 6 GaMAPK, and 5 AtMAPK contain the TEY motifs. These show that the cotton MAPK genes have more TDY motifs and minor TEY motifs while the AtMPK genes have more TEY motifs and minor TDY motifs. The cotton MAKS classification was consistent with previous findings (Zhang et al. 2014b). This indicates that TEY MAPKs motif may significantly function in dicot plants than TDY MAPKs motif (Fig. 1).
Collinearity analysis was performed between the physical maps of the subgenomes about G. hirsutum, G. raimondii, and G. arboreum for associations among A vs. D, A vs. At, and D Vs. Dt subgenome (Fig. 2). Generally, most of the MAPK genes from tetraploid cotton represented high similarities to the D genome (G. raimondii) and the A genome (G. arboreum).
Motif identification and gene structure analysis of MAPK
Investigations for conserved motif about MAPK proteins were carried out through MEME 41 GHMAPKS, GaMAPKS, and 19 GrMAPKs putative protein sequence was submitted to search for conserved motifs (Fig. 3). As shown in the figure, GHMAPKs, GrMAPKS, and GaMAPKS both possess 20 motifs. In GHMAPKS, the majority of the identified genes contains 10 similar motif composition (1, 2, 3, 4, 5, 6, 10, 13, 17, and 18), few among them contains 9 motifs (7, 8, 9, 11, 12, 14, 15, 16, and 19) and only 3 genes; GH_A11G0035, GH_A12G2888, GH_D11G0040, and GH_D12G2911 contains motif 20. While in GaMAPK, the majority of identified gene contains 9 similar motif composition (1, 2, 3, 4, 6, 8, 9, 10, and 15), few among them contains 5 motifs (5, 7, 16, 17, and 18) only 3 among the genes; Ga01G0448, Ga05G3624, Ga11G0522 contain motif 19, and three among the genes; Ga01G2598, Ga12G0481, and Ga03G1217 contains motif 20. Lastly, GrMPK genes contains 9 similar motif composition (1, 2, 3, 4, 5, 6, 8, 9, 15), few among them contain 10 motifs (7, 10, 11, 12, 13, 16, 17, 18, 19, and 20) and only few among the genes; Gorai.008G065400, Gorai.011G100600 and Gorai.003G012800 contain motif 14.
Gene structure was examined by using an online tool (http://gsds.gao-lab.org/) for cotton MAPKKKs. In G. hirsutum, out of 41 genes, 38 were found to possess an intron, and 3 are intronless. The most extended intron length was observed in GH_A07G1527 (Fig. 4). In G. arboreum, all the 19 genes possess an intron, and the highest intron length was observed in Ga02G0944 (Fig. 4B). In G. raimondii, both genes possess an intron, and the highest intron length is found in Gorai.006G007700 (Fig. 4C).
Chromosomal mapping of MAPK genes in Gossypium species
The chromosome distribution was investigated in the 3 Gossypium species and found that among the 41 G. hirsutum MAPK genes, 20 were located in the At-sub genome, and 21 were located in the Dt subgenome. Chromosome A12 and its homologous D12 had the highest gene locus. Three genes were found in A01, A03, A11, D03, and D05, D07, D01 and D02 have two genes each, the remaining A02, A05, A07, A08, A09, A10, D04, D08, D09, and D10 have one gene, respectively (Fig. 5A). In G. arboreum, the GrMPK genes were mapped on chromosome A01, A02, A03,A05, A11, and A12. The highest gene locus was observed in chromosomes A03, A11, and A12, each with three genes (Fig. 5B). While chromosomes A01, A02, and A05 possess two genes each. The genes in G. raimondii are distributed among D01, D02, D03, D05, D06, D07, D08, D09, and D11 chromosomes. Chromosome D08 has the most gene loci, followed by D03, three genes (Fig. 5C). While Chromosomes D01, D02,D05, D09 have two chromosomes each, one chromosome was observed in D06 and D11 (Fig. 5C).
Determination of cis-regulatory elements
Cis-regulatory elements are assumed to perform various functions based on their arrangement and location across promoters (Biłas 2016). We have analyzed cotton MAPKs promoter regions for the determination of their cis-elements. The 2 kb sequences from the start of transcription from each GHMKPS on the upstream side were considered and used. The cis-elements identified in the GHMAPK promoter region were categorized into five functions: hormone responsiveness, stress responsiveness, light responsiveness, cellular responsiveness, and binding site (Additional file 2: Table S2). The majority of GHMAPK genes represented ABRE for the responsiveness of hormone, i.e. cis-element, are involved for elements related to the ABA responsiveness, TGA element (auxin-responsive element), ethylene responsive element (ERE), CA element (salicylic acid-responsive gene element), and GATA-Motif (cis-acting regulatory element involved in the MeJA-responsiveness). Few among the GHMAPK promoters have the GARE motif and P-BOX (gibberellin-responsive). Elements associated with stress include TC-rich repeats (defense and stress-responsive element), LTR (low temperature-responsive element), WUN-motif (wound responsive element), and TC-rich repeats (defense and stress responsiveness element). For light-responsiveness, elements such as ATCT-motif, Box 4, GT1-motif, LAMP-element, TCT-motif, TCCC-motif, CHS-CMA1a, TCT-motif, and TGACG-motif are found. For cellular development and binding site, few cis-elements are involved. Several cis-regulatory elements related to enhancing tolerance to abiotic and biotic stresses in plants have been found in the promoter sequences in the coding sequence related to the GHMAPK gene, suggesting that this gene could be used as an abiotic tolerant gene in cotton (Fig. 6).
RNA sequence analysis and RT-qPCR confirmation of GHMAPK under drought and salt treatments
The RNA profile data of TM-1 was used, and the raw data and their transformed log 10 values of the genes were analyzed, and a heat map was constructed. Seven GHMAPK genes were found to have different expression patterns across different tissues, including young leaf, true leaf, the cotyledon, stem, fiber, and roots. Gene-specific RT-qPCR primers were designed (Additional file 3: Table S3). Based on RNA-sequence data as well as RT-qPCR results, we determined that GH_A07G1527 and GH_D02G1138 are upregulated in all tissues. GH_D03G0121 shows upregulation in cotyledon, stem, and root, GH_D03G1517 and GH_D05G1003 are upregulated in all the tissues. GH_D11G0040 gene is upregulated in stem and root, while GH_D12G2528 shows a relatively low expression (Fig. 7A).
To determine the roles of GHMAPK genes under salt and drought stresses, the seven genes have been analyzed for both the RNA-sequence data and RT-qPCR results to detect their expression pattern after treatments. For drought treatment, GH_A07G1527 was upregulated at 6 h and 12 h, GH_D02G1138 was upregulated at 2 h, 6 h, and 12 h, and GH_D03G0121 was upregulated at 12 h. At the same time, GH_ D03G1517 and GH_D05G1003 were upregulated at 2 h, 6 h, and 12 h. While GH_D11G0040 showed no expression. Lastly, GH_D12G2528 was upregulated at 12 h (Fig. 7B). For salt treatment, GH_A07G1527 and GH_ D02G1138 were upregulated at 2 h, 6 h, and 12 h, GH_D03G0121 show no expression in RNA seq data and upregulated expression at 12 h post-treatment. GH_ D03G1517 was upregulated at 2 h, 6 h, and 12 h of post-treatment, respectively. GH_D11G0040 showed low expression in RNA seq data and upregulated at 6 h and 12 h of RT-qPCR. Lastly, GH_D12G2528 was upregulated at 2 h, 6 h, and 12 h, respectively (Fig. 7C). The results demonstrated that RNA sequence analysis and RT-qPCR expression results represented a strong correlation, with R2 = 0.91 in drought, R2 = 0.75 in salt, and R2 = 0.66 in tissues.