
Genome-wide identification and expression profiling of the photosystem II (PsbX) gene family in upland cotton (Gossypium hirsutum L.)



Photosystem II (PSII) is an intricate pigment-protein assembly comprising extrinsic and intrinsic polypeptides within the photosynthetic membrane. PsbX, a low-molecular-weight transmembrane protein of PSII, is associated with the oxygen-evolving complex. Expression of the PsbX gene is regulated by light. PsbX regulates PSII by facilitating the binding of quinone molecules to the Qb (PsbA) site, and it also plays a crucial role in optimizing photosynthetic efficiency. Despite these insights, a comprehensive understanding of PsbX gene function has remained elusive.


In this study, we identified ten PsbX genes in Gossypium hirsutum L. Phylogenetic analysis showed that 40 genes from nine species were classified into one clade. The sequence logos exhibited substantial conservation at multiple sites across the N and C termini among all Gossypium species. Furthermore, the Ka/Ks ratios of orthologous/paralogous gene pairs indicated that cotton PsbX genes were subjected to both positive and purifying selection, which might have limited their divergence following whole-genome and segmental duplication. The expression of GhPsbX genes varied across tissues. Moreover, the expression of GhPsbX genes could potentially be regulated in response to salt, intense light, and drought stresses. Therefore, GhPsbX genes may play a significant role in the modulation of photosynthesis under adverse abiotic conditions.


We examined the structure and function of the PsbX gene family in cotton for the first time, using comparative genomics and systems biology approaches. The PsbX gene family appears to play a vital role in the growth and development of cotton under stress conditions. Collectively, the results of this study provide basic information for unveiling the molecular and physiological functions of PsbX genes in cotton plants.


Cotton serves as a vital cash crop, primarily cultivated for its fibers, while also offering the potential for extracting edible oil from its seeds (Raza et al. 2021; Zafar et al. 2002a; Chen et al. 2007). Upland cotton (Gossypium hirsutum L.) dominates global cotton production, constituting approximately 95% of the total output owing to its superior fiber quality, high yield, and wide adaptability (Zhang et al. 2008; Song et al. 2018). The prime objective of cotton breeding programs is to enhance fiber quality and yield simultaneously (Zahid et al. 2016). Cotton, a multi-purpose crop, is highly vulnerable to biotic and abiotic stresses (Zafar et al. 2002b). A significant reduction in cotton production has been witnessed due to abiotic stresses such as salinity and drought (Zafar et al. 2002bc). In the twenty-first century, the demand for food is on the rise, driven by a rapid increase in the global population (Beddington et al. 2012). However, the challenges posed by climate change, notably elevated temperatures (Zafar et al. 2022a), coupled with the severity of abiotic stressors, have disrupted the growth and development of crops and led to a significant decline in cotton yields (Nouri et al. 2015; Sasi et al. 2018).

Furthermore, plants, being sessile in nature, are more exposed to the adverse effects of abiotic stresses on both growth and development as opposed to other organisms (He et al. 2018; Magwanga et al. 2018; Xu et al. 2019). Major abiotic stresses such as heat, salinity, and drought result in the over-reduction of the electron transport chain (ETC), leading to photo-oxidation (Nishiyama and Murata 2014). Globally, agricultural productivity has faced a decline due to the combined impact of drought and salinity (Zhu 2001; Dong 2012).

Moreover, salinity, intense sunlight, drought, and heat stress have the combined effect of diminishing the CO2 assimilation rate, causing an increase in reactive oxygen species (ROS) (Pintó-Marijuan and Munné-Bosch 2014). Previous studies have shown that abiotic stresses account for a 50% loss in yield (Nath et al. 2013). Similarly, reduced photosynthetic activity is a major contributor to significant yield losses in crops (Nouri et al. 2015). Drought and salinity disrupt plant growth by impacting various biochemical and physiological attributes, such as chlorophyll production, photosynthetic rate, nutrient metabolism, ion uptake and translocation, carbohydrate metabolism, and respiration (Hussain et al. 2018). Plant genes affecting the photosynthesis process, including ribulose bisphosphate carboxylase large chain (RBCL) (Berry et al. 2013), light-harvesting chlorophyll a/b-binding protein (LHC) (Zhao et al. 2020), and cytochrome P450, have been identified (Magwanga et al. 2019). Furthermore, genes encoding structural proteins involved in photosystem II (PsbA, PsbE, PsbF, PsbH, PsaN, and PsbX) have been identified (Zhang et al. 2020). Plastoquinone reductase, an enzyme integral to photosystem II (PSII), oxidizes water in the presence of light. A small hydrophobic protein named 'PSII-X (PsbX)', with a molecular weight of 4.1 kDa, is generally found in the core complex of PSII in both plants and cyanobacteria (Shi et al. 1999; Katoh and Ikeuchi 2001). The PsbX protein serves as a key regulator of PSII and aids in binding quinone molecules to the Qb (PsbA) site. Moreover, PsbX also regulates the efficiency of photosynthesis as well as biomass accumulation. Several aspects of PSII structure remain unknown, and the function of the PsbX gene family under abiotic stresses in cotton has yet to be elucidated. However, the whole-genome sequences of G. raimondii (Wang et al. 2012), G. arboreum (Huang et al. 2020), and G. hirsutum (Hu et al. 2019) provide a framework for the functional analysis of PsbX-related proteins in these three cotton species.

In the present study, we performed an in silico characterization of PsbX gene family members in G. arboreum, G. hirsutum, G. barbadense, and G. raimondii to unveil the functional significance of the PsbX gene family. We conducted a comprehensive array of bioinformatics analyses, encompassing multiple sequence alignment, gene loci examination, analysis of gene structures, scrutiny of promoter cis-elements, examination of conserved protein motifs, assessment of phylogenetic relationships, and exploration of gene expression profiles. The outcomes of our study may offer valuable insights for a more in-depth characterization of the PsbX gene in cotton. Additionally, our investigation provides an avenue for understanding the molecular foundation of the regulatory effects of PsbX across diverse developmental stages and sheds light on the response mechanisms of cotton plants under stress conditions.

Materials and methods

Identification of the PsbX gene family in Gossypium spp.

The genomes of G. hirsutum v1.1 and G. barbadense were obtained from the Group of Cotton Genetic Improvement (GCGI), Huazhong Agricultural University (HZAU). Sequencing data for G. raimondii and G. arboreum were also downloaded, the latter from CottonFGD. Genomes of Arabidopsis thaliana (Araport 11), Populus trichocarpa, O. sativa, and Vitis vinifera were obtained from Phytozome v13. The genome sequence data of Theobroma cacao v2 were downloaded from Ensembl Plants for computational analysis. The Hidden Markov Model profile for PsbX (PF06596) was downloaded from Pfam (Finn et al. 2016). HMMER 3.2.1 was used to search for PsbX protein sequences among all the protein sequences of the above genomes. Protein sequences with an E-value less than 1e−5 were selected for further analysis. In addition, to verify the candidate PsbX protein sequences obtained from cotton and the other genomes, they were confirmed using CD-search.
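The E-value cutoff described above can be illustrated with a minimal sketch. It assumes the search was run with hmmsearch and its tabular `--tblout` output, where the full-sequence E-value is the fifth whitespace-separated column; the paper does not state which output format was used, and the gene names and scores below are hypothetical.

```python
def filter_hits(tblout_text, max_evalue=1e-5):
    """Return (target, E-value) pairs with a full-sequence E-value below the cutoff.

    Assumes hmmsearch --tblout layout: target name, accession, query name,
    accession, full-sequence E-value, score, bias, ...
    """
    hits = []
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue < max_evalue:  # strict "less than", as in the text
            hits.append((target, evalue))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical records for illustration only (not data from this study).
EXAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias
geneA  -  PsbX  PF06596.14  3.2e-20  70.1  0.3
geneB  -  PsbX  PF06596.14  0.002    12.0  0.0
geneC  -  PsbX  PF06596.14  1e-5     20.0  0.1
"""

if __name__ == "__main__":
    for name, evalue in filter_hits(EXAMPLE):
        print(name, evalue)
```

In this sketch a hit with an E-value exactly equal to the cutoff is excluded, matching the "less than 1e−5" wording.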

Sequence alignment and phylogenetic tree construction

All PsbX protein sequences confirmed by SMART (Schultz et al. 2000) and the Conserved Domain Database (CDD) were aligned with MUSCLE using the Neighbor-joining method. The alignment was imported into MEGA X and used in the phylogenetic and evolutionary analyses. The phylogenetic tree was built using the Neighbor-joining (NJ) method with 1 000 bootstrap replicates and pairwise deletion, and was visualized with FigTree.
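To make the "pairwise deletion" option concrete, the sketch below computes a simple p-distance between aligned sequences while skipping, for each pair, only the columns in which either sequence has a gap or missing character. This is an illustration of the idea, not MEGA X's code; the substitution model used by the authors is not stated, so the p-distance is an assumption, and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, missing="-?"):
    """Proportion of differing sites, ignoring columns gapped in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in missing or y in missing:
            continue  # pairwise deletion: drop this column for this pair only
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def distance_matrix(aligned):
    """Build a symmetric distance matrix from a {name: aligned_sequence} mapping."""
    names = list(aligned)
    dist = {name: {name: 0.0} for name in names}
    for first, second in combinations(names, 2):
        d = p_distance(aligned[first], aligned[second])
        dist[first][second] = dist[second][first] = d
    return dist


if __name__ == "__main__":
    toy = {"s1": "MK-LV", "s2": "MKALI", "s3": "MKALV"}  # hypothetical alignment
    print(distance_matrix(toy))
```

A matrix of this kind is the input that a neighbor-joining algorithm would then cluster; bootstrap support is obtained by resampling alignment columns and repeating the procedure.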

Mapping of PsbX genes on chromosomes, gene structure, and motif analysis

All PsbX members were mapped onto G. hirsutum chromosomes based on their physical positions. The gene map and gene structures of G. hirsutum were constructed with TBtools using a GTF file. Conserved protein motifs were discovered by submitting the PsbX protein sequences of G. raimondii, G. arboreum, G. hirsutum, and G. barbadense to the MEME Suite program (Bailey et al. 2009). The following parameters were used: motif width was set to 6–100 and motif site distribution was set to zero or one site per sequence.

Promoter region cis-acting element analysis

The promoter sequences (2 kb upstream of the translation start site) of all PsbX genes were analyzed with the online tool PlantCARE (Lescot 2002) to predict and locate their cis-elements.
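Extracting an upstream promoter window is strand-dependent, which the following minimal sketch illustrates. It assumes 1-based inclusive coordinates for the coding region (as in GTF/GFF files) and a chromosome held as a plain string; the function names and the toy sequence are hypothetical and do not reflect the tool the authors used.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, start, end, strand, length=2000):
    """Return up to `length` bases upstream of a coding region.

    `start` and `end` are 1-based inclusive coordinates on the forward strand.
    For "-" strand genes the upstream region lies after `end` and is returned
    reverse-complemented so that it reads 5' to 3' toward the start codon.
    """
    if strand == "+":
        lo = max(0, start - 1 - length)  # clip at the chromosome start
        return chrom_seq[lo:start - 1]
    if strand == "-":
        hi = min(len(chrom_seq), end + length)  # clip at the chromosome end
        return reverse_complement(chrom_seq[end:hi])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy_chrom = "AAACCCGATTTT"  # hypothetical 12-bp chromosome
    print(upstream_region(toy_chrom, 7, 9, "+", length=3))  # forward-strand gene
    print(upstream_region(toy_chrom, 4, 6, "-", length=3))  # reverse-strand gene
```

The window is silently shortened when a gene lies closer to a chromosome end than the requested length.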

Physicochemical features and subcellular location prediction

The physicochemical properties of the PsbX proteins, including the number of amino acids (protein length), isoelectric point (pI), and molecular weight (MW), were calculated using the ExPASy ProtParam tool. Furthermore, the online tool WoLF PSORT was used to predict the subcellular localization of PsbX proteins in cotton.
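As a rough illustration of how a protein's molecular weight follows from its sequence, the sketch below sums average residue masses and adds one water molecule for the free termini. It is a simplified stand-in and not ProtParam's implementation; the residue masses are standard average values rounded to four decimals, and pI estimation is not attempted here.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the terminal H and OH


def molecular_weight(sequence):
    """Approximate average molecular weight (Da) of an unmodified protein."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def da_to_kda(mass_da):
    """Convert daltons to kilodaltons."""
    return mass_da / 1000.0


if __name__ == "__main__":
    peptide = "MGA"  # hypothetical tripeptide
    mass = molecular_weight(peptide)
    print(f"{mass:.2f} Da = {da_to_kda(mass):.4f} kDa")
```

A protein of 120 to 200 residues has a mass on the order of 13 to 22 kDa at roughly 110 Da per residue, which is a useful sanity check on reported units.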

Ka/Ks analysis

The divergence of genes was estimated from Ka/Ks values, which were calculated using a Galaxy wrapper for Ka/Ks calculations.

Transcriptional profile analysis

Publicly available transcriptome data of G. hirsutum L. for abiotic stress and tissue-specific expression were downloaded from the NCBI SRA database (projects PRJNA490626 and PRJNA532694). The RNA-seq data were analyzed following the methods of Pertea et al. (2016).

Plant stress experiment, RNA extraction and qRT-PCR analysis

Seeds of G. hirsutum L. (cultivar TM-1) were obtained from the Institute of Cotton Research, Chinese Academy of Agricultural Sciences (CAAS, Anyang, China), and grown in a growth room. Stress was applied when the plants reached the three-true-leaf stage. A 20% PEG solution was applied to the pots for the drought treatment and a 250 mmol·L⁻¹ NaCl solution for the salt treatment. Leaf samples were taken at 0 h, 2 h, 6 h, 12 h, 24 h, 48 h, and 72 h after treatment. The collected leaf samples were frozen in liquid nitrogen and subsequently stored at −80 °C. RNA was extracted with the RNAprep Pure Plant Plus Kit (TIANGEN) following the manufacturer's instructions.

A NanoDrop 2000 was used to check the quality and concentration of the extracted RNA; the A260/A280 ratio was required to be between 1.80 and 2.10. cDNA was synthesized from 500 ng of total RNA using the Prime-Script® RT reagent kit (Takara, Dalian, China). We selected ten PsbX genes from the identified PsbX genes for RT-qPCR. The primers (Additional file 1: Table S1) were designed using the NCBI Primer-BLAST website.

An Applied Biosystems 7500 Real-Time PCR system (Applied Biosystems, Foster City, CA, USA) was employed for qRT-PCR. The constitutively expressed G. hirsutum β-actin gene was used as the reference gene, and PsbX gene-specific primers were used for RT-qPCR. The relative expression levels of PsbX genes after stress treatment were calculated using the 2^(−ΔΔCT) method (Livak and Schmittgen 2001).
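The relative-quantification formula of Livak and Schmittgen (2001) can be written out as a short worked sketch. ΔCT is the target CT minus the reference-gene CT within a sample, ΔΔCT is the treated ΔCT minus the control ΔCT, and the fold change is 2 raised to −ΔΔCT. The CT values in the example are hypothetical and are not measurements from this study.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCT) method.

    Assumes approximately 100% amplification efficiency for both the
    target and the reference gene, as the method itself does.
    """
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical CT values: target gene vs. a reference gene such as actin.
    fold = relative_expression(
        ct_target_treated=22.0, ct_ref_treated=18.0,  # e.g. a stress time point
        ct_target_control=24.0, ct_ref_control=18.0,  # e.g. the 0 h control
    )
    print(f"fold change = {fold:.2f}")
```

Values above 1 indicate up-regulation relative to the control and values below 1 indicate down-regulation; in practice the calculation is repeated per biological replicate before averaging.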


Genome-wide identification of PsbX gene family

Various bioinformatics analyses were conducted to explore the PsbX gene family. Comparative analysis of all cotton genomes was performed using the Hidden Markov Model (HMM) and a local BLAST search, considering E-values ≤ 10⁻¹⁰ for the PsbX gene family. Redundant PsbX gene sequences were eliminated using multiple sequence alignment tools. Furthermore, protein domains in cotton PsbX were identified using various databases and tools, including PROSITE, Pfam, InterProScan 63.0, and CD-search. The elimination of redundant protein sequences resulted in the identification of 40 genes belonging to the PsbX family across nine distinct plant species. The number of PsbX genes in the different plant species is as follows: 1 in A. thaliana, 2 in V. vinifera, 2 in O. sativa, 2 in T. cacao, 3 in P. trichocarpa, 5 in G. arboreum, 10 in G. barbadense, 10 in G. hirsutum, and 5 in G. raimondii. Additional details are provided in Additional file 2: Table S2. To avoid ambiguity and duplication, PsbX genes identified in cotton species were renamed as GhPsbX, GaPsbX, GrPsbX, and GbPsbX. The results indicate that the PsbX gene family is comparatively smaller than other gene families.

Phylogenetic analysis of PsbX gene family

The PsbX gene family from nine species, including A. thaliana, V. vinifera, O. sativa, T. cacao, P. trichocarpa, G. barbadense, G. arboreum, G. hirsutum, and G. raimondii, was utilized to construct a phylogenetic tree using the Neighbor-joining (NJ) method. The resulting tree was employed to identify the phylogeny and comprehend the functional relationships among all 40 putative PsbX genes (Fig. 1). Due to the close relationship of the PsbX gene family, all 40 genes were categorized within the same clade.

Fig. 1

Phylogenetic relationships among PsbX genes: 10 from G. hirsutum, 10 from G. barbadense, 5 from G. arboreum, 5 from G. raimondii, 1 from Arabidopsis, 2 from O. sativa, 3 from P. trichocarpa, 2 from T. cacao, and 2 from Vitis vinifera. The phylogenetic tree was constructed in MEGA X using the NJ (Neighbor-joining) method. The bootstrap test was done with 1 000 iterations. Genes from different species are shown in different colors

A notably close relationship was observed among the PsbX genes of the four cotton species compared with the other five plant species. Moreover, the TcPsbX2 gene occupied the same subclade as the closely related GrPsbX2 and GbPsbX2-Dt genes, but on a distinct branch. Additionally, G. raimondii exhibited a greater number of PsbX genes than T. cacao. The PsbX genes of the monocot species O. sativa were positioned at the end of the clade on separate branches in the phylogenetic tree. AtPsbX1 was identified on a separate branch closely related to the two PsbX genes of O. sativa. Nevertheless, PsbX genes of dicots (Arabidopsis, cotton, and cacao) were identified in the same clade. These findings suggest that the primary function of the PsbX gene family evolved prior to the divergence between monocots and dicots. The current results demonstrate that the nine plant species are situated in one clade, consisting of various subclades of PsbX gene members. This indicates that the variations among these plants emerged after the expansion of the PsbX gene family. Furthermore, PsbX genes were notably more abundant in Gossypium genomes than in the other species.

Furthermore, several subclades exclusively included members of the PsbX gene family derived from a single plant species. Despite the overall similarity among orthologs of the four cotton species, the orthologs of the A genome and the At sub-genome of cotton clustered together. A similar pattern was observed for the D genome and the Dt sub-genome. Consequently, these results indicate that PsbX orthologs from the At and A genomes, and likewise from the Dt and D genomes, may share a common ancestor.

Gene structure, conserved motif, and sequence logo analysis of PsbX

Gene structure analysis was conducted on the sequence file of PsbX genes, as depicted in Fig. 2B. The total number of exons was 1 or 2, with an average of 1.5. All genes exhibited only one exon (CDS), except for a gene that possessed two exons. Introns were identified in only two PsbX genes, while the remaining 28 had no introns. Typically, genes in the same evolutionary branch and clade shared similar structures, possibly originating from a conserved gene pattern in terms of exon/intron length and number. Untranslated regions (UTR) were generally found in G. hirsutum and G. barbadense. However, no UTR was identified in either G. arboreum or G. raimondii.

Fig. 2

Structural features of the PsbX gene family in cotton. A Three motifs identified by the MEME tool are represented by colored boxes, and their consensus sequences are shown in supplementary Fig. S1. B The exon–intron arrangement of PsbX genes. The blue shapes represent UTRs, red shapes indicate CDS, and the black lines represent introns

The gene structure of GrPsbX4 differed from that of the other genes, suggesting a potential impact on its evolutionary rate and function. MEME software (Additional file 3: Fig. S1) was employed to identify three conserved motifs of PsbX genes with a width range of 15~60 amino acids. Both the number and type of conserved motifs were consistent among genes within the same branch of the phylogenetic tree. Sequence logo analysis of conserved amino acid residues for the three motifs was conducted across all cotton species to confirm the conservation of PsbX gene family proteins during evolution. The resulting sequence logos exhibited substantial conservation at various sites across the N and C termini among all cotton species (Fig. 2A).

Chromosomal locations, physicochemical features and subcellular location prediction of PsbX gene family

The identified PsbX genes were assigned to their respective chromosomes. Figure 3 illustrates that five genes were assigned to chromosomes of the At sub-genome, while the other five genes were allocated to the corresponding chromosomes of the Dt sub-genome. Chromosome A13 and its homolog D13 harbored three genes each, whereas the other chromosomes had only a single GhPsbX gene: one PsbX gene was assigned to each of chromosomes A05, D05, A07, and D07. Moreover, the results indicated that gene loss did not occur, as genes were evenly distributed between the At and Dt sub-genomes of cotton during evolution. This distribution may be a consequence of the relatively small size of the PsbX gene family.

Fig. 3

Mapping of PsbX genes on the chromosomes of G. hirsutum. The chromosome number is shown below each bar. Gene names are indicated in red

Subsequently, the physicochemical properties of the PsbX gene family were predicted and compiled together with the locus ID, the corresponding chromosome, start and end points, strand polarity, protein length (amino acids, aa), MW, pI, and cellular localization.

The details of the physicochemical properties are shown in Table 1. The results revealed that five GhPsbX genes belonged to the At sub-genome and the other five to the Dt sub-genome of cotton. Protein length ranged from 120 to 200 amino acids. The MW was 12 386.5 Da for GhPsbX1-At and 13 126.32 Da for GhPsbX2-Dt. The predicted cellular localization of GhPsbX proteins indicated their presence in the chloroplast. Other predicted physicochemical properties are presented in Table 1.

Table 1 List of PsbX genes identified in Gossypium and their physicochemical properties

Identification of cis-regulatory elements

Cis-regulatory elements were identified in the 1 500 bp upstream (promoter) region of each gene by searching the PlantCARE database. The various cis-regulatory elements present in the promoter regions of PsbX genes may account for the diverse functions of the putative genes. The cis-regulatory elements identified within the promoter regions of PsbX genes are presented in Additional file 4: Table S3. Typically, regulatory sequences such as TGACG-motifs, STREs (rapid stress response elements), TATA boxes, and CAAT boxes were identified among all genes of the PsbX family in upland cotton. CAAT and TATA boxes are the primary cis-acting elements located within the promoter regions of transcribed genes in eukaryotes. Additionally, the CAAT box plays a role in the regulation of gene expression by providing a binding site for transcription factors (TF) (Laloum et al. 2013; Shore and Sharrocks 1995; Ramji and Fok 2002).

The CAAT box also plays a significant role in modulating the nopaline synthase promoter (Dai et al. 1999). The TATA box has been reported to support transcription by harboring a binding site for transcription factors (TF) or histones (Isogai et al. 2007; Bae et al. 2015). The primary transcriptional stress response to abiotic and/or biotic stresses is activated via STREs (Walley et al. 2007). These results suggest that cis-regulatory elements play a pivotal role in enabling the plant to cope with stresses such as intense light and drought.

Syntenic blocks

The hybridization of two diploid cotton species resembling G. arboreum and G. raimondii gave rise to the allotetraploid species G. hirsutum. Accordingly, the doubled chromosome set of G. hirsutum resembles the chromosomes of G. raimondii and G. arboreum. The expansion of gene families is attributed to various duplication events, including tandem, segmental, and whole-genome duplication. Syntenic blocks are characterized by a similar chromosomal arrangement of conserved genes among different species. In the current study, associated gene pairs between the allotetraploid G. hirsutum and the two diploid cotton ancestors, G. raimondii and G. arboreum, were identified (Fig. 4). The genes of G. hirsutum on chromosomes A05, A07, A13, D05, D07, and D13 originated from conserved syntenic blocks on chromosomes Ga05, Ga07, and Ga13 of the ancestral species G. arboreum. Additionally, orthologs of the G. hirsutum PsbX genes were inferred to be present on G. raimondii chromosomes Gr05, Gr07, and Gr13.

Fig. 4

Synteny analysis of the PsbX gene family in all cotton species. Cotton species are represented by different colors: red represents G. raimondii, green indicates G. arboreum, sky blue indicates G. barbadense, and purple represents G. hirsutum. The chromosome numbers of PsbX genes are indicated in dark blue

Ka/Ks ratio

During evolution, duplicated genes may undergo one of the following fates: non-functionalization, sub-functionalization, or neo-functionalization. The ratio of the non-synonymous (Ka) to the synonymous (Ks) substitution rate (the Ka/Ks value) can be used to infer the magnitude of both positive selection and selective constraint. Typically, a Ka/Ks ratio greater than one indicates positive selection, a ratio equal to one indicates neutral evolution, and a ratio less than one indicates purifying selection. In the current study, the estimated Ka, Ks, and Ka/Ks values of homologous PsbX gene pairs of G. hirsutum are presented in Table 2. The results indicated that Ka/Ks < 1 for four homologous PsbX gene pairs and Ka/Ks > 1 for one pair, implying that these genes might have been subjected to both purifying and positive selection following segmental as well as whole-genome duplications.
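The interpretation rule for Ka/Ks can be stated compactly as a small sketch. The function name is hypothetical, the tolerance used to call a ratio "equal to one" is an assumption of this sketch, and the Ka and Ks values in the example are invented for illustration rather than taken from Table 2.

```python
import math


def classify_selection(ka, ks, rel_tol=1e-9):
    """Return (Ka/Ks, label) using the conventional thresholds around one."""
    if ks <= 0:
        raise ValueError("Ks must be positive for Ka/Ks to be defined")
    ratio = ka / ks
    if math.isclose(ratio, 1.0, rel_tol=rel_tol):
        label = "neutral"
    elif ratio > 1.0:
        label = "positive"
    else:
        label = "purifying"
    return ratio, label


if __name__ == "__main__":
    # Hypothetical gene pairs: (name, Ka, Ks).
    pairs = [("pair1", 0.02, 0.10), ("pair2", 0.30, 0.20), ("pair3", 0.10, 0.10)]
    for name, ka, ks in pairs:
        ratio, label = classify_selection(ka, ks)
        print(f"{name}: Ka/Ks = {ratio:.2f} ({label} selection)")
```

Pairs with Ks equal to zero are rejected because the ratio is undefined for them; such pairs are usually reported separately.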

Table 2 Ka, Ks, and Ka/Ks values for homologous PsbX gene pairs of G. hirsutum

Expression profiles of PsbX gene family in various tissues and organs

The growth and development of plants are regulated by gene expression, and analyzing gene expression levels can provide clues to gene function. Thus, the expression pattern of PsbX genes in various cotton tissues and organs was evaluated. First, the expression of PsbX genes was analyzed using publicly accessible transcriptomic data of G. hirsutum L. under abiotic stresses. The tissue-specific expression data were also downloaded from the NCBI SRA database. The data covered leaves, roots, stems, and reproductive tissues (torus, petals, stamens, and pistils), as well as exposure to abiotic stresses, including salt and PEG treatments, over different time periods.

A heat map was generated for gene expression, revealing that genes with similar expression levels clustered closely together. However, the expression levels of many genes varied significantly across different tissues (Fig. 5). For instance, many PsbX genes showed lower expression in roots, torus, filament, anther, and sepal. Interestingly, GhPsbX2_At, GhPsbX2_Dt, GhPsbX1_At, and GhPsbX1_Dt showed higher expression in the bract, stem, and leaf. Moreover, a higher expression level of GhPsbX2_At was observed, albeit irregularly, in various floral organs. Comparatively low expression of two genes, GhPsbX5_At and GhPsbX3_Dt, was observed across the tissues of cotton, although GhPsbX3_Dt exhibited a high expression level in the leaves.

Fig. 5

Expression profiles of PsbX genes under drought and salinity treatments. (A) Heat map showing the expression profiles in different tissues. (B) Genes are shown on the right, and the phylogenetic relationships are shown on the left

Because plants are unable to move, they are vulnerable to various abiotic stresses during growth and development. We therefore examined the expression pattern of GhPsbX genes under abiotic stresses, including salinity and drought, at various time points. The heat map showed both up- and down-regulation of GhPsbX genes under the different stress treatments at various time points (Fig. 5).

Genes with similar responses clustered together. GhPsbX3_At, GhPsbX1_At, GhPsbX1_Dt, GhPsbX4_Dt, GhPsbX2_At, and GhPsbX2_Dt were highly expressed compared with the other genes under drought and/or salt stress at various time intervals. However, two genes, GhPsbX5_At and GhPsbX3, exhibited relatively lower expression in response to salinity and drought treatments. The ten GhPsbX genes expressed under salt and drought conditions were then selected, and their responses were evaluated by qRT-PCR analysis at different time points (Fig. 6). All genes were highly expressed compared with the control at various times under each stress treatment, although the response of individual genes differed between treatments and time points. PEG treatment induced the up-regulation of GhPsbX1-At, GhPsbX3-At, GhPsbX4-At, GhPsbX5-At, GhPsbX1-Dt, GhPsbX2-Dt, and GhPsbX3-Dt at 2 h, 12 h, and 24 h, while NaCl treatment suppressed the expression of these genes. However, NaCl treatment up-regulated the GhPsbX2-At gene at 2 h and 48 h. Interestingly, GhPsbX4-Dt showed its highest expression level at 2 h in both PEG and NaCl treatments compared with other time points. Moreover, GhPsbX5-Dt was up-regulated at 2 h and 48 h under NaCl treatment and at 2 h, 24 h, and 72 h under PEG treatment. All PsbX genes were down-regulated at 6 h and 12 h under PEG and NaCl treatments.

Fig. 6

Relative expression level by qRT-PCR analysis for selected GhPsbX genes. Error bars indicate the standard deviations (SD) among three independent biological repeats


Numerous studies have aimed to understand the physiological and molecular functions of genes involved in photosystems across various plant species. Notably, the PsbX gene family's role in cotton plants has not been explored in the existing scientific literature. This study aims to fill this void by investigating PsbX genes in both diploid and allotetraploid cotton species. Our analysis encompassed the evolutionary relationships, gene duplications, selection pressures, and expression levels in response to different abiotic stresses. The availability of whole-genome sequencing data for cotton has provided an unprecedented opportunity for a thorough exploration of the PsbX gene family's potential functions in the photosynthetic process.

In this study, we identified and scrutinized PsbX genes in four cotton species: G. hirsutum, G. raimondii, G. arboreum, and G. barbadense. While data were collected for all species, our primary focus was on G. hirsutum. The findings provide foundational knowledge and a resource for further investigations into the function of PsbX genes in cotton species. In total, 40 PsbX genes were identified in nine plant species: the dicots T. cacao, G. arboreum, G. hirsutum, G. raimondii, G. barbadense, P. trichocarpa, V. vinifera, and A. thaliana, and the monocot O. sativa. While orthologs of the monocot species formed distinct clusters, orthologs of dicots clustered together, suggesting that the fundamental function of the PsbX gene family originated before the divergence of dicots and monocots (Chen et al. 2018).

The PsbX protein, a nuclear-encoded component of photosystem II (PSII), is relatively small, and the gene family has a modest number of members. Our study revealed distinct gene compositions in diploid species, with T. cacao having approximately half the number of PsbX genes found in its close relative G. raimondii. These observations suggest an expansion of the PsbX gene family during the evolution of cotton. Previous research showed that both cacao and cotton were subjected to the paleohexaploidization event shared by the eudicots. However, a more recent duplication event was observed in cotton but not in T. cacao, which may explain why diploid cotton possesses more PsbX genes (Wang et al. 2012). Interestingly, the number of PsbX genes in G. hirsutum is twofold that of its diploid counterparts, G. raimondii and G. arboreum. G. hirsutum, a typical allotetraploid species resulting from the hybridization of A and D genomes, experienced chromosome doubling around 1–2 million years ago (Renny-Byfield et al. 2016). The PsbX genes of upland cotton were conserved following the polyploidization event, with orthologs of the At and Dt sub-genomes maintaining significant collinearity.

The evolution of multigene families is driven by structural diversity (Sánchez-Gracia et al. 2011; Davies et al. 2014; Suarez et al. 1998). The exon/intron arrangements in cotton were studied to elucidate the structural diversity of PsbX genes. Variation in the number of exons/introns among genes might contribute to the functional diversity of PsbX genes and could have arisen through exon gain or loss during the evolution of the family. Gene structure analyses indicate that introns have had a substantial influence during the evolution of various plant species (Suarez and Gilbert 2006). A greater number of introns were present during the early phase of expansion, but their number decreased over time (Roy and Penny 2007). These reports revealed that higher species had fewer introns in their genomes (Roy and Gilbert 2005). A large number of introns might impose a significant burden on the organism, and introns could alter gene activity and give rise to new gene functions. Likewise, few introns were found in most PsbX genes of the cotton species G. hirsutum, G. barbadense, G. arboreum, and G. raimondii. The absence of introns might suggest that the gene expansion occurred independently. Genome-wide analyses have also revealed that extensive loss and gain of introns occurred during eukaryotic diversification (Roy and Penny 2007; Rogozin et al. 2003). Moreover, the presence of three long conserved motifs across the PsbX gene family points to the conservation of this family in cotton plants.

The ten GhPsbX genes were evenly distributed between the At and Dt subgenomes, with five genes in each. Physicochemical properties such as predicted subcellular localization, pI, and MW were nearly identical across the genes, consistent with this even allocation. All GhPsbX genes showed a similar distribution of promoter cis-elements associated with growth, development, and stress responses. Light intensity is reported to play a pivotal part in the growth and differentiation of plants (Qanmber et al. 2019; Fankhauser and Chory 1997). Cis-elements such as the TCA-element and the CGTCA-motif regulate gene expression in response to SA and MeJA, respectively (Wen et al. 2014; Maestrini et al. 2009). In addition, the W-box mediates ABA responses under drought stress (Singh et al. 2002). These cis-elements, present in the majority of PsbX genes, were used to predict gene function during growth, development, and various stresses. Abiotic stresses adversely affect photosynthesis by reducing photosynthetic activity (Nouri et al. 2015). The response of the photosynthetic system to drought stress differs between seedlings and mature crops. In mature crops, drought stress disturbs photosynthetic activity, and excessive light absorption generates reactive oxygen species (ROS) that eventually impair the photosynthetic apparatus. In young seedlings exposed to drought stress, by contrast, chlorophyll biosynthesis genes tended to be downregulated, and PSI and PSII stopped capturing sunlight to minimize damage (Dalal et al. 2018). The chloroplast is a powerhouse of the plant and is therefore of prime importance in plant sciences. However, it is vulnerable to both abiotic and biotic stresses, and its state reflects the actual level of the plant's response to stress (Li et al. 2020; Liu et al. 2013).
The PsbX protein is part of PSII and plays a critical role in photosynthesis, but the function of the PsbX gene family has not yet been elucidated in cotton. Identifying its molecular function will aid the study of PsbX genes in other important crops.
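As an illustration of the promoter cis-element screening discussed above, the following sketch scans a promoter sequence on both strands for two motifs. The core sequences used here (CGTCA for the CGTCA-motif and TTGACC for the W-box) are commonly cited consensus sequences and are an assumption of this example, not values taken from Table S3; the promoter string and function names are hypothetical.

```python
# Assumed core sequences; not taken from the study's supplementary tables.
MOTIFS = {"CGTCA-motif": "CGTCA", "W-box": "TTGACC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(promoter, motif):
    """Return sorted 0-based start positions (forward-strand coordinates)
    where the motif occurs on either strand, including overlaps."""
    promoter = promoter.upper()
    hits = set()
    for target in {motif, reverse_complement(motif)}:
        start = promoter.find(target)
        while start != -1:
            hits.add(start)
            start = promoter.find(target, start + 1)
    return sorted(hits)


# Hypothetical promoter fragment for illustration only.
PROMOTER = "AACGTCATTTGACCGGTGACGAA"

for name, core in MOTIFS.items():
    print(name, find_motif(PROMOTER, core))
```

A reverse-strand match is reported at the position where its reverse complement starts on the forward strand, so hits from both strands share one coordinate system.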

Tissue-specific RNA-seq expression data were downloaded from the NCBI SRA to explore the possible functions of GhPsbX genes in specific tissues and under abiotic stresses. The expression of the ten selected genes in specific tissues suggested that PsbX genes contribute positively to the development of leaf, bract, and stem. This finding indicates that the PsbX gene family possibly plays a vital role in photosynthesis in cotton. Additionally, expression analysis showed that all GhPsbX genes were strongly upregulated relative to the control under PEG and NaCl treatments at different time points. PEG induced the expression of many genes at 2 h, 12 h, and 24 h, whereas GhPsbX-At2 was upregulated under NaCl treatment at 2 h and 48 h.
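Upregulation relative to the control, as described above, is often summarized as a log2 fold change. The sketch below shows one common way to compute it, with a pseudocount to avoid division by zero. The expression values are hypothetical placeholders chosen for easy arithmetic, not measurements from this study, and the pseudocount of 1 is an assumption.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control expression.
    Positive values indicate upregulation under treatment."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical expression values (for example, FPKM); illustration only.
CONTROL = 7.0
TREATED = {"2 h": 15.0, "12 h": 31.0, "24 h": 7.0}

fold_changes = {t: log2_fold_change(v, CONTROL) for t, v in TREATED.items()}
for time_point, lfc in fold_changes.items():
    print(f"{time_point}: log2FC = {lfc:.2f}")
```

A value of 1 corresponds to a doubling of expression relative to the control, and a value of 0 means no change.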

Overall, our results indicate that GhPsbX expression levels vary among organs and tissues such as stem, leaf, and bract. Likewise, GhPsbX expression may be regulated under drought and salinity stresses. GhPsbX genes are therefore potential candidates for cotton breeding programs.


In the current study, PsbX genes were identified and characterized in four cotton species. We performed a genome-wide analysis of the PsbX gene family in cotton, including gene structure, evolutionary relationships, and expression patterns. The results indicate that the PsbX gene family is highly conserved in cotton and in other plant species. The cis-elements found in the promoter regions suggest a potential role in plant growth as well as stress responses. The expression levels of PsbX genes differed among tissues and under various abiotic stresses. These results suggest that PsbX genes might be regulated by abiotic stresses, and this information could be useful in future cotton breeding programs.

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.



Not applicable.


This work was supported by National Natural Science Foundation of China (32060466) and Chinese Academy of Agricultural Sciences.

Author information

Authors and Affiliations



Du XM, Ali I, and Raza I designed the experiment; Du XM, Ali I supervised the research. Raza I and Hu DW collected the samples; Raza I and Ahmad A worked on the analysis. Raza I, Parveen A, and Pan ZE interpreted the results and wrote the manuscript. All authors read, edited, and approved the final manuscript.

Corresponding authors

Correspondence to Ali Imran or Du Xiongming.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Supplementary Information

Additional file 1: Table S1.

Specific primers used in relative quantitative real-time RT-PCR.

Additional file 2: Table S2.

Gene IDs of cotton species.

Additional file 3: Figure S1.

Consensus sequences of motif identified by MEME tool.

Additional file 4: Table S3.

Cis-element analysis of PsbX gene promoters.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Raza, I., Parveen, A., Ahmad, A. et al. Genome-wide identification and expression profiling of photosystem II (PsbX) gene family in upland cotton (Gossypium hirsutum L). J Cotton Res 7, 1 (2024).
