
Genome wide identification and characterization of light-harvesting Chloro a/b binding (LHC) genes reveals their potential role in enhancing drought tolerance in Gossypium hirsutum



Cotton is an important commercial crop and a valuable source of natural fiber. Its production has declined sharply because of abiotic stresses, and drought is one of the major abiotic stresses causing significant yield losses in cotton. However, plants have evolved self-defense mechanisms to cope with abiotic factors such as drought, salt and cold. Stress-responsive genes, such as those encoding trihelix and nodule-inception-like protein (NLP) transcription factors and late embryogenesis abundant (LEA) proteins, have been shown to improve resistance to several abiotic stresses.


Genome-wide identification and characterization of the light-harvesting chlorophyll a/b-binding (LHC) genes were carried out in cotton under drought stress conditions. A total of 109 proteins encoded by LHC genes were found in the cotton genomes: 55 in Gossypium hirsutum, 27 in G. arboreum and 27 in G. raimondii. The genes were unevenly distributed across the chromosomes. The Ka/Ks (non-synonymous substitution rate/synonymous substitution rate) values were less than one, indicating that the gene family has been under negative selection. Expression analysis showed that most of the genes were more strongly upregulated in roots than in leaf and stem tissues. Most genes were highly expressed in MR-85, a relatively drought-tolerant germplasm.


The results provide evidence of a possible role for LHC genes in improving drought stress tolerance, which cotton breeders could exploit to release more drought-tolerant cotton varieties.


Cotton is an important commercial crop because it is a valuable source of natural fiber and can be grown in a variety of climates around the world. Demand for cotton and its by-products is higher than ever before, owing to the increased use of cotton fiber in the textile industry and of cottonseed as a source of edible oil (Hassan et al. 2020). It is an important multiuse crop that is highly sensitive to both biotic and abiotic stresses (Zahid et al. 2016). Its production has declined sharply because of abiotic stresses, drought in particular.

Over the course of the twenty-first century, food production must keep pace with the growing population (Beddington et al. 2012). However, rising temperatures and climate change have increased the incidence and severity of abiotic stresses that critically disturb crop growth and development (Nouri et al. 2015). Abiotic stress remains one of the key causes of yield loss in plants (Sasi et al. 2018), and it affects plant growth and development more strongly than it affects other organisms because plants are immobile (He et al. 2018; Magwanga et al. 2018; Xu et al. 2019). Among the various abiotic stress factors, drought, heat, toxicity and salinity cause over-reduction of the electron transport chain (ETC), resulting in photooxidation (Nishiyama and Murata 2014). Furthermore, in the chloroplasts, drought, high light, salinity or extreme temperatures reduce CO2 assimilation rates, which in turn increases the production of reactive oxygen species and eventually leads to yield loss (Pintó-Marijuan and Munné-Bosch 2014). Abiotic stresses have been reported to account for over 50% of yield loss in crops (Nath et al. 2013), and a decrease in photosynthetic rate markedly reduces crop yield (Nouri et al. 2015).

Drought exposure alters the photosynthetic apparatus of plants, and plants have evolved numerous coping mechanisms, one of which is the evolution of various transcription factors (Hussain et al. 2018). Genes known to affect the photosynthetic process include those encoding ribulose bisphosphate carboxylase large chain (RBCL) (Berry et al. 2013), light-harvesting chlorophyll a/b-binding (LHC) proteins (Zhao et al. 2020) and cytochrome P450s (Magwanga et al. 2019). Photosynthesis requires the capture of light and the conversion of solar energy, processes in which LHC proteins play a central role. In higher plants, the LHC gene family includes the LHCA and LHCB subfamilies, which encode proteins constituting the light-harvesting complexes of photosystems I and II, respectively (Kong et al. 2016). The LHCB proteins are the apoproteins of the outer antenna complex of photosystem II (PSII) and are perhaps the most abundant membrane proteins in nature (Horton and Ruban 2005; Król et al. 1995; Xu et al. 2012). Moreover, studies have shown that downregulation of LHCB1, LHCB2, LHCB3, LHCB4, LHCB5 or LHCB6 reduces stomatal responsiveness to abscisic acid (ABA) and therefore lowers plant tolerance to drought stress (Xu et al. 2012). Downregulation of LHCB genes also causes ABA-insensitive phenotypes in seed germination and post-germination growth (Liu et al. 2013). Members of this gene family have been identified in several species: 28 in Carica papaya (Zou et al. 2020), 17 in Hordeum vulgare L. (Qin et al. 2017), 25 in Camellia sinensis (Li et al. 2020) and 35 in Manihot esculenta (Zou and Yang 2019). However, the role of this important gene family in responses to abiotic stress factors has not been studied in cotton. The complete genome sequences of Gossypium hirsutum (Hu et al. 2019), G. arboreum (Huang et al. 2020) and G. raimondii (Wang et al. 2012) provide the information needed to carry out a functional analysis of the LHC genes in the three cotton genomes.

Materials and methods

Plant material and hydroponics

Marie-Galante 85 (MR-85), a race of G. hirsutum that is comparatively tolerant to abiotic stress, was used as the experimental material (Xu et al. 2020). The seeds were soaked in water overnight and placed on absorbent paper to germinate. Seven days after germination, the seedlings were transferred to a hydroponic system containing Hoagland nutrient solution in a greenhouse with a 16 h/8 h light/dark photoperiod and 28 °C/25 °C day/night temperatures (Zhao et al. 2020). At the third-leaf stage, drought stress was imposed by adding 17% PEG-6000 to the nutrient solution (Liu et al. 2013). Leaf, stem and root tissues were collected for RNA extraction at 0, 3, 6, 9, 12 and 24 h after the onset of stress. The experiment was conducted in a greenhouse at the Institute of Cotton Research, CAAS, in Anyang, using a completely randomized design (CRD) with three biological replicates.
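
A completely randomized design assigns every experimental unit to a position purely at random, with no blocking. The sketch below illustrates that idea for the six sampling times and three biological replicates described above. The numbered "positions", the seed and the function name `crd_layout` are hypothetical illustrations, not details reported in the study.

```python
import random

# Sampling times (h after PEG-6000 exposure) and replicate number taken from the methods text.
TIME_POINTS_H = [0, 3, 6, 9, 12, 24]
N_REPLICATES = 3


def crd_layout(time_points, n_reps, seed=None):
    """Randomly assign each (time point, replicate) unit to a numbered position.

    Hypothetical helper: a completely randomized design uses no blocking, so
    every unit is drawn from a single pool and every ordering is equally likely.
    """
    units = [(t, rep) for t in time_points for rep in range(1, n_reps + 1)]
    rng = random.Random(seed)  # seeded generator gives a reproducible layout
    rng.shuffle(units)
    return {position: unit for position, unit in enumerate(units, start=1)}


if __name__ == "__main__":
    layout = crd_layout(TIME_POINTS_H, N_REPLICATES, seed=2021)
    for position, (hours, rep) in layout.items():
        print(f"position {position:2d}: {hours:2d} h, replicate {rep}")
```

Seeding the generator keeps the layout reproducible, which makes it easy to record alongside the experiment.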

Identification of the LHC proteins in cotton species

The Pfam domain PF00504 was used to search for cotton proteins encoded by LHC genes. The LHC protein sequences of G. hirsutum, G. raimondii and G. arboreum were downloaded from the Cotton Functional Genomics Database (CottonFGD), and those of Arabidopsis thaliana and Theobroma cacao were downloaded from Phytozome. The protein sequences downloaded from CottonFGD were submitted to the Pfam database to identify putative LHC proteins, using a domain e-value cutoff of < 1 × 10⁻⁴ (El-Gebali et al. 2019). CottonFGD was also used to obtain the physicochemical characteristics of the family members, including protein length (PL), molecular weight, molecular charge, isoelectric point (pI) and grand average of hydropathy (GRAVY).
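
The domain filter can be pictured as a single pass over HMMER-style domain-table output. The sketch below assumes the whitespace-separated layout written by `hmmscan --domtblout` (Pfam accession in the second column, query protein in the fourth, independent domain E-value in the thirteenth). The Pfam web search used in the study may report results differently, and the protein names and scores in the example are hypothetical.

```python
PFAM_LHC = "PF00504"   # Chloroa_b-bind domain used to define the LHC family
EVALUE_CUTOFF = 1e-4   # domain e-value threshold from the methods text


def lhc_candidates(domtbl_lines, pfam=PFAM_LHC, cutoff=EVALUE_CUTOFF):
    """Return sorted query proteins with a hit to `pfam` below `cutoff`.

    Assumed hmmscan --domtblout columns (0-based): [1] target accession such as
    PF00504.24, [3] query name, [12] independent E-value of the domain hit.
    """
    hits = set()
    for line in domtbl_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.split()
        accession = cols[1].split(".")[0]  # drop the Pfam version suffix
        if accession == pfam and float(cols[12]) < cutoff:
            hits.add(cols[3])
    return sorted(hits)


# Hypothetical domain-table rows, for illustration only.
EXAMPLE_DOMTBL = [
    "# target accession tlen query ...",
    "Chloroa_b-bind PF00504.24 156 protein_1 - 265 3e-40 135 0.1 1 1 1e-43 4e-40 134 0.1 1 156 60 230 60 231 0.95 -",
    "Chloroa_b-bind PF00504.24 156 protein_2 - 120 0.01 9 0.1 1 1 0.002 0.03 8 0.1 40 90 10 60 5 70 0.60 -",
    "Pkinase PF00069.28 264 protein_3 - 400 1e-50 170 0.2 1 1 1e-53 2e-50 169 0.2 1 264 30 290 30 291 0.96 -",
]

if __name__ == "__main__":
    print(lhc_candidates(EXAMPLE_DOMTBL))  # ['protein_1']
```

Here protein_2 is rejected because its domain E-value (0.03) is above the cutoff, and protein_3 because its hit is to a different Pfam family.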

Phylogenetic tree and collinearity analysis

The protein sequences of G. hirsutum, G. arboreum, G. raimondii, Arabidopsis thaliana and Theobroma cacao were aligned using ClustalX (Larkin et al. 2007), and a phylogenetic tree was constructed in MEGA 7.0 using the neighbor-joining (NJ) method with the Jones-Taylor-Thornton (JTT) substitution model and 1 000 bootstrap replicates (Tamura et al. 2011). To identify homologous genes among the cotton species, the G. hirsutum protein sequences were searched with BlastP against the protein databases of G. arboreum and G. raimondii; hits with E-values ≤ 1 × 10⁻⁵ and similarity ≥ 90% were considered significant. The GFF3 files, the linked gene-pair file and the gene IDs were used for collinearity analysis in TBtools (Chen et al. 2018). Homologous genes between G. hirsutum, G. raimondii and G. arboreum were identified from CottonFGD by protein BLAST, with a threshold of ≥ 80% identity and an alignment covering at least 80% of the protein length.
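
The two homology thresholds described above (E ≤ 1 × 10⁻⁵ with ≥ 90% identity, and ≥ 80% identity with ≥ 80% of the protein aligned) can be expressed as a filter over BLAST tabular output. This sketch assumes the default 12-column `-outfmt 6` layout plus a separate table of query protein lengths; the gene IDs are hypothetical, and coverage is computed here as alignment length divided by query length, which is one of several reasonable definitions.

```python
def filter_blast(lines, query_lengths, max_evalue=1e-5, min_identity=90.0, min_coverage=0.0):
    """Keep BLAST -outfmt 6 hits passing E-value, identity and coverage thresholds.

    Default 12 columns (0-based): 0 qseqid, 1 sseqid, 2 pident, 3 length,
    4 mismatch, 5 gapopen, 6 qstart, 7 qend, 8 sstart, 9 send, 10 evalue, 11 bitscore.
    Coverage is taken here (an assumption) as alignment length / query protein length.
    """
    kept = []
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        identity, aln_len, evalue = float(f[2]), int(f[3]), float(f[10])
        coverage = aln_len / query_lengths[query]
        if evalue <= max_evalue and identity >= min_identity and coverage >= min_coverage:
            kept.append((query, subject, identity, round(coverage, 3)))
    return kept


# Hypothetical query lengths (aa) and hits, for illustration only.
QUERY_LENGTHS = {"GhLHC_1": 265, "GhLHC_2": 250}
EXAMPLE_HITS = [
    "GhLHC_1\tGaLHC_1\t96.2\t260\t10\t0\t1\t260\t1\t260\t1e-150\t500",
    "GhLHC_2\tGrLHC_7\t85.0\t150\t22\t1\t50\t199\t40\t189\t1e-60\t220",
]

if __name__ == "__main__":
    # Significance criterion: E <= 1e-5 and identity >= 90%
    print(filter_blast(EXAMPLE_HITS, QUERY_LENGTHS))
    # Homolog criterion: identity >= 80% and >= 80% of the protein length aligned
    print(filter_blast(EXAMPLE_HITS, QUERY_LENGTHS, max_evalue=float("inf"),
                       min_identity=80.0, min_coverage=0.8))
```

In this toy example both criteria keep only the GhLHC_1 to GaLHC_1 pair: GhLHC_2 fails the 90% identity cut and covers only 60% of its length.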

Chromosomal mapping, gene ontology, and cis-regulatory elements analysis

To determine the distribution of LHC genes across the chromosomes of the A, D and AD cotton genomes, the GFF3 annotation files from CottonFGD and the gene IDs were used, and TBtools was used to display the gene positions on the chromosomes. The putative functions of the 109 Gossypium LHC genes, in terms of biological process (BP), molecular function (MF) and cellular component (CC), were predicted with the agriGO online analysis tool (Ashburner et al. 2000).
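
Counting family members per chromosome from a GFF3 annotation, the input behind the chromosome maps, reduces to tallying the sequence IDs (column 1) of matching `gene` features. The sketch below assumes gene identifiers are stored in the `ID` attribute; the chromosome names and gene IDs in the example are placeholders, not records from CottonFGD.

```python
from collections import Counter


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=g1;Name=x' into a dict."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def genes_per_chromosome(gff3_lines, family_ids):
    """Count genes whose ID is in `family_ids`, grouped by seqid (chromosome)."""
    counts = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only whole-gene features are counted
        if parse_attributes(cols[8]).get("ID") in family_ids:
            counts[cols[0]] += 1
    return counts


# Hypothetical GFF3 records, for illustration only.
EXAMPLE_GFF3 = [
    "##gff-version 3",
    "chrA01\tsrc\tgene\t1000\t2500\t.\t+\t.\tID=gene_1",
    "chrA01\tsrc\tgene\t9000\t9800\t.\t-\t.\tID=gene_2",
    "chrD07\tsrc\tgene\t500\t1700\t.\t+\t.\tID=gene_3",
    "chrD07\tsrc\tmRNA\t500\t1700\t.\t+\t.\tID=gene_3.1;Parent=gene_3",
]

if __name__ == "__main__":
    print(genes_per_chromosome(EXAMPLE_GFF3, {"gene_1", "gene_2", "gene_3"}))
```

Restricting the count to `gene` features avoids counting the same locus again through its mRNA, exon or CDS records.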

The gene structures of the LHC genes in G. hirsutum, G. arboreum and G. raimondii were analyzed with the online Gene Structure Display Server (GSDS 2.0), and conserved motifs were identified with the online tool MEME. The 2 000-bp upstream sequences of the cotton LHC genes were downloaded from CottonFGD to identify cis-regulatory elements in the putative promoter regions. The FASTA file of these upstream sequences was submitted to PlantCARE to identify putative cis-regulatory elements (Lescot et al. 2002), and TBtools was used to visualize the results.
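
Retrieving a 2 000-bp putative promoter depends on strand: for a plus-strand gene the region lies before the start coordinate, whereas for a minus-strand gene it lies after the end coordinate and must be reverse-complemented. The sketch below illustrates this under stated assumptions (1-based inclusive GFF3 coordinates and a plain chromosome string); it is not the CottonFGD download procedure, and the toy sequence is invented.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq, start, end, strand, length=2000):
    """Return `length` bp upstream of a gene, in the gene's own orientation.

    `start` and `end` are assumed to be 1-based inclusive coordinates (GFF3
    style). The region is truncated if the gene lies closer than `length` bp
    to a chromosome end.
    """
    if strand == "+":
        # 0-based index of the first gene base is start - 1
        return chrom_seq[max(0, start - 1 - length):start - 1]
    if strand == "-":
        # upstream of a minus-strand gene lies to the right of `end` on the plus strand
        return reverse_complement(chrom_seq[end:end + length])
    raise ValueError(f"unknown strand: {strand!r}")


if __name__ == "__main__":
    chrom = "GATTACAGGC"  # toy chromosome
    print(upstream_region(chrom, 6, 8, "+", length=3))  # TTA
    print(upstream_region(chrom, 1, 5, "-", length=3))  # CTG
```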

Gene evolution and subcellular localization prediction of LHC proteins

The coding sequences (CDS) and protein sequences of the homologous genes were downloaded from CottonFGD. The CDS, protein sequences and gene IDs of the homologous gene pairs were used to compute Ka/Ks (non-synonymous substitution rate/synonymous substitution rate) values with TBtools (Chen et al. 2018). To predict the subcellular localization of the LHC proteins, the protein sequences of the three Gossypium species were downloaded from CottonFGD and analyzed with the online tool WoLF PSORT.

RNA extraction and RT-qPCR analysis

Total RNA was extracted with the RNAprep Pure Plant Plus Kit (TIANGEN) according to the manufacturer's instructions. RNA quality and concentration were checked with a NanoDrop 2000, requiring an A260/A280 ratio between 1.8 and 2.1 (Joshi et al. 2016). The RNA was then reverse-transcribed into cDNA with a TransGen Biotech kit (TransGen Biotech Co., Ltd., Beijing, China) following the kit instructions. Twenty-seven genes were selected from the LHC gene family for RT-qPCR, and primers (Table S2) were designed with NCBI Primer-BLAST. Each 20 μL RT-qPCR reaction, run on a 7500 Fast Real-Time PCR System, contained 2 μL cDNA, 2 μL forward and reverse primers, 6 μL RNase-free water and 10 μL SYBR solution. Three biological and three technical replicates were used, with Ghactin7 as the internal control. Relative expression levels were calculated with the 2^−ΔΔCT method (Schmittgen and Livak 2008).
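
The 2^−ΔΔCT calculation normalizes the target gene to the reference gene (here Ghactin7) and then to a calibrator sample. A minimal sketch follows. Using the 0 h sample as the calibrator and averaging replicate Ct values before subtraction are assumptions made for illustration, and the Ct values are invented.

```python
from statistics import mean


def relative_expression(target_treated, ref_treated, target_control, ref_control):
    """Fold change by the 2^-ddCt method from lists of replicate Ct values."""
    d_ct_treated = mean(target_treated) - mean(ref_treated)   # dCt in the stressed sample
    d_ct_control = mean(target_control) - mean(ref_control)   # dCt in the calibrator (assumed 0 h)
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


if __name__ == "__main__":
    # Invented Ct values for one target gene and the reference gene.
    fold = relative_expression(
        target_treated=[24.0, 24.2, 23.8],
        ref_treated=[18.0, 18.1, 17.9],
        target_control=[26.1, 25.9, 26.0],
        ref_control=[18.0, 18.0, 18.0],
    )
    print(round(fold, 2))  # 4.0
```

A ΔΔCT of −2 means the target reached threshold two cycles earlier, relative to the reference gene, than in the calibrator, which corresponds to a four-fold increase under the assumption of 100% amplification efficiency.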


Results

Identification of the cotton LHC proteins

A total of 109 proteins encoded by LHC genes were identified in the three sequenced cotton genomes: 55 in G. hirsutum (AD), 27 in G. raimondii (D) and 27 in G. arboreum (A) (Table S3). The combined number of LHC proteins in the two diploid species, G. raimondii and G. arboreum (54), was one less than the number in G. hirsutum, possibly because the allotetraploid AD genome arose from the merger and doubling of the A and D genomes.

In G. hirsutum, protein lengths ranged from 62 aa (Gh_Sca017783G01) to 644 aa (Gh_A02G1068), and molecular weights ranged from 6.88 kDa to 72.66 kDa in the same two proteins. Molecular charge ranged from −8.5 (Gh_A01G0519) to 7 (Gh_A02G1068), the isoelectric point (pI) from 4.701 (Gh_D06G2350) to 10.228 (Gh_D04G1505), and the grand average of hydropathy (GRAVY) from −0.529 (Gh_A01G0519) to 0.233 (Gh_D06G2350) (Table S1).
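
Protein molecular weight can be estimated by summing average amino-acid residue masses and adding one water molecule for the free termini. The sketch below uses standard average residue masses; the exact method used by CottonFGD is not described in the text, so small differences from Table S1 are possible, and the example peptide is arbitrary.

```python
# Average residue masses in daltons (free amino acid minus H2O).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # average mass of H2O restored for the chain termini


def molecular_weight_kda(sequence):
    """Approximate average molecular weight of a protein in kDa."""
    seq = sequence.upper().rstrip("*")  # tolerate a trailing stop symbol
    daltons = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    return daltons / 1000


if __name__ == "__main__":
    print(round(molecular_weight_kda("MAAS"), 4))  # arbitrary example peptide
```

As a rough check, the 6.88 kDa reported for the 62-aa Gh_Sca017783G01 corresponds to about 111 Da per residue, in line with typical protein composition.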

In the two diploid species, G. arboreum and G. raimondii, the physicochemical properties of the LHC proteins differed slightly in molecular weight, protein length, pI, molecular charge and GRAVY. Protein lengths ranged from 114 aa to 610 aa in G. arboreum and from 151 aa to 349 aa in G. raimondii; molecular weights ranged from 12.823 to 68.741 kDa and from 16.55 to 38.267 kDa; and molecular charge ranged from −6 to 9 and from −4.5 to 7.5, respectively (Table S1).

The pI and GRAVY values, by contrast, were similar in the two species: pI ranged from 4.87 to 9.897 in G. arboreum and from 4.701 to 9.296 in G. raimondii, and GRAVY ranged from −0.377 to 0.167 and from −0.249 to 0.244, respectively. In all three cotton species the GRAVY values were close to zero, with both negative and positive values; negative values indicate hydrophilic proteins that are likely to interact readily with water.
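
GRAVY is conventionally computed as the mean Kyte-Doolittle hydropathy value over all residues, so negative scores indicate a hydrophilic protein and positive scores a hydrophobic one. A minimal sketch using the Kyte-Doolittle scale is shown; the example peptides are arbitrary and not cotton sequences.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper().rstrip("*")
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    for peptide in ("ILVA", "KDES"):  # arbitrary example peptides
        score = gravy(peptide)
        label = "hydrophilic" if score < 0 else "hydrophobic" if score > 0 else "neutral"
        print(peptide, round(score, 3), label)
```

Because GRAVY is an average, a long protein with a few hydrophobic membrane-spanning segments can still score close to zero, as the cotton LHC proteins do.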

Phylogenetic tree and Synteny block analysis of the cotton LHC proteins

The phylogenetic analysis grouped the cotton LHC proteins, together with those of A. thaliana and T. cacao, into five clades, and numerous homologous gene pairs were identified among the cotton LHC proteins (Fig. 1a).

Fig. 1

a Phylogenetic tree of LHC genes in G. hirsutum, G. arboreum, G. raimondii, Arabidopsis thaliana and Theobroma cacao, constructed using MEGA 7.0. b Synteny blocks among cotton chromosomes. A: chromosomes of G. arboreum; D: chromosomes of G. raimondii; At and Dt: chromosomes of the A and D subgenomes of the tetraploid cotton G. hirsutum

Collinearity among the three cotton species was analyzed with the Circle gene viewer in TBtools to identify collinear gene pairs (Chen et al. 2018). Collinear relationships were examined between the physical maps of the At and Dt subgenomes of G. hirsutum and the genomes of G. arboreum and G. raimondii, namely A vs D, A vs At and D vs Dt. Good collinearity was found for 23 genes between A and D, 20 genes between A and At, and 23 genes between D and Dt (Fig. 1b).

Gene ontology analysis

According to the gene ontology analysis, the biological process (GO:0008150) terms of G. hirsutum were cellular and metabolic processes, and the cellular component (GO:0005575) terms were cell and cell part. In G. arboreum, the biological process terms (GO:0008150) were response to stimulus and cellular and metabolic processes; the cellular component terms (GO:0005575) concerned the cell, macromolecular (protein) complexes and membranes; and the molecular function terms (GO:0003674) were related to binding. In G. raimondii, the biological process terms (GO:0008150) were cellular and metabolic processes, as in G. hirsutum, and the cellular component term (GO:0005575) was membrane. No significant molecular function terms were found in G. hirsutum or G. raimondii (Fig. 2).

Fig. 2

Gene Ontology annotation analysis of LHC genes, showing their role in biological processes, cellular component and molecular function a G. hirsutum b G. arboreum c G. raimondii

Gene structure and motif identification of LHC proteins

Gene structure diversity is regarded as a possible signature of the evolution of multigene families (Nei and Rooney 2005). To gain further insight into the structural diversity of the cotton LHC genes, the exon/intron organization of representative transcripts was compared with the corresponding genomic DNA sequences in G. hirsutum. Most LHC genes, and their exons, were highly conserved within each group.

Some LHC genes contained introns. The maximum number of introns was eleven in G. hirsutum (Gh_A02G1068), eleven in G. arboreum (Ga02G0756) and five in G. raimondii (Gorai.003G092700). In G. hirsutum, the genes with the most exons and introns were Gh_A02G1068 (twelve exons, eleven introns) and Gh_A01G0519 (ten exons, nine introns). Notably, exon and intron lengths differed among the LHC genes. Exon numbers also varied: eighteen genes had two exons and one intron, seven genes had three exons and two introns, and seven genes had a single exon and no intron (Fig. 3).
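
Exon and intron counts such as those above can be read from a GFF3 annotation: for a single linear transcript with non-overlapping exons, the intron count is the exon count minus one. The sketch assumes exon features carry a `Parent` attribute naming their transcript; the example records are hypothetical, and GSDS itself may derive the structure differently.

```python
from collections import Counter


def exon_intron_counts(gff3_lines, feature="exon"):
    """Map each transcript ID to (exon count, intron count)."""
    exons = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):  # an exon may list several parents
            if parent:
                exons[parent] += 1
    # introns separate consecutive exons of one linear transcript
    return {tx: (n, n - 1) for tx, n in exons.items()}


# Hypothetical exon records, for illustration only.
EXAMPLE_GFF3 = [
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tParent=tx_1",
    "chr1\tsrc\texon\t300\t420\t.\t+\t.\tParent=tx_1",
    "chr1\tsrc\texon\t500\t650\t.\t+\t.\tParent=tx_1",
    "chr2\tsrc\texon\t50\t900\t.\t-\t.\tParent=tx_2",
]

if __name__ == "__main__":
    print(exon_intron_counts(EXAMPLE_GFF3))  # {'tx_1': (3, 2), 'tx_2': (1, 0)}
```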

Fig. 3

Gene structure display using Gene Structure Display Server - GSDS 2.0 online tool for G. hirsutum, G. arboreum and G. raimondii

In the diploid species, the genes with the most exons and introns were Ga02G0756 (twelve exons, eleven introns) and Ga01G0731 (eleven exons, ten introns) in G. arboreum, and Gorai.003G092700 and Gorai.009G262000 (six exons and five introns each) in G. raimondii. Seven G. arboreum genes and ten G. raimondii genes had two exons and one intron. In both species, five genes had three exons and two introns, and three genes had a single exon and no intron.

To explore the structural evolution of the LHC proteins, their motif patterns were analyzed. MEME detected a total of 20 different motifs in the three Gossypium species (Fig. 4). Motifs 3, 4 and 12 were conserved in G. hirsutum, motifs 2 and 8 in G. arboreum, and motifs 11 and 4 in G. raimondii.

Fig. 4

Motif Identification of LHC proteins a G. hirsutum, b G. arboreum and c G. raimondii

Chromosomal mapping of the LHC genes

The LHC genes were unevenly distributed across the chromosomes of the A2, D5 and (AD)1 cotton genomes. In the At subgenome of the tetraploid (AD)1 genome, chromosomes At01, At05 and At10 carried the most genes (three each), while At03, At08 and At09 carried none. In the Dt subgenome, Dt07, Dt01 and Dt05 carried the most genes, with five, four and four genes, respectively, whereas At03, At08 and At09 had zero genes. The remaining chromosomes carried one to three genes (Fig. 5a and b). The distribution differed in the two diploid species. In G. arboreum, chromosomes A2(05) and A2(07) carried the most genes (four each), and in G. raimondii, chromosomes D5(01), D5(09) and D5(10) carried four genes each, while chromosomes A2(04) and D5(06) carried none. A few LHC genes are tandemly duplicated, but most are singletons dispersed across the genome (Fig. 5c and d).

Fig. 5

Chromosomal positions of LHC genes in Gossypium species, plotted for each genome. a G. hirsutum At subgenome b G. hirsutum Dt subgenome c G. arboreum d G. raimondii e Scaffold distribution of the Gossypium species

Identification of cis-regulatory elements

Cis-acting regulatory elements are important molecular switches involved in the transcriptional regulation of dynamic networks of gene activity controlling various biological processes, including abiotic stress responses, hormone responses and development. They serve as genomic blueprints for coordinating the spatiotemporal gene expression programs underlying highly specialized cell functions (Mao et al. 2020). The analysis identified ABA-responsive elements (ABRE), anaerobic-responsive elements (ARE), MRE, MYB, AT-rich elements, dehydration-responsive elements (DRE), MBS, Box-4 and ACE elements in the promoters of the LHC genes of the three cotton species, several of which (ABRE, DRE, MYB and MBS) are associated with drought stress (Fig. 6). ABRE and the dehydration-responsive element/C-repeat (DRE/CRT) are two major cis-acting elements involved in ABA-dependent and ABA-independent gene expression, respectively, in osmotic and cold stress responses (Yamaguchi-Shinozaki and Shinozaki 2005).
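
PlantCARE matches promoter sequences against a curated database of element signatures. As a rough illustration only, the sketch below counts exact matches to three core sequences commonly associated with drought and ABA responses (the ABRE core ACGTG, the DRE/CRT core CCGAC and the MYB-binding site MBS, CAACTG) on both strands. These cores are simplifications chosen here, not the full PlantCARE definitions, and the test sequence is invented.

```python
import re

# Simplified core sequences (an assumption; PlantCARE uses a larger set of variants).
CORE_MOTIFS = {"ABRE": "ACGTG", "DRE": "CCGAC", "MBS": "CAACTG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def count_core_motifs(promoter, motifs=CORE_MOTIFS):
    """Count overlapping exact matches of each core on both DNA strands.

    Note: a palindromic core would be counted twice; none of the cores above is palindromic.
    """
    seq = promoter.upper()
    counts = {}
    for name, core in motifs.items():
        # a zero-width lookahead lets overlapping occurrences be found
        forward = len(re.findall(f"(?={core})", seq))
        # finding the reverse complement here is equivalent to finding the core on the other strand
        reverse = len(re.findall(f"(?={reverse_complement(core)})", seq))
        counts[name] = forward + reverse
    return counts


if __name__ == "__main__":
    print(count_core_motifs("ACGTGAAAACACGT"))  # {'ABRE': 2, 'DRE': 0, 'MBS': 0}
```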

Fig. 6

Cis-regulatory elements identified in the promoters of LHC genes in three Gossypium species. a G. hirsutum b G. arboreum c G. raimondii

Evolution of LHC genes in Gossypium species

Synonymous substitutions (Ks) are generally not affected by natural selection, whereas non-synonymous substitutions (Ka) are. Ka/Ks > 1, Ka/Ks = 1 and Ka/Ks < 1 indicate positive, neutral and negative (purifying) selection, respectively (Zhao et al. 2020). The distributions of Ka, Ks and Ka/Ks were similar among the homologous pairs of the Gossypium species (Fig. 7, Table S4). Ka/Ks ranged from 0 to 0.949 for GhAt-Ga pairs, from 0 to 0.838 for GhDt-Gr pairs, from 0 to 0.524 for GhAt-GhDt pairs and from 0 to 0.756 for Ga-Gr pairs. Because Ka/Ks was < 1 for all pairs, the gene family has been subject to negative selection. These results suggest that the LHC genes of G. hirsutum and their homologs in G. raimondii and G. arboreum experienced negative selection throughout evolution.
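
The selection criterion above is a comparison of Ka/Ks with 1. The sketch below classifies homologous pairs accordingly and reports the range of defined ratios. The pair names and (Ka, Ks) values are hypothetical, and pairs with Ks = 0 are flagged because the ratio is undefined there.

```python
def selection_mode(ka, ks):
    """Classify a homologous gene pair by its Ka/Ks ratio."""
    if ks == 0:
        return "undefined"  # no synonymous change: Ka/Ks cannot be computed
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "negative (purifying)"
    return "neutral"


def ratio_range(pairs):
    """Return (min, max) Ka/Ks over pairs with a defined ratio."""
    ratios = [ka / ks for ka, ks in pairs.values() if ks > 0]
    return min(ratios), max(ratios)


EXAMPLE_PAIRS = {  # hypothetical (Ka, Ks) values for homologous pairs
    "pair_1": (0.000, 0.040),
    "pair_2": (0.010, 0.050),
    "pair_3": (0.030, 0.060),
}

if __name__ == "__main__":
    for name, (ka, ks) in EXAMPLE_PAIRS.items():
        print(name, selection_mode(ka, ks))
    print(ratio_range(EXAMPLE_PAIRS))  # (0.0, 0.5)
```

A pair with Ka = 0 yields Ka/Ks = 0, which is why the reported ranges start at zero.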

Fig. 7

Distributions of Ka, Ks and Ka/Ks values for homologous LHC gene pairs. a G. hirsutum At – G. arboreum b G. hirsutum Dt – G. raimondii c G. hirsutum At – G. hirsutum Dt d G. arboreum – G. raimondii

Prediction of subcellular localization for LHC proteins in Gossypium species

WoLF PSORT predicted that the LHC proteins localize to various cell compartments, including the chloroplast, cytoplasm, endoplasmic reticulum, mitochondria, nucleus and vacuole (Fig. 8). In all three Gossypium species, the LHC proteins were predicted to localize mainly to the chloroplast, with chloroplast scores of 472.5 (72.1%), 251.5 (73.1%) and 232 (73.3%) in G. hirsutum, G. arboreum and G. raimondii, respectively (Table S5).
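
The chloroplast percentages appear to be each compartment's share of the summed WoLF PSORT scores across all proteins of a species. The sketch below shows that aggregation, assuming a simplified per-protein output of the form `chlo: 10, cyto: 4` (the actual WoLF PSORT report format may differ); the protein names and scores are invented.

```python
from collections import Counter


def parse_scores(text):
    """Parse an assumed 'site: score, site: score' string into a Counter."""
    scores = Counter()
    for item in text.split(","):
        site, value = item.split(":")
        scores[site.strip()] += float(value)
    return scores


def compartment_shares(predictions):
    """Sum scores per compartment across proteins; return {site: (total, percent)}."""
    totals = Counter()
    for text in predictions.values():
        totals.update(parse_scores(text))  # Counter.update adds scores
    grand_total = sum(totals.values())
    return {site: (score, 100 * score / grand_total) for site, score in totals.items()}


EXAMPLE = {  # hypothetical per-protein predictions
    "protein_1": "chlo: 10, cyto: 4",
    "protein_2": "chlo: 12, nucl: 2",
}

if __name__ == "__main__":
    for site, (score, pct) in compartment_shares(EXAMPLE).items():
        print(f"{site}: {score:g} ({pct:.1f}%)")
```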

Fig. 8

Subcellular localization prediction of three Gossypium species using Wolf PSORT online tool a Heatmap illustration for G. hirsutum b Heatmap illustration for G. arboreum c Heatmap illustration for G. raimondii

RT-qPCR validation of LHC genes under water deficit conditions

The expression profiles of 27 LHC genes were analyzed in different tissues at several time points under PEG-6000 treatment. All genes showed differential expression patterns in the analyzed tissues (Table S6). Gh_D10G2385, Gh_A13G0222, Gh_A05G0725, Gh_D05G0860, Gh_D07G0661, Gh_D01G1508, Gh_D12G1495, Gh_A07G2182 and Gh_A10G2108 were highly upregulated in roots, whereas Gh_A07G2184, Gh_D10G2385, Gh_D05G0860, Gh_D02G1996, Gh_A13G0222 and Gh_A05G0725 were more strongly upregulated in leaves after 12 h of stress exposure. Similarly, Gh_A13G0222, Gh_D06G1791 and Gh_A06G1447 were upregulated in stem tissues from 6 h to 24 h (Fig. 9).

Fig. 9

Differential expression analysis of LHC gene family using RT-qPCR in G. hirsutum under drought stress. Red and green colors indicate high and low levels of expression, respectively. a Heat map showing gene expression in leaf tissue b Heat map showing gene expression in root tissue and c Heat map showing gene expression in stem tissue

Gene expression varied across time points and tissues. Downregulation was most common in leaf tissue, followed by stem. Genes such as Gh_A10G0361, Gh_D10G0369, Gh_A03G2154 and Gh_D03G0610 were downregulated in all three tissues at almost all time points. In general, more genes were upregulated in root tissues. Expression of Gh_A13G0222 was significantly higher in leaf and stem tissues at 12 h and 24 h after treatment than at other time points, whereas in root tissues it was highly expressed at almost all time points except 12 h. Gh_D10G2385, Gh_D05G0860 and Gh_A05G0725 were also upregulated in leaf and root tissues under drought stress. Detailed study of these genes should clarify the roles of LHC genes in drought stress tolerance in cotton (Gossypium). Drought is first perceived in the root zone, and the stronger upregulation of many genes in root tissues is consistent with earlier results in which most LEA genes were more strongly upregulated in roots than in leaf and stem tissues under drought stress (Magwanga et al. 2018).
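
Heat maps such as Fig. 9 are commonly drawn on a log2 scale, where a fold change of 2 becomes +1 and a fold change of 0.5 becomes −1. The sketch below labels relative expression values as up- or downregulated using a two-fold cutoff. Both the log2 scale and the cutoff are assumptions, since the text does not state the thresholds used, and the fold changes are invented.

```python
import math


def regulation(fold_change, cutoff=2.0):
    """Label a 2^-ddCt fold change as 'up', 'down' or 'unchanged' on a log2 scale."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    log2_fc = math.log2(fold_change)
    threshold = math.log2(cutoff)  # assumed two-fold cutoff gives a threshold of 1
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # hypothetical fold changes for one gene in root tissue at successive time points
    for hours, fold in [(3, 1.2), (6, 2.5), (12, 4.0), (24, 0.4)]:
        print(f"{hours:2d} h: log2FC = {math.log2(fold):+.2f} -> {regulation(fold)}")
```

The log2 scale makes up- and downregulation symmetric around zero, so a four-fold increase and a four-fold decrease sit at equal distances from the midpoint of the color scale.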


Discussion

Drought is one of the key abiotic stresses affecting crop production worldwide, and it severely affects the physiology and growth of many crops (Joshi et al. 2016). It is a major cause of cotton yield loss because of the ever-increasing shortage of water around the world (Hou et al. 2018). Drought stress impairs photosynthesis: at moderate severity it typically acts first through the stomata, whereas severe drought damages photosynthetic pigments and causes metabolic and structural changes. Photosynthesis is one of the most vital photochemical processes in plants. Light energy is converted into chemical energy, which is used to convert carbon dioxide, water and minerals into oxygen and energy-rich organic compounds that heterotrophs in turn use as an energy source (Gururani et al. 2015). Photosynthesis is a multistep process involving several pathways: the photosynthetic electron transport system (PETs) converts light energy into ATP and NADPH, the Calvin-Benson cycle fixes CO2 into carbohydrates, and the photoassimilates, the organic products of photosynthesis, are then assimilated, transported and used (Eberhard et al. 2008; Foyer et al. 2012).

The primary effect of abiotic stress on photosynthetic activity is the disruption of all photosynthetic mechanisms (Nouri et al. 2015). Mature plants and young seedlings differ markedly in their photosynthetic responses to drought stress. In mature plants, the photosynthetic complexes are already assembled, and water stress leads to the production of reactive oxygen species from excess absorbed light, which stresses the photosynthetic apparatus. In young seedlings under water stress, by contrast, chlorophyll biosynthesis and the synthesis and assembly of the light-harvesting complexes of PSI and PSII can be downregulated, so that the plants avoid absorbing excess, damaging light (Dalal and Tripathy 2018). The chloroplast is a major focus of biological research because it is the site of photosynthesis; it is also highly sensitive to biotic and abiotic stresses and reflects the plant's actual response to stress (Li et al. 2020; Liu et al. 2013).

Light-harvesting chlorophyll a/b-binding (LHC) proteins constitute a plant-specific superfamily involved in photosynthesis and stress responses. Identifying the genes of this family helps in studying their functions and roles in different crop species (Qin et al. 2017; Zou et al. 2020), but little information on this family is available for cotton. Previous studies in crops have suggested an important link between photosynthesis and final yield. Light-harvesting complex II (LHCII) is a central component of the photosynthetic apparatus, with fundamental roles in light harvesting and in acclimation to changing light conditions (Longoni et al. 2015; Qin et al. 2017).

In our results, many genes were upregulated in root tissue. Gh_A13G0222 was upregulated in all tissue samples, while Gh_D10G2385, Gh_D05G0860 and Gh_A05G0725 were upregulated in leaf and root tissues under drought stress. In tea plants, two genes, CsCP1 and CsCP2, were found to affect phosphorylation/dephosphorylation and GTP in the physiological regulation of PSII, and the regulation of LHC protein levels allows chloroplasts to respond flexibly and rapidly to abiotic stresses (Li et al. 2020). In papaya plants treated with mannitol to simulate drought stress, three genes (CpELIP, CpLhcb7 and CpPsbS) were upregulated after 10 days, and five genes (CpELIP, CpSEP2, CpOHP2, CpLhcb7 and CpPsbS) were upregulated after 15 days and highly upregulated after 20 days (Zou et al. 2020). The distributions of Ka, Ks and Ka/Ks were similar among homologous pairs of the Gossypium species: Ka/Ks ranged from 0 to 0.949 for GhAt-Ga, from 0 to 0.838 for GhDt-Gr, from 0 to 0.524 for GhAt-GhDt and from 0 to 0.756 for Ga-Gr. These results suggest that the LHC genes of G. hirsutum and their homologs in G. raimondii and G. arboreum experienced negative (purifying) selection throughout evolution. Consistent with this finding, the Ka/Ks values of cassava LHC genes range from 0.0010 to 0.2507 (Zou and Yang 2019).

LHCB family members positively regulate abiotic stress tolerance in plants through ABA signaling, including ABA-induced stomatal closure, from germination through later growth (Liu et al. 2013; Xu et al. 2012). ABA is well known to induce stomatal closure under water deficit, which limits photosynthesis. Genetic evidence shows that LHCB family members are involved in guard-cell signaling in response to ABA, identifying them as new players in ABA signaling during stomatal movement (Xu et al. 2012). Under stress conditions, LHCB expression is enhanced through suppression of WRKY transcription factors that repress LHCB genes (Liu et al. 2013). Functional genomics studies will be needed to validate LHCB gene functions at the molecular and genetic levels, which would make the LHC family useful for cotton improvement.


Conclusions

This study investigated the G. hirsutum LHC genes and their potential contribution to drought stress tolerance. The family analysis identified 109 proteins encoded by LHC genes in the cotton genomes: 55 in Gossypium hirsutum, 27 in G. arboreum and 27 in G. raimondii. Most LHC genes had conserved exon-intron structures. Collinearity analysis and chromosomal mapping showed that the LHC genes were dispersed across the chromosomes of the three Gossypium species, with most genes clustered on the upper and lower arms of the chromosomes. RT-qPCR analysis revealed that the most genes were upregulated in roots, followed by stem and leaf tissues. Gh_A13G0222, Gh_D05G0860 and Gh_D10G2385 were upregulated after PEG treatment and are therefore candidate genes for drought stress tolerance in cotton. We recommend detailed investigation of these candidate genes at the molecular and genetic levels to elucidate the mechanisms underlying drought stress tolerance in cotton.

Availability of data and materials

All related data and files, including the primer sequences used for gene expression profiling, are presented in the article and its supplementary information.



Abbreviations

ABA: Abscisic acid
GO: Gene ontology
LHC: Light-harvesting chlorophyll a/b-binding
Ka: Non-synonymous substitution rate
Ks: Synonymous substitution rate
CottonFGD: Cotton Functional Genomics Database
RT-qPCR: Real-time quantitative polymerase chain reaction




Acknowledgements

We sincerely thank our laboratory for its support throughout this research.


Funding

This research was funded by the National Natural Science Foundation of China (grant numbers 31621005, 31530053 and 31671745), the National Key R&D Program of China (2021YFE0101200) and PSF/CRP/18thProtocol (07).

Author information




Mehari TG, Xu YC, Magwanga RO and Umer MJ conducted the experiment and wrote the manuscript. Cai XY, Kirungu JN, Hou YQ, Wang YH and Yu SX assisted with data collection. Wang K, Zhou ZL and Liu F revised the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to ZHOU Zhongli or LIU Fang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declared that they have no competing interests.

Supplementary Information

Additional file 1: Table S1.

Physicochemical properties of LHC proteins in G. hirsutum, G. arboreum and G. raimondii.

Additional file 2: Table S2.

List of RT-qPCR primers for LHC genes; primers were designed with NCBI Primer-BLAST.

Additional file 3: Table S3.

List of LHC genes in G. hirsutum, G. arboreum, and G. raimondii, respectively.

Additional file 4: Table S4.

Ka, Ks, Ka/Ks values of LHC genes.

Additional file 5: Table S5.

Subcellular localization prediction of LHC proteins in the three Gossypium species.

Additional file 6: Table S6.

T-test for the RT-qPCR analysis of LHC genes using SAS software.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

MEHARI, T.G., XU, Y., MAGWANGA, R.O. et al. Genome wide identification and characterization of light-harvesting Chloro a/b binding (LHC) genes reveals their potential role in enhancing drought tolerance in Gossypium hirsutum. J Cotton Res 4, 15 (2021).
