Genome-wide identification and expression analysis of DNA demethylase family in cotton



DNA methylation is an important epigenetic factor that maintains and regulates gene expression. The pattern and level of DNA methylation depend on the activities of DNA methyltransferases and demethylases, and DNA demethylases play the key role in DNA demethylation. Plant DNA demethylases all contain a conserved DNA glycosylase domain. This study identified the cotton DNA demethylase gene family and analyzed it using bioinformatics methods to lay the foundation for further study of cotton demethylase gene function.


This study used genomic information from the diploids Gossypium raimondii JGI (D) and Gossypium arboreum L. CRI (A) and the allotetraploids Gossypium hirsutum L. JGI (AD1) and Gossypium barbadense L. NAU (AD2), with Arabidopsis thaliana as the reference. Using the DNA demethylase gene sequences of Arabidopsis as queries, 25 DNA demethylase genes were identified in cotton by BLAST analysis: 4 genes in genome D, 5 genes in genome A, 10 genes in genome AD1, and 6 genes in genome AD2. Gene structure and evolution were analyzed by bioinformatics, and the expression patterns of the DNA demethylase gene family in Gossypium hirsutum L. were analyzed. Phylogenetic tree analysis divided the DNA demethylase gene family of cotton into four subfamilies: REPRESSOR of SILENCING 1 (ROS1), DEMETER (DME), DEMETER-LIKE 2 (DML2), and DEMETER-LIKE 3 (DML3). The sequence similarity of DNA demethylase genes within the same species was higher, and their genetic relationship was correspondingly closer. Gene structure analysis revealed that the DNA demethylase gene family members of the four subfamilies varied greatly. The ROS1 and DME subfamilies had more introns and more complex gene structures. Conserved domain analysis showed that every DNA demethylase family member has an endonuclease III (ENDO3c) domain.


The genes of the DNA demethylase family are distributed differently in different cotton species, and their gene structures differ greatly. ROS1 genes were highly expressed in cotton under abiotic stress. The expression levels of ROS1 genes were also higher during cotton ovule formation, and the transcription levels of ROS1 family genes were higher during cotton fiber development.


DNA methylation is an epigenetic modification widely found in bacteria, plants and animals (Chen et al. 2015; Manning et al. 2006; Zhong et al. 2013). It is involved in gene silencing, transposon suppression, genomic imprinting, X chromosome inactivation, cell differentiation, embryo development and other growth and development processes (Fu et al. 2014; Xie et al. 2013; Macdonald 2012; Bala et al. 2013). DNA methylation is a prerequisite for the normal growth and development of organisms. It can affect the stability of the genome, regulate gene expression, and maintain growth and development (Wang and Xu 2014; Zhang et al. 2018; Cokus et al. 2008). DNA methylation has been shown to be a dynamic process that can be regulated according to developmental stage or environmental conditions (Bartels et al. 2018). It is regulated by different pathways that establish methylation and reverse it (Ja and Se 2010). Most DNA methylation occurs on the fifth carbon atom (C5) of cytosine at symmetrical CG sites, but it also exists at CHG and CHH (H = A, C or T) sites in plants (Stroud et al. 2014). Two modes of DNA methylation have been found in plants: maintenance methylation and de novo methylation (Jullien et al. 2012). There are four types of C5-MTases in plants: the Methyltransferase (MET) family, the Chromomethylase (CMT) family, the Domains Rearranged Methyltransferase (DRM) family and Dnmt2 (Wang et al. 2016; Pavlopoulou and Kossida 2007).

Methylated cytosine can be lost in two ways: through failure to maintain methylation during DNA replication, or through DNA demethylase activity. DNA demethylase contains a bifunctional DNA glycosylase domain (Tomkova et al. 2018). This domain not only directly excises methylcytosine but also cleaves the DNA backbone at the resulting abasic site. DNA polymerase and DNA ligase then fill the gap with unmodified cytosine (Mccullough et al. 1989). There are four main types of DNA demethylases: ROS1, DME, DML2, and DML3. DME is unique to dicotyledons, is involved in embryo and endosperm development, and is essential for demethylation of the entire genome and transposon reactivation (Frost et al. 2018). DML2, DML3 and ROS1 are expressed in vegetative cells (Jon et al. 2007). DML2 and DML3 are capable of removing unwanted methylation at specific sites (Zhu et al. 2007). ROS1 can inhibit methylation at gene promoters (Gong et al. 2002). ROS1b can reactivate Tos17 by removing DNA methylation (La et al. 2011). ROS1-mediated DNA demethylation can cause decondensation of 5S rDNA chromatin, allowing plants to respond to biotic and abiotic stresses, and can also prevent RNA-directed DNA methylation (Movahedi et al. 2018). DNA demethylases thus play an important part in removing DNA methylation, and related studies have been carried out in A. thaliana and rice (Penterman et al. 2007; Choi et al. 2004; Zemach et al. 2010). As an important fiber and oil crop, cotton plays a pivotal role in China’s national economy (Chen et al. 2017). How to improve the quality of cotton fiber and the resistance of plants to different stresses is a critical problem in cotton planting today. The results of DNA methylation research are important for studying stress resistance mechanisms and improving cotton stress resistance.

Materials and methods

Identification of cotton DNA demethylase family members

Using the Arabidopsis DNA demethylase protein sequences (AT1G05900.2, AT2G36490, AT2G31450.1, AT3G10010.1, AT3G47830.1, AT4G34060.1, AT5G04560.2) as queries, Blastp homology searches were performed in CottonFGD with P < 0.001 and similarity > 40% in order to identify candidate proteins and obtain the gene locus names of the DNA demethylase family members. Using these gene locus names, the respective CDS, amino acid and genome sequences of the candidate DNA demethylases were downloaded from the Gossypium arboreum L. (CRI), G. raimondii (JGI), G. hirsutum L. (JGI) and G. barbadense L. (NAU) databases. The protein sequences of the candidate genes were analyzed with SMART to ensure that each candidate contained a DNA glycosylase domain. Subcellular localization was predicted on the Cello website, and isoelectric points were obtained with ProtParam.
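The filtering step described above can be sketched as follows. This is a minimal illustration, not the CottonFGD pipeline: it assumes the hits are available in the standard 12-column BLAST tabular layout, it reads the stated "P" threshold as an E-value and "similarity" as percent identity, and the subject names (`candidate_A` and so on) and all numbers are made-up placeholders.

```python
# Sketch: keep BLASTP hits that pass the thresholds quoted in the text.
# Assumed input: BLAST tabular format (12 tab-separated columns).
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity (column 3)
    evalue: float    # expectation value (column 11)


def parse_hits(text):
    """Parse tab-separated BLAST tabular lines into Hit records."""
    hits = []
    for line in text.strip().splitlines():
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), float(f[10])))
    return hits


def filter_hits(hits, max_evalue=1e-3, min_identity=40.0):
    """Keep hits with E-value below and identity above the thresholds."""
    return [h for h in hits if h.evalue < max_evalue and h.identity > min_identity]


# Hypothetical hits: only the first passes both thresholds.
EXAMPLE = "\n".join([
    "AT2G36490\tcandidate_A\t62.5\t900\t300\t5\t1\t900\t1\t905\t1e-150\t520",
    "AT2G36490\tcandidate_B\t35.0\t200\t120\t4\t1\t200\t1\t210\t1e-10\t60",
    "AT2G36490\tcandidate_C\t55.0\t80\t30\t2\t1\t80\t1\t82\t0.5\t25",
])

kept = filter_hits(parse_hits(EXAMPLE))
candidates = sorted({h.subject for h in kept})
print(candidates)
```

Candidates that survive this filter would still need the domain check described in the text before being accepted as family members.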

Cotton DNA demethylase family evolution analysis

The Arabidopsis thaliana amino acid sequences were used as queries, with E < 1e-5 as the threshold, to obtain the homologous sequences of Populus trichocarpa from the Phytozome v12.1 database. Multiple sequence alignment (Clustal W) of the DNA demethylase genomic sequences of G. raimondii, G. arboreum L., G. hirsutum L. and G. barbadense L. together with those of Arabidopsis was conducted in MEGA 7.0, and a tree was built with the neighbor-joining (NJ) method with 1 000 bootstrap replicates. The same method was used to construct the phylogenetic tree of the DNA demethylase protein family of G. raimondii, G. arboreum L., G. hirsutum L., G. barbadense L., Populus trichocarpa and A. thaliana.
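Two ingredients of the tree-building step above, a pairwise distance matrix and bootstrap resampling of alignment columns, can be sketched in a few lines. This is not MEGA's implementation and it does not perform the neighbor-joining clustering itself; the p-distance (proportion of differing sites) is assumed here purely for illustration, and the three aligned sequences are toy data.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_alignment(aln, rng):
    """Build one bootstrap replicate by resampling columns with replacement."""
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in aln.items()}


# Toy aligned sequences (hypothetical, equal length).
ALN = {
    "seq1": "MKTAYIAKQR",
    "seq2": "MKTAYLAKQR",
    "seq3": "MRT-YLSKQK",
}

names = sorted(ALN)
matrix = {(a, b): p_distance(ALN[a], ALN[b]) for a in names for b in names}
print(matrix[("seq1", "seq2")])

# One replicate; a full analysis would build a tree from each of 1 000 replicates.
replicate = bootstrap_alignment(ALN, random.Random(0))
print(replicate)
```

Bootstrap support for a branch is then the fraction of replicate trees in which that branch appears.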

Prediction of the basic structure of DNA demethylase gene family

The basic physicochemical properties of the family's protein sequences were analyzed with the online tool ProtParam in ExPASy. The gene structure map was drawn online with GSDS 2.0. Motif analysis was performed with the online tool MEME. The physical map of the chromosomes was drawn with the software MapInspect.

Analysis of expression patterns of cotton DNA demethylase gene under stress conditions

The FPKM (fragments per kilobase of transcript per million mapped fragments) values of the DNA demethylase genes in G. hirsutum L. under cold, heat, drought and salt stress, during ovule development, and during fiber development were obtained from the CottonFGD database. The G. hirsutum L. (AD1) Genome - Texas Interim release UTX-JGI v1.1 genome assembly is made available through a “Reserved Analyses” restriction. The FPKM value reflects the level of gene expression, and a heat map of gene expression was drawn with the HemI software.
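For readers unfamiliar with the unit, FPKM normalizes a fragment count by both gene length (in kilobases) and sequencing depth (in millions of mapped fragments). The sketch below shows the standard formula; the values in this study were downloaded ready-made from the database, so the function and the numbers in the example are illustrative assumptions only.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments)."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments, 2 kb long, in a library of 20 million fragments.
example = fpkm(fragments=500, gene_length_bp=2_000, total_mapped_fragments=20_000_000)
print(example)  # 12.5
```

The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings.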

Results and analysis

Whole genome identification of cotton DNA demethylase family members

Twenty-five DNA demethylases were identified from the whole genome of cotton by multiple sequence alignment. There were four DNA demethylase genes in genome D and five in genome A. According to their positions on the chromosomes, they were named GrDM1-GrDM4 and GaDM1-GaDM5, respectively. Ten DNA demethylase genes were identified in genome AD1 and named GhDM1-GhDM10, and six were identified in genome AD2 and named GbDM1-GbDM6. All but one of the DNA demethylase genes in the four cotton species were located on chromosomes; only GbDM5 was not assigned to any chromosome. The DNA demethylase proteins in cotton consist of 266 to 1 949 amino acids: GhDM8 is the longest at 1 949 amino acids, and GbDM7 contains only 266 amino acids. The isoelectric point (pI) of the DNA demethylases in cotton ranged from 6.10 to 9.48, with GaDM2 the lowest at 6.10 and GhDM9 the highest at 9.48. Subcellular localization predictions placed most of the cotton DNA demethylases at the outer membrane, and only a few in the cytoplasm and periplasm (Table 1).

Table 1 Basic characteristics of DNA demethylase genes in the cotton genome

Multi-sequence alignment and evolution analysis

To understand the evolutionary relationship of DNA demethylases in genomes A, D, AD1 and AD2, multiple sequence alignment was performed on the 25 DNA demethylase family members and a phylogenetic tree was constructed (Fig. 1a). The DNA demethylases in cotton were divided into four subfamilies: ROS1, DME, DML2, and DML3. The ROS1 subfamily had 8 members, with 2, 1, 4 and 1 in genomes D, A, AD1 and AD2, respectively. DME had 6 members, with 1, 1, 2 and 2 in genomes D, A, AD1 and AD2, respectively. DML2 had 6 members, with 1, 1, 2 and 2 in genomes D, A, AD1 and AD2, respectively. DML3 had 5 members, with 1, 1, 2 and 1 in genomes D, A, AD1 and AD2, respectively.
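The subfamily sizes quoted above can be tallied directly from the per-genome counts. The short sketch below only re-adds the numbers given in the paragraph; the dictionary layout is an illustrative choice.

```python
# Members per subfamily as reported in the text,
# in the order (genome D, genome A, genome AD1, genome AD2).
COUNTS = {
    "ROS1": (2, 1, 4, 1),
    "DME": (1, 1, 2, 2),
    "DML2": (1, 1, 2, 2),
    "DML3": (1, 1, 2, 1),
}

subfamily_totals = {name: sum(per_genome) for name, per_genome in COUNTS.items()}
grand_total = sum(subfamily_totals.values())

print(subfamily_totals)  # {'ROS1': 8, 'DME': 6, 'DML2': 6, 'DML3': 5}
print(grand_total)       # 25
```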

Fig. 1

Evolutionary relationship, gene structure, and protein domain analysis of cotton DNA demethylase gene family. a family phylogenetic tree analysis; b gene structure; c protein domain

Gene structure analysis and protein domain analysis of cotton DNA demethylase family genes

Gene structure analysis is an important strategy for studying genetic evolution. Analysis of the numbers of introns and exons in the DNA demethylase family members of genomes D, A, AD1 and AD2 (Fig. 1b) showed that the number of exons differed greatly among cotton DNA demethylase genes: GrDM2, GaDM5, GhDM5 and GhDM10 have only 4 exons, whereas GhDM6 has 21 exons.
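Exon counts of the kind compared above can be derived from a genome annotation. The sketch below counts `exon` features per transcript in GFF3-formatted text; it is not the GSDS tool used in this study, and the feature lines, gene names and coordinates are hypothetical.

```python
from collections import Counter


def count_exons(gff_text):
    """Count exon features per Parent ID in GFF3-formatted text."""
    counts = Counter()
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent:
            counts[parent] += 1
    return counts


# Hypothetical annotation: one three-exon transcript and one single-exon transcript.
GFF = "\n".join([
    "##gff-version 3",
    "chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=geneX",
    "chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=geneX.1;Parent=geneX",
    "chr1\tdemo\texon\t100\t200\t.\t+\t.\tID=x1;Parent=geneX.1",
    "chr1\tdemo\texon\t300\t450\t.\t+\t.\tID=x2;Parent=geneX.1",
    "chr1\tdemo\texon\t700\t900\t.\t+\t.\tID=x3;Parent=geneX.1",
    "chr2\tdemo\tmRNA\t50\t400\t.\t-\t.\tID=geneY.1;Parent=geneY",
    "chr2\tdemo\texon\t50\t400\t.\t-\t.\tID=y1;Parent=geneY.1",
])

exon_counts = count_exons(GFF)
print(dict(exon_counts))  # {'geneX.1': 3, 'geneY.1': 1}
```

The number of introns in a transcript is then its exon count minus one.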

The motif analysis of the 25 DNA demethylases in cotton is shown in Fig. 1c. The cotton DNA demethylase genes contain 12 motifs, of which Motifs 1, 2, 5 and 11 together constitute the conserved ENDO3c glycosylase domain. Across the different cotton genomes, the ROS1 and DME families were identical, containing all 12 motifs (Motifs 1 to 12), which constitute the ENDO3c, FES, Pfam: Perm-CXXC and Pfam: RRM_DME domains. The DML2 family contains six motifs (Motifs 1, 2, 5, 6, 8 and 11), which constitute the ENDO3c and HhH1 domains; the DML3 family contains six motifs (Motifs 1, 2, 4, 5, 7 and 11), which constitute the ENDO3c, HhH1 and FES domains. Protein structure differed greatly between subfamilies, presumably as a result of long-term gene evolution.
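The motif sets listed above can be compared with simple set operations. In the sketch below, the motifs shared by all four subfamilies come out as exactly the four that the text assigns to the ENDO3c domain; only the motif numbers given in the paragraph are used.

```python
# Motif numbers per subfamily, as listed in the text.
MOTIFS = {
    "ROS1": set(range(1, 13)),  # Motifs 1 to 12
    "DME": set(range(1, 13)),   # Motifs 1 to 12
    "DML2": {1, 2, 5, 6, 8, 11},
    "DML3": {1, 2, 4, 5, 7, 11},
}

# Motifs present in every subfamily.
core = set.intersection(*MOTIFS.values())
print(sorted(core))  # [1, 2, 5, 11]

# Motifs each subfamily has beyond the shared core.
extra = {name: sorted(motifs - core) for name, motifs in MOTIFS.items()}
print(extra["DML2"])  # [6, 8]
print(extra["DML3"])  # [4, 7]
```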

Distribution of cotton DNA demethylase gene family members on chromosomes

The distribution of genes on chromosomes provides an important basis for studying the evolution and function of gene families. Combining the chromosomal information of the four cotton genomes with the position of each DNA demethylase gene, a distribution map of the cotton demethylase genes on the chromosomes was obtained (Fig. 2). The G. arboreum L. genome contains five GaDM genes, distributed on chromosomes 1, 4, 9, 10 and 12, respectively (Ga). The G. raimondii genome contains four GrDM genes, distributed on chromosomes 2, 8, 9 and 11, respectively (Gr). There were 10 GhDM genes in the G. hirsutum L. genome, evenly distributed between the genome A and genome D chromosomes: one gene on each of chromosomes 1, 4, 9, 10 and 12 in genome A, and one on each of the same chromosomes in genome D (Gh). There were six GbDM genes in the G. barbadense L. genome, unevenly distributed between the genome A and genome D chromosomes: one gene on each of chromosomes 1, 10 and 12 in genome A, one on each of chromosomes 1 and 12 in genome D, and one mapped to scaffold_1890 (Gb).

Fig. 2

Distribution of cotton DNA demethylase family genes on chromosomes. Ga: Gossypium arboreum L.; Gb: Gossypium barbadense L.; Gh: Gossypium hirsutum L.; Gr: Gossypium raimondii

Evolutionary relationship between cotton DNA demethylase family and other plant DNA demethylase family

Construction of phylogenetic trees revealed the homologous and evolutionary relationships of DNA demethylase genes from different species. The cotton DNA demethylase family members were aligned with the amino acid sequences of the DNA demethylase members of A. thaliana and P. trichocarpa, and a phylogenetic tree was then constructed in MEGA 7.0 (Fig. 3). The results showed that the DNA demethylases of the cotton genomes were separated from one another by smaller evolutionary distances than from those of the other species. On each branch, the cotton DNA demethylases were closely related to those of P. trichocarpa, suggesting that they have similar functions. A. thaliana and P. trichocarpa have different types of genes owing to evolutionary differences between species.

Fig. 3

Phylogenetic analysis of DNA demethylase gene family members in cotton and other species. The species used to construct the phylogenetic tree are: Gossypium raimondii (GrDM); Gossypium arboreum L. (GaDM); Gossypium hirsutum L. (GhDM); Gossypium barbadense L. (GbDM); Arabidopsis thaliana; Populus trichocarpa

Expression of DNA demethylase genes under stress and during ovule and fiber formation in cotton

The FPKM values of the DNA demethylase genes of upland cotton TM-1 were downloaded from the CottonFGD database to construct expression maps of cotton DNA demethylase genes under abiotic stress conditions and at different developmental stages during ovule formation and fiber development. The results showed that three genes, GhDM2, GhDM4 and GhDM7, were highly expressed under cold, heat, drought and salt stress. The expression levels of DNA demethylase genes differed between kinds of stress: GhDM2 was down-regulated under cold stress but up-regulated under drought, heat and salt stress. When cotton was under stress, genes of the ROS1 and DML3 families were up-regulated and genes of the DME and DML2 families were down-regulated. Genes of the same family could nevertheless show different expression levels under the same stress: under heat stress, GhDM2 and GhDM7 were up-regulated, whereas GhDM3 and GhDM8, which are also ROS1 family genes, were down-regulated (Fig. 4a).
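One common way to call a gene up- or down-regulated from FPKM values is a log2 fold change against a control. The text does not state which criterion was applied, so the pseudocount, the threshold of 1.0 (a two-fold change) and the FPKM numbers below are all assumptions made for illustration.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control FPKM; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify(treated, control, threshold=1.0):
    """Label a gene 'up', 'down' or 'unchanged' using an assumed log2 threshold."""
    lfc = log2_fold_change(treated, control)
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical FPKM pairs (treated, control).
print(classify(7.0, 1.0))  # up: log2(8/2) = 2
print(classify(1.0, 7.0))  # down: log2(2/8) = -2
print(classify(3.0, 3.0))  # unchanged: log2(4/4) = 0
```

A heat map of such log-scaled values makes up- and down-regulation across conditions easier to compare than raw FPKM.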

Fig. 4

Expression of cotton DNA demethylase genes. a abiotic stress; b ovule development; c fiber development stage

During ovule formation, the ROS1 family genes GhDM2, GhDM3, GhDM7 and GhDM8 were up-regulated, while the DME family genes GhDM1 and GhDM6 were down-regulated. GhDM6 was not expressed, while the other demethylase genes were up-regulated at 3 days before anthesis and at anthesis. GhDM2 was up-regulated at 3 days before anthesis and at 20 days post-anthesis (Fig. 4b).

The expression level of DNA demethylase genes was relatively low during fiber development, and GhDM5 and GhDM6 were not expressed. On day 25 of fiber formation, GhDM2 expression was higher, but expression of the ROS1 family gene GhDM7 could not be detected. On day 15 of fiber formation, GhDM7 expression was higher, whereas expression of the ROS1 family genes GhDM2, GhDM3 and GhDM8 could not be detected (Fig. 4c).


With the expanding information on the cotton genome, we used comparative genomics to identify the demethylase genes in cotton, conducted sequence and phylogenetic analyses, and examined expression patterns under different conditions. DNA methylation is involved not only in the regulation of gene expression but also in maintaining genome stability (Dai et al. 2014). DNA demethylase can remove methylation and regulate gene expression, which is closely related to stress resistance (Colot and Rossignol 1999). With the completion of cotton genome sequencing, it has become convenient to study the cotton demethylase genes at the whole-genome level. DNA methylation is an important epigenetic process that affects many biological processes (Dennis 2000). DNA demethylation is a complex process whose mechanism is unclear, and DNA demethylase plays an important role in epigenetics. At present, it is generally believed that there are five mechanisms for DNA demethylation: base excision repair relying on DNA demethylase, base excision repair, G/T mismatch excision repair coupled with methylcytidine deamination, demethylation by hydrolysis, and oxidative demethylation (Cao et al. 2012). DNA demethylase is essential in all of these mechanisms.

In this study, we investigated the structure, evolution, collinearity and expression of DNA demethylase genes in cotton. The results showed that DNA demethylase contains four conserved motifs, which is consistent with the study in angiosperms (Liu et al. 2014). Cotton has the same four types of DNA demethylase as Arabidopsis. The DNA demethylase genes were evenly distributed in the four cotton species and evolved consistently. The ROS1 gene subfamily has been duplicated several times, giving rise to neo-functionalization and sub-functionalization of genes; this provides clues for further study of the roles and mechanisms of different DNA demethylase genes. Evolutionary analysis revealed that DNA demethylase genes differed greatly among different species and among different families within the same species.

Plants respond rapidly to abiotic stress through the DNA methylation machinery, and DNA demethylase genes play an important role in regulating gene expression. The results showed that the DNA demethylase genes responded to the abiotic stresses of cold, heat, drought and salt (Fig. 4a). The expression of DML-like demethylase genes in A. thaliana increased during stress; in cotton, the expression levels of the ROS1 and DML3 demethylase families were higher, but those of the DME and DML2 families were lower (Tzung-Fu et al. 2009). DNA demethylases respond to stresses indirectly by regulating DNA methylation levels (Sanchez and Paszkowski 2014). The DME gene in Arabidopsis is preferentially expressed in the central and companion cells of the female gametophyte, which affects the development of the embryo and endosperm; the expression of the DME family gene GhDM6 was low during cotton ovule formation (Choi et al. 2002). The function of the DNA demethylase gene changed with evolution (Agius et al. 2006).

DNA methylation is essential in regulating plant development and the response to environmental stimuli, but how DNA methylases and demethylases participate in these responses is complex, and the mechanism is still unclear. The differential expression analysis showed that the expression levels of the demethylase genes changed greatly under different kinds of abiotic stress, suggesting that some key genes may be demethylated and that this response is critical; DNA methylation is therefore most likely involved in the effects of the environment on cotton growth and development. DNA demethylase gene expression was higher during cotton ovule formation, indicating that DNA methylation may have a regulatory role in this process. This study therefore provides some clues to the roles of DNA methylation in the response of cotton to stress and in ovule formation and fiber development, and provides a basis for further exploration of epigenetic regulation mechanisms during cotton development.


The DNA demethylase gene family plays a significant role in plant growth and development. The high expression of cotton DNA demethylase genes under abiotic stress and during ovule formation and fiber development indicates that the demethylase family plays an important role in cotton growth and development. The results of this study lay the foundation for mining functional genes and further studying the stress resistance mechanism of cotton.

Availability of data and materials

All data generated or analyzed in this study are included in this published article and its additional files.




-CH3: Methyl group




DNA methyltransferase2


Domains rearranged methyltransferase


Fragments per kilobase million


Gossypium arboreum L.


Gossypium barbadense L.


Gossypium hirsutum L.


Gossypium raimondii




Isoelectric Point



Not applicable.


This study was funded by the National Key Research and Development Program of China (2018YFD0100401).

Author information

Authors and Affiliations



Yang XM and Ye WW conceived and designed the experiments; Yang XM, Wang XG and Lu XK performed the experiments and collected the data; Chen XG, Wang S, Wang DL and Wang JJ obtained funding and Guo LX, Wang XL and Chen C contributed reagents/materials/analysis tools; Yang XM and Ye WW revised the paper. All authors read and approved the final manuscript.

Authors’ information

Not applicable.

Corresponding author

Correspondence to YE Wuwei.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

YANG, X., LU, X., CHEN, X. et al. Genome-wide identification and expression analysis of DNA demethylase family in cotton. J Cotton Res 2, 16 (2019).
