Skip to main content

A genome-wide association study of lateral root number for Asian cotton (Gossypium arboreum L.)



The lateral root is one of the most important organs that constitute the root architecture system in plants. It can directly affect the contact area between plants and soil and plays an important role in plant structural support and nutrient absorption. Optimizing root architecture systems can greatly increase crop yields. This study was designed to identify the molecular markers and candidate genes associated with lateral root development in cotton and to evaluate correlations with yield and disease traits.


The number of lateral roots for 14-day old seedlings was recorded for 215 Gossypium arboreum accessions. A correlation analysis showed that the number of lateral roots positively correlates with the sympodial branch node and seed index traits, but negatively correlates with lint percentage. A Genome-wide association study (GWAS) identified 18 significant SNPs with 19 candidate genes associated with the lateral root number. Expression analysis identified three genes (FLA12, WRKY29, and RBOHA) associated with lateral root development.


GWAS analysis identified key SNPs and candidate genes for lateral root number, and genes of FLA12, WRKY29, and RBOHA may play a pivotal role in lateral root development in Asian cotton.


The lateral root, one of the most important organs in root system architecture, plays a pivotal role in water and nutrient acquisition that can influence crop productivity. In rice (Oryza sativa L.), the lateral root defective mutant osiaa11 and the root hairs defective mutant osrhl1 were used to demonstrate that lateral roots and not root hairs played a pivotal role in high Mn and Cd uptake (Yu et al. 2021). In maize (Zea mays), Jia et al. (2018) proved that higher lateral root branch density has greater phosphorus acquisition from low phosphorus soil.

Lateral roots are initiated from the root pericycle cells by asymmetric cell division (Casero et al. 1993). Research progresses on quantitative trait loci (QTLs) of lateral root development have been reported in crops like rice (Oryza sativa L.) (Niones et al. 2015), rap (Brassica napus L.) (Ying et al. 2016), maize (Zea mays L.) (Wang et al. 2019), and soybean (Glycine max (Linn.) Merr.) (Liang et al. 2017), but little knowledge was discovered in cotton. The lateral root development is a complex process that is mainly modulated by endogenous hormones. Auxin transport plays a pivotal role in root branching with different auxin “signal modules” regulating the various stages of lateral root development (Marhavy et al. 2014; Talboys et al. 2014). A recent study found that the AUXIN RESPONSE FACTOR 7 and its inhibitor IAA18/POTENT formed a negative regulatory loop circuit to drive the root clock in Arabidopsis, which affected the spacing of lateral roots along the primary root (Perianez-Rodriguez et al. 2021). Lv et al. (2021) overexpressed the transcription factor ERF13 in Arabidopsis, resulting in erf13 mutants with greater lateral root emergence, and showed that MPK14-mediated auxin signals controlled lateral root development through ERF13-regulated very-long-chain fatty acid biosynthesis. The auxin signal modules IAA14/SLR (SOLITARY ROOT)-ARF7-ARF19 (Fukaki et al. 2010), IAA12/BDL (BODENLOS)-ARF5 (Ive et al. 2014), PIN proteins (Lewis et al. 2011; Inahashi et al. 2018) have also been reported to be involved in the lateral root initiation and development. Other plant hormones, such as gibberellin (GA) (Farquharson 2010; Hetherington et al. 2021), abscisic acid (ABA) (Duan et al. 2013; Zhao et al. 2014), ethylene (Ivanchenko et al. 2008; Prasad et al. 2010), and jasmonate (JA) (Cai et al. 2015; Kuo et al. 2020) were also associated with the lateral root formation. In cotton (Gossypium hirsutum), the plant growth regulator mepiquat chloride promoted lateral root formation by modulating GA, ABA, and auxin homeostasis (Wu et al. 2019).

Although plant hormones play an important role in lateral root development, the mechanisms of lateral root initiation in cotton have not been elucidated. Genome-wide association study (GWAS) is a typical method to evaluate the association between millions of single nucleotide polymorphisms (SNPs) markers and a phenotype of interest in large populations for animals or plants. In cotton, many important agronomic traits like yield (Du et al. 2018), fiber quality (Fang et al. 2017; He et al. 2021) and disease-resistant related traits (Li et al. 2017) have been successfully dissected their underpinnings by GWAS. Gossypium arboreum (Asian cotton) is a diploid cotton species that can be used as a model to study root architecture development. DNA sequencing data is available for 215 G. arboreum accessions (Du et al. 2018). A GWAS analysis was conducted in the present study to detect the significant loci for seedling lateral root number. Results of this study could provide targets for cotton breeders to optimize the cotton root system structure to enhance yield potential.

Materials and methods

Determination of lateral root initiation rate

Lateral root development was initially evaluated in the Asian cotton standard line Shixiya1 (Li et al. 2014) to identify the period with the highest frequency of lateral root initiation. Seeds of Shixiya1 were surface sterilized with 15% H2O2 for ~ 30 min, then rinsed five times with steriled water. Seeds were germinated on moisten filter papers and then transplanted into hydroponic containers using a half-strength of Hoagland nutrient solution. The lateral roots were manually counted 4, 8, 12, 20 days after planting for each seedling. Five replicates were set, and each replicate included six plants. The average daily growth of lateral roots was calculated by increased lateral root number/growth days. The differences among growth periods were analyzed by t-test.

Lateral root evaluation for accessions

This evaluation used 215 G. arboreum accessions that had DNA sequencing and phenotypic data for 11 yield and disease-related traits (Du et al. 2018). Seeds for the accessions were provided by the National Medium-term Gene Bank of Cotton in China (Anyang, Henan). Thirty-six seeds with full and uniform size for each accession were first selected for sowing. Then, all accessions were evaluated using the seed germination bag method according to the manual instructions (Zhao et al. 2021). Each accession included three independent replicates (one replicate per bag), and each bag contained 1012 plants (Shixiya1 included as a control). The experiment was conducted in a greenhouse (Temperature: 28℃; photoperiod: light/night = 16 h/8 h; relative humidity: 60% ~ 65%). The number of lateral roots for each plant was counted 14 days after planting as described for the Shixiya1 evaluation. The mean number of lateral roots and their replicates were calculated and were used for further analysis (Additional file 1: Table S1). A basic description analysis included the range, coefficient of variation, Skewness/Kurtosis was performed by GraphPad Prism 8 (Francis 1995).

Correlation analysis of lateral root development with phenotypic traits

The lateral root data for the 215 accessions were correlated with 11 yield and disease-related traits. The 11 yield and disease-related traits were (1) BW_AY (Boll weight in Anyang (Henan)), (2) BW_SY (Boll weight in Sanya (Hainan)), (3) BW_AKS (Boll weight in Aksu (Xinjiang)), (4) SBN_AY (Sympodial branch node in Anyang (Henan)), (5) SBN_AKS (Sympodial branch node in Aksu (Xinjiang)), (6) FWDI_AY (Fusarium wilt disease index in Anyang (Henan)), (7) LP_AY (Lint percentage in Anyang (Henan)), (8) LP_SY (Lint percentage in Sanya (Hainan)), (9) LP_AKS (Lint percentage in Aksu (Xinjiang)), (10) SI_AY (Seed index in Anyang (Henan)), (11) VWDI_AY (Verticillium wilt disease index in Anyang (Henan)). A pheatmap package method was used to plot the picture of the correlation matrix in R language (R Core Team 2012).

GWAS for lateral root development

The genotype data were derived from Du et al. (2018), and DNA sequences of the same 215 accessions were used for the lateral root number in this study. The Burrows-Wheeler Aligner program (BWA, ver. 0.7.10) (Li and Durbin 2009) and GATK toolkit (ver. 3.2–2) (Mckenna et al. 2010) were used to perform the reads mapping and SNP calling with the default parameters, respectively. Software ANNOVAR tool (Kai et al. 2010) was used to annotate the identified SNPs based on the newly updated G. arboreum genome (Yang et al. 2019) annotation information. After filtering, a total of 1 425 003 high-quality SNP markers (MAF (Minor allele frequency for the genetic variant of interest) > 0.05, missing rate per site < 10%) were selected to perform the GWAS for lateral root trait with the EMMAX (Efficient mixed-model association expedited) model (Kang et al. 2010). The SNP density plot was drawn by R language with “CMplot” package (Turner 2018). The significance for LRN was first set as −log10(P) > 6.15 (1/total SNPs used, –log10(P) = 6.15) (Li et al. 2017), but no effective loci were identified in all the 13 chromosome regions with this threshold. So we manually adjusted their threshold to P < 2.81 × 10–6 (4/1 425 003 used, and − log10(P) = 5.55), and those SNPs with − log10(P) > 5.55 were defined as significant SNPs. The candidate genes were identified by the physical locations of the significant SNP and the related stronger LD (linkage disequilibrium) block regions.

LD block and haplotype analysis for lateral root development

The confidence intervals for the lateral root data were identified by IGV_2.3.91 (Helga et al. 2013) and were plotted by R language with the “qqman” package (Turner 2018). The interval regions were further performed using LD block and Neighbor-Joining analysis in Tassel software (Bradbury et al. 2007). The stronger LD block interval regions were further determined using the haplotype analysis method described by Dai et al. (2020). The different haplotypes were roughly estimated by the genotypes of different groups with the color scale method EXCEL 2010 (

Transcriptome analysis

RNA-seq data of G. arboreum cultivar Shixiya1 had been sequenced by Huang et al. (2020) (Accession: PRJNA594268 ID: 594268), and we downloaded the TPM (Transcripts per million) expression data of candidate genes from The root expression value of the seedling stage was utilized in this study. These candidate genes' tissue and organ expression levels were plotted in R language with a pheatmap package.


Lateral root development for Shixiya1

The number of lateral roots observed varied significantly across the four-time periods (Fig. 1). No lateral roots were recorded 4 days after planting, whereas lateral roots were clearly visible 8 days after planting with a mean of 34.20 ± 7.60 recorded. The number of lateral roots continued to rapidly increase with a mean of 68.00 ± 9.98 recorded 12 days after planting. At 20 days after planting, the rate of lateral root initiation had slowed with a mean of 85.80 ± 10.57 observed. The average daily growth of lateral roots was kept stable on 48 d and 812 d, but it rapidly slowed on 1220 d. Using these data, a 14-day time period was selected to evaluate the 215 accessions.

Fig. 1
figure 1

Phenotype of lateral root number at 4, 8, 12, and 20 days after planting for Asian cotton standard line Shixiya1. a Phenotype pictures of lateral root development. b The numbers of lateral roots. c Average daily growth rate of lateral root number (LRN). * and ***: significant at P < 0.05 and P < 0.001, respectively

Lateral root data analysis for 215 accessions

To more conveniently describe the lateral root number, we abbreviate this trait as LRN. Furthermore, LRN_AV, LRN_R1, LRN_R2, LRN_R3 represented the mean and three replicates of LRN, respectively. LRN_AV ranged from 18.86 to 74.05 across the 215 accessions with a mean of 43.21 and a coefficient of variation of 26.81% (Additional file 1: Table S2). Replicates of LRN showed a similar range, mean, and variation with LRN_AV (Fig. 2a; Additional file 1: Table S2). LRN_AV and its three replicates were biased towards the normal distribution due to their frequency distribution and Skewness/Kurtosis values, and this distribution was suitable for GWAS analysis (Fig. 2b; Additional file 1: Table S2). We are curious about the correlation of the lateral root trait with yield and disease resistance-related traits. So, a correlation was performed. The result showed that LRN_AV positively correlated with the sympodial branch node and seed index traits. Notably, a negative correlation with the lint percentage was observed (Fig. 2c). However, LRN_AV showed a weak correlation with the other yield traits like boll weight and disease-related (VWDI) traits.

Fig. 2
figure 2

Frequency and correlation analysis of lateral root number for 215 Gossypium arboreum accessions. a The number of lateral roots observed in 215 G. arboreum accessions. b Frequency of the lateral root number in 215 G. arboreum accessions. c Correlation analysis of LRN_AV with 11 yield and disease resistance-related traits

GWAS for the number of lateral roots

A total of 1 425 003 high-quality SNP markers were screened out to perform the GWAS in this study, and those SNP densities were shown in Additional file 2: Fig. S1. The quantile–quantile (QQ) plots of LRN_AV, LRN_R1, LRN_R2, and LRN_R3 indicated that the EMMAX could be used to identify signals for lateral root number in Asian cotton (Additional file 3: Fig. S2). The Manhattan plots of LRN_AV and its three replicates showed stronger signals on Chr01, Chr07, Chr13, and Scaffold regions (Chr14). The strongest signal was found on Scaffold regions, and a continuous, more robust signal interval on Chr01 (Chr01:78.2078.90 Mb) was detected in LRN_AV and all the three replicates (Fig. 3).

Fig. 3
figure 3

Manhattan plots of lateral root number for 215 Gossypium arboreum accessions. AV represents the mean value of lateral root number, and R1, R2, R3 represent three replicates respectively. The significance threshold is shown by the blue dashed line

Identification of significant SNPs and candidate genes

Eighteen significant SNPs were detected for the lateral root trait with 7 SNPs located on the Chr01, 5 SNPs located on Scaffold regions, 3 SNPs located on Chr07, and 2 SNPs located on Chr13 (Additional file 1: Table S3). The strongest signal was detected on SNP Chr14_23233813 with a  − log10(P) value of 6.18 and 6.68 for LRN_AV and LRN_R3, respectively. The allelic variant of this site was G/A, and this signal was located on the intergenic region of Ga14G0299 and Ga14G2065. The SNPs for the continuous stronger signal interval on Chr01 included SNP Chr01_78558940, SNP Chr01_78572007, SNP Chr01_78703010, SNP Chr01_78751608, SNP Chr01_78758771, and SNP Chr01_78784026. The LD block of this interval (Chr01:78.20–78.90 Mb) showed a stronger LD block on the region of Chr01:78 284 704–78 804 477, and this LD block was referred to as LD_Chr01 (Fig. 4a, b). Haplotype analysis of the LD_Chr01 region (Fig. 4c) divided the 215 accessions into three haplotypes (HapChr01_1, HapChr01_2, HapChr01_3), including 36, 68, and 111 accessions, respectively. The LRN_AV recorded for HapChr01_2 accessions was significantly higher than HapChr01_1 and HapChr01_3, and the mean number of lateral roots for HapChr01_3 was the lowest (Fig. 4d). The LRN_R1, LRN_R2, and LRN_R3 also showed the same result as LRN_AV (Fig. 4d).

Fig. 4
figure 4

Haplotype analysis of lateral root number on Chr01. a The confidence interval on Chr01. b LD block analysis. c Haplotype classification of lateral root number on Chr01:78 2847 04–788 04 477. d Comparision of lateral root number between different haplotypes

Nineteen candidate genes for LRN were detected for the five chromosomal regions (Additional file 1: Table S4). Seven candidate genes were detected on Chr01, six on the scaffold region, two on Chr05, two on Chr07, and two on Chr13. The LD_Chr01 block was associated with the four candidate genes WRKY29, WDR7, PP1, and At5g41590. The strongest signals on the Scaffold region were associated with HSP23.9 and Ga14G2065. Other associated candidate genes and their significant SNPs were listed in the Additional file 1: Table S4.

TPM expression analysis of candidate genes

Ga01G0035 (S6PDH), Ga01G1695 (WRKY29), Ga01G1696 (WDR7), Ga01G1697 (PP1), Ga01G1699 (RBOHA), Ga07G2390 (MIF2) and Ga14G1993 (FLA12) were expressed higher in the root organization with the highest TPM value (111.49) recorded for Ga14G1993 (FLA12) (Fig. 5, Additional file 1: Table S5).

Fig. 5
figure 5

Heatmap of TPM value for candidate genes in Asian cotton standard line Shixiya1


Cotton is a tap root crop with a deep primary root and wide distribution of lateral roots to form a strong absorption network in the soil. Plant root can directly influence the development of above-ground organs, and different root traits may have a positive/negative correlation with yield-related traits in crops. In upland cotton, the root weight and taproot length appeared a positive correlation with lint yield, whereas the tap root showed a negative correlation with lint yield (Li et al. 2003). So far, there are no reports on the relationship between the lateral root number and yield-related traits in cotton. In this study, we found that the lateral root number negatively correlates with lint percentage and displays a positive correlation with sympodial branch node and seed index traits. Verticillium wilt (VW) disease is caused by the soil-borne fungus Verticillium dahliae, and the plant roots were usually firstly influenced when the VW disease happened. Whether the lateral root number (LRN) has a correlation with VW disease index (VWDI)? Our correlation results showed a weak correlation between the LRN and VWDI. Our findings will deepen people's understanding of the relationship between lateral root development and yield-related traits in cotton.

Research progresses on lateral root development were reported in several crops (Niones et al. 2015; Ying et al. 2016; Wang et al. 2019), but so far, no QTLs/SNPs of lateral root number was discovered in Asian cotton. While, in this study, we identified 18 significant signals in total and detected a stronger LD block on Chr01 for lateral root number in Asian cotton, which will greatly enrich researchers' understanding of molecular markers that control lateral root development in cotton.

In the present study, seven candidate genes were expressed higher in root organs, and three genes (FLA12, WRKY29, and RBOHA) were closely associated with root development. FLA12 encodes for Fasciclin-like arabinogalactan protein 12 (Johnson et al. 2011). Arabinogalactan proteins FLAs are involved in plant growth and development and play an essential role in many plant species (Seifert et al. 2014; Zang et al. 2015). In Arabidopsis, At-FLA4 is necessary for the average growth of wild-type roots under moderately elevated salinity stress. It positively regulates cell wall biosynthesis and root development via modulating ABA signals (Seifert et al. 2014). Huang et al. (2013) conducted a subcellular localization experiment for the cotton gene GhFLA1, and found the GFP fluorescence signals were displayed on the peripheral of transgenic GhFLA1(SP)-eGFPGhFLA1(GPI) Arabidopsis root cell. Further, they showed that this gene was involved in fiber initiation and elongation in upland cotton. In Barley, Stephan et al. (2016) found that the transcription factor WRKY29 was one of the most important candidate genes for root system variation, and a crucial amino acid substitution was detected within the conserved domain of WRKY29 among genotypes carrying significant and minor QTL alleles. WRKY29 was also closely associated with ABA signaling. A recent study found that OsWRKY29 could act as a new ABA signaling repressor for seed dormancy in rice with RNAi mutant lines displaying greater sensitivity after ABA treatment (Zhou et al. 2020). The WRKY29 gene was negatively correlated with the ABA content, and down-regulated in the tolerant rootstocks under osmotic stress (Hezema et al. 2021). RBOHA is a burst oxidase homolog protein, which participates in diverse biological processes in higher plants (Arthikala and Quinto 2018). Arthikala et al. (2018) found that the relative expression levels of PvRbohA were correlated with the activity of its promoter. The transgenic hairy roots of PvRbohA transcript silencing showed a significantly altered lateral root phenotype, which indicated that RbohA was involved in lateral root initiation, emergence, and development for P. Vulgaris. In rice, Groom et al. (2010) identified the complete cDNA and genomic DNA sequence of rbohA. They hypothesized that RbohA is a component of an NADPH oxidase that functions in defense-related active oxygen production on the root surface. A recent study in Brassica campestris showed that expression of RbohA and RbohD was induced by ABA and H2O2 (Zhang et al. 2019).


GWAS of seedling lateral root number for 215 G. arboreum accessions identified 18 significant signals and 19 candidate genes. Three genes, FLA12, WRKY29, and RBOHA, showed higher expression levels in root samples Shixiya1, which may be associated with lateral root development. These findings will be essential to decipher the genetics behind developmental mechanisms of lateral root formation in Asian cotton.

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its additional files.



Genome-wide association studies


Lateral root number


Linkage disequilibrium


Minor allele frequency for the genetic variant of interest


Single nucleotide polymorphism

G. arboreum :

Gossypium arboreum L.


Transcripts per million


Boll weight in Anyang (Henan)


Boll weight in Sanya (Hainan)


Boll weight in Aksu (Xinjiang)


Sympodial branch node in Anyang (Henan)


Sympodial branch node in Aksu (Xinjiang)


Fusarium wilt disease index in Anyang (Henan)


Lint percentage in Anyang (Henan)


Lint percentage in Sanya (Hainan)


Lint percentage in Aksu (Xinjiang)


Seed index in Anyang (Henan)


Verticillium wilt disease index in Anyang (Henan)


Download references


The authors would like to thank the National Medium-term Gene Bank of Cotton in China for their kindly providing cotton seeds.


This work was supported by Central Public-interest Scientific Institution Basal Research Fund, Chinese Academy of Agricultural Science (No.1610162021012) funded by DU Xiongming.

Author information

Authors and Affiliations



Hu DW and Du XM conceived and designed the experiments; H u DW, Jia YH, Wang LR, Wang XY, Zhao ZB, Ge YT, and Pang BY performed the experiments and recorded the data; Hu DW, He SP, Sun GF, and Geng XL performed the GWAS analysis; Hu DW has drafted the work, and Du XM, He SP and Zubair I substantively revised it. All authors have checked and approved the final manuscript.

Corresponding author

Correspondence to DU Xiongming.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Supplementary Information

Additional file1.

Table S1 Phenotypic data of lateral root number in 215 Gossypium arboreum accessions; Table S2 Basic description analysis of lateral root number in 215 Gossypium arboreum accessions; Table S3 18 significant SNPs of lateral root numbers in 215 Gossypium arboreum accessions; Table S4 19 candidate genes of lateral root number identified by the GWAS method; Table S5 TPM expression of candidate genes in different tissues

Additional file2.

Fig. S1 SNP density plot

Additional file3.

Fig. S2 Quantile - quantile (QQ) plots of lateral root number in Asian cotton

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

HU, D., HE, S., SUN, G. et al. A genome-wide association study of lateral root number for Asian cotton (Gossypium arboreum L.). J Cotton Res 5, 19 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: