A genome-wide association study of lateral root number for Asian cotton (Gossypium arboreum L.)

The lateral root is one of the most important organs that constitute the root architecture system in plants. It can directly affect the contact area between plants and soil and plays an important role in plant structural support and nutrient absorption. Optimizing root architecture systems can greatly increase crop yields. This study was designed to identify the molecular markers and candidate genes associated with lateral root development in cotton and to evaluate correlations with yield and disease traits. The number of lateral roots for 14-day old seedlings was recorded for 215 Gossypium arboreum accessions. A correlation analysis showed that the number of lateral roots positively correlates with the sympodial branch node and seed index traits, but negatively correlates with lint percentage. A Genome-wide association study (GWAS) identified 18 significant SNPs with 19 candidate genes associated with the lateral root number. Expression analysis identified three genes (FLA12, WRKY29, and RBOHA) associated with lateral root development. GWAS analysis identified key SNPs and candidate genes for lateral root number, and genes of FLA12, WRKY29, and RBOHA may play a pivotal role in lateral root development in Asian cotton.


Introduction
The lateral root, one of the most important organs in root system architecture, plays a pivotal role in water and nutrient acquisition that can influence crop productivity. In rice (Oryza sativa L.), the lateral root defective mutant osiaa11 and the root hairs defective mutant osrhl1 were used to demonstrate that lateral roots and not root hairs played a pivotal role in high Mn and Cd uptake (Yu et al. 2021). In maize (Zea mays), Jia et al. (2018) proved that higher lateral root branch density has greater phosphorus acquisition from low phosphorus soil.
Lateral roots are initiated from the root pericycle cells by asymmetric cell division (Casero et al. 1993). Research progresses on quantitative trait loci (QTLs) of lateral root development have been reported in crops like rice (Oryza sativa L.) (Niones et al. 2015), rap (Brassica napus L.) (Ying et al. 2016), maize (Zea mays L.) (Wang et al. 2019), and soybean (Glycine max (Linn.) Merr.) (Liang et al. 2017), but little knowledge was discovered in cotton. The lateral root development is a complex process that is mainly modulated by endogenous hormones. Auxin transport plays a pivotal role in root branching with different auxin "signal modules" regulating the various stages of lateral root development (Marhavy et al. 2014;Talboys et al. 2014). A recent study found that the AUXIN RESPONSE FACTOR 7 and its inhibitor IAA18/ POTENT formed a negative regulatory loop circuit to drive the root clock in Arabidopsis, which affected the spacing of lateral roots along the primary root (Perianez-Rodriguez et al. 2021). Lv et al. (2021) overexpressed the transcription factor ERF13 in Arabidopsis, resulting in erf13 mutants with greater lateral root emergence, and showed that MPK14-mediated auxin signals controlled lateral root development through ERF13-regulated very-long-chain fatty acid biosynthesis. The auxin signal modules IAA14/SLR (SOLITARY ROOT)-ARF7-ARF19 (Fukaki et al. 2010), IAA12/BDL (BODENLOS)-ARF5 (Ive et al. 2014), PIN proteins (Lewis et al. 2011;Inahashi et al. 2018) have also been reported to be involved in the lateral root initiation and development. Other plant hormones, such as gibberellin (GA) (Farquharson 2010;Hetherington et al. 2021), abscisic acid (ABA) (Duan et al. 2013;Zhao et al. 2014), ethylene (Ivanchenko et al. 2008;Prasad et al. 2010), and jasmonate (JA) (Cai et al. 2015;Kuo et al. 2020) were also associated with the lateral root formation. In cotton (Gossypium hirsutum), the plant growth regulator mepiquat chloride promoted lateral root formation by modulating GA, ABA, and auxin homeostasis (Wu et al. 2019).
Although plant hormones play an important role in lateral root development, the mechanisms of lateral root initiation in cotton have not been elucidated. Genomewide association study (GWAS) is a typical method to evaluate the association between millions of single nucleotide polymorphisms (SNPs) markers and a phenotype of interest in large populations for animals or plants. In cotton, many important agronomic traits like yield (Du et al. 2018), fiber quality (Fang et al. 2017;He et al. 2021) and disease-resistant related traits (Li et al. 2017) have been successfully dissected their underpinnings by GWAS. Gossypium arboreum (Asian cotton) is a diploid cotton species that can be used as a model to study root architecture development. DNA sequencing data is available for 215 G. arboreum accessions (Du et al. 2018). A GWAS analysis was conducted in the present study to detect the significant loci for seedling lateral root number. Results of this study could provide targets for cotton breeders to optimize the cotton root system structure to enhance yield potential.

Determination of lateral root initiation rate
Lateral root development was initially evaluated in the Asian cotton standard line Shixiya1 (Li et al. 2014) to identify the period with the highest frequency of lateral root initiation. Seeds of Shixiya1 were surface sterilized with 15% H 2 O 2 for ~ 30 min, then rinsed five times with steriled water. Seeds were germinated on moisten filter papers and then transplanted into hydroponic containers using a half-strength of Hoagland nutrient solution. The lateral roots were manually counted 4, 8, 12, 20 days after planting for each seedling. Five replicates were set, and each replicate included six plants. The average daily growth of lateral roots was calculated by increased lateral root number/growth days. The differences among growth periods were analyzed by t-test.

Lateral root evaluation for accessions
This evaluation used 215 G. arboreum accessions that had DNA sequencing and phenotypic data for 11 yield and disease-related traits (Du et al. 2018). Seeds for the accessions were provided by the National Mediumterm Gene Bank of Cotton in China (Anyang, Henan). Thirty-six seeds with full and uniform size for each accession were first selected for sowing. Then, all accessions were evaluated using the seed germination bag method according to the manual instructions (Zhao et al. 2021). Each accession included three independent replicates (one replicate per bag), and each bag contained 10∼12 plants (Shixiya1 included as a control). The experiment was conducted in a greenhouse (Temperature: 28℃; photoperiod: light/night = 16 h/8 h; relative humidity: 60% ~ 65%). The number of lateral roots for each plant was counted 14 days after planting as described for the Shix-iya1 evaluation. The mean number of lateral roots and their replicates were calculated and were used for further analysis (Additional file 1: Table S1). A basic description analysis included the range, coefficient of variation, Skewness/Kurtosis was performed by GraphPad Prism 8 (Francis 1995).

GWAS for lateral root development
The genotype data were derived from Du et al. (2018), and DNA sequences of the same 215 accessions were used for the lateral root number in this study. The Burrows-Wheeler Aligner program (BWA, ver. 0.7.10) (Li and Durbin 2009) and GATK toolkit (ver. 3.2-2) (Mckenna et al. 2010) were used to perform the reads mapping and SNP calling with the default parameters, respectively. Software ANNOVAR tool (Kai et al. 2010) was used to annotate the identified SNPs based on the newly updated G. arboreum genome (Yang et al. 2019) annotation information. After filtering, a total of 1 425 003 high-quality SNP markers (MAF (Minor allele frequency for the genetic variant of interest) > 0.05, missing rate per site < 10%) were selected to perform the GWAS for lateral root trait with the EMMAX (Efficient mixedmodel association expedited) model (Kang et al. 2010). The SNP density plot was drawn by R language with "CMplot" package (Turner 2018). The significance for LRN was first set as −log 10 (P) > 6.15 (1/total SNPs used,log 10 (P) = 6.15) (Li et al. 2017), but no effective loci were identified in all the 13 chromosome regions with this threshold. So we manually adjusted their threshold to P < 2.81 × 10 -6 (4/1 425 003 used, and − log 10 (P) = 5.55), and those SNPs with − log 10 (P) > 5.55 were defined as significant SNPs. The candidate genes were identified by the physical locations of the significant SNP and the related stronger LD (linkage disequilibrium) block regions.

LD block and haplotype analysis for lateral root development
The confidence intervals for the lateral root data were identified by IGV_2.3.91 (Helga et al. 2013) and were plotted by R language with the "qqman" package (Turner 2018). The interval regions were further performed using LD block and Neighbor-Joining analysis in Tassel software (Bradbury et al. 2007). The stronger LD block interval regions were further determined using the haplotype analysis method described by Dai et al. (2020). The different haplotypes were roughly estimated by the genotypes of different groups with the color scale method EXCEL 2010 (https:// www. micro soft. com/ en-us/ micro soft-365/ excel).

Transcriptome analysis
RNA-seq data of G. arboreum cultivar Shixiya1 had been sequenced by Huang et al. (2020) (Accession: PRJNA594268 ID: 594268), and we downloaded the TPM (Transcripts per million) expression data of candidate genes from https:// cotto nfgd. org/. The root expression value of the seedling stage was utilized in this study. These candidate genes' tissue and organ expression levels were plotted in R language with a pheatmap package.

Lateral root development for Shixiya1
The number of lateral roots observed varied significantly across the four-time periods (Fig. 1). No lateral roots were recorded 4 days after planting, whereas lateral roots were clearly visible 8 days after planting with a mean of 34.20 ± 7.60 recorded. The number of lateral roots continued to rapidly increase with a mean of 68.00 ± 9.98 recorded 12 days after planting. At 20 days after planting, the rate of lateral root initiation had slowed with a mean of 85.80 ± 10.57 observed. The average daily growth of lateral roots was kept stable on 4∼8 d and 8∼12 d, but it rapidly slowed on 12∼20 d. Using these data, a 14-day time period was selected to evaluate the 215 accessions.

Lateral root data analysis for 215 accessions
To more conveniently describe the lateral root number, we abbreviate this trait as LRN. Furthermore, LRN_AV, LRN_R1, LRN_R2, LRN_R3 represented the mean and three replicates of LRN, respectively. LRN_AV ranged from 18.86 to 74.05 across the 215 accessions with a mean of 43.21 and a coefficient of variation of 26.81% (Additional file 1: Table S2). Replicates of LRN showed a similar range, mean, and variation with LRN_AV ( Fig. 2a; Additional file 1: Table S2). LRN_AV and its three replicates were biased towards the normal distribution due to their frequency distribution and Skewness/Kurtosis values, and this distribution was suitable for GWAS analysis ( Fig. 2b; Additional file 1: Table S2). We are curious about the correlation of the lateral root trait with yield and disease resistance-related traits. So, a correlation was performed. The result showed that LRN_AV positively correlated with the sympodial branch node and seed index traits. Notably, a negative correlation with the lint percentage was observed (Fig. 2c). However, LRN_AV showed a weak correlation with the other yield traits like boll weight and disease-related (VWDI) traits.

GWAS for the number of lateral roots
A total of 1 425 003 high-quality SNP markers were screened out to perform the GWAS in this study, and those SNP densities were shown in Additional file 2: Fig.   S1. The quantile-quantile (QQ) plots of LRN_AV, LRN_ R1, LRN_R2, and LRN_R3 indicated that the EMMAX could be used to identify signals for lateral root number in Asian cotton (Additional file 3: Fig. S2). The Manhattan plots of LRN_AV and its three replicates showed stronger signals on Chr01, Chr07, Chr13, and Scaffold regions (Chr14). The strongest signal was found on Scaffold regions, and a continuous, more robust signal interval on Chr01 (Chr01:78.20∼78.90 Mb) was detected in LRN_AV and all the three replicates (Fig. 3).

Identification of significant SNPs and candidate genes
Eighteen significant SNPs were detected for the lateral root trait with 7 SNPs located on the Chr01, 5 SNPs located on Scaffold regions, 3 SNPs located on Chr07, and 2 SNPs located on Chr13 (Additional file 1: Table S3). The strongest signal was detected on SNP Chr14_23233813 with a − log 10 (P) value of 6.18 and 6.68 for LRN_AV and LRN_R3, respectively. The allelic variant of this site was G/A, and this signal was located on the intergenic region of Ga14G0299 and Ga14G2065. The SNPs for the continuous stronger signal interval on Chr01 included SNP Chr01_78558940, SNP Chr01_78572007, SNP Chr01_78703010, SNP Chr01_78751608, SNP Chr01_78758771, and SNP Chr01_78784026. The LD block of this interval (Chr01:78.20-78.90 Mb) showed a stronger LD block  (Fig. 4a, b). Haplotype analysis of the LD_Chr01 region (Fig. 4c) divided the 215 accessions into three haplotypes (Hap Chr01_1 , Hap Chr01_2 , Hap Chr01_3 ), including 36, 68, and 111 accessions, respectively. The LRN_AV recorded for Hap Chr01_2 accessions was significantly higher than Hap Chr01_1 and Hap Chr01_3 , and the mean number of lateral roots for Hap Chr01_3 was the lowest (Fig. 4d). The LRN_R1, LRN_R2, and LRN_R3 also showed the same result as LRN_AV (Fig. 4d).
Nineteen candidate genes for LRN were detected for the five chromosomal regions (Additional file 1: Table S4). Seven candidate genes were detected on Chr01, six on the scaffold region, two on Chr05, two on Chr07, and two on Chr13. The LD_Chr01 block was associated with the four candidate genes WRKY29, WDR7, PP1, and At5g41590. The strongest signals on the Scaffold region were associated with HSP23.9 and Ga14G2065. Other associated candidate genes and their significant SNPs were listed in the Additional file 1: Table S4.

Discussion
Cotton is a tap root crop with a deep primary root and wide distribution of lateral roots to form a strong absorption network in the soil. Plant root can directly influence the development of above-ground organs, and different root traits may have a positive/negative correlation with yield-related traits in crops. In upland cotton, the root weight and taproot length appeared a positive correlation with lint yield, whereas the tap root showed a negative correlation with lint yield (Li et al. 2003). So far, there are no reports on the relationship between the lateral root number and yield-related traits in cotton. In this study, we found that the lateral root number negatively correlates with lint percentage and displays a positive correlation with sympodial branch node and seed index traits. Verticillium wilt (VW) disease is caused by the soilborne fungus Verticillium dahliae, and the plant roots were usually firstly influenced when the VW disease happened. Whether the lateral root number (LRN) has a correlation with VW disease index (VWDI)? Our correlation results showed a weak correlation between the LRN and VWDI. Our findings will deepen people's understanding of the relationship between lateral root development and yield-related traits in cotton.
Research progresses on lateral root development were reported in several crops (Niones et al. 2015;Ying et al. 2016;Wang et al. 2019), but so far, no QTLs/SNPs of lateral root number was discovered in Asian cotton. While, in this study, we identified 18 significant signals in total and detected a stronger LD block on Chr01 for lateral root number in Asian cotton, which will greatly enrich researchers' understanding of molecular markers that control lateral root development in cotton.
In the present study, seven candidate genes were expressed higher in root organs, and three genes (FLA12, WRKY29, and RBOHA) were closely associated with root development. FLA12 encodes for Fasciclin-like arabinogalactan protein 12 (Johnson et al. 2011). Arabinogalactan proteins FLAs are involved in plant growth and development and play an essential role in many plant species (Seifert et al. 2014;Zang et al. 2015). In Arabidopsis, At-FLA4 is necessary for the average growth of wildtype roots under moderately elevated salinity stress. It positively regulates cell wall biosynthesis and root development via modulating ABA signals (Seifert et al. 2014). Huang et al. (2013) conducted a subcellular localization experiment for the cotton gene GhFLA1, and found the GFP fluorescence signals were displayed on the peripheral of transgenic GhFLA1(SP)-eGFPGhFLA1(GPI) Arabidopsis root cell. Further, they showed that this gene was involved in fiber initiation and elongation in upland cotton. In Barley, Stephan et al. (2016) found that the transcription factor WRKY29 was one of the most important candidate genes for root system variation, and a crucial amino acid substitution was detected within the conserved domain of WRKY29 among genotypes carrying significant and minor QTL alleles. WRKY29 was also closely associated with ABA signaling. A recent study found that OsWRKY29 could act as a new ABA signaling repressor for seed dormancy in rice with RNAi mutant lines displaying greater sensitivity after ABA treatment (Zhou et al. 2020). The WRKY29 gene was negatively correlated with the ABA content, and down-regulated in the tolerant rootstocks under osmotic stress (Hezema et al. 2021). RBOHA is a burst oxidase homolog protein, which participates in diverse biological processes in higher plants (Arthikala and Quinto 2018). Arthikala et al. (2018) found that the relative expression levels of PvR-bohA were correlated with the activity of its promoter. The transgenic hairy roots of PvRbohA transcript silencing showed a significantly altered lateral root phenotype, which indicated that RbohA was involved in lateral root initiation, emergence, and development for P. Vulgaris. In rice, Groom et al. (2010) identified the complete cDNA and genomic DNA sequence of rbohA. They hypothesized that RbohA is a component of an NADPH oxidase that functions in defense-related active oxygen production on the root surface. A recent study in Brassica campestris showed that expression of RbohA and RbohD was induced by ABA and H 2 O 2 (Zhang et al. 2019).

Conclusion
GWAS of seedling lateral root number for 215 G. arboreum accessions identified 18 significant signals and 19 candidate genes. Three genes, FLA12, WRKY29, and RBOHA, showed higher expression levels in root samples Shixiya1, which may be associated with lateral root development. These findings will be essential to decipher the genetics behind developmental mechanisms of lateral root formation in Asian cotton.