Identification of candidate genes controlling fiber quality traits in upland cotton through integration of meta-QTL, significant SNP and transcriptomic data

Meta-analysis of quantitative trait locus (QTL) is a computational technique to identify consensus QTL and refine QTL positions on the consensus map from multiple mapping studies. The combination of meta-QTL intervals, significant SNPs and transcriptome analysis has been widely used to identify candidate genes in various plants. In our study, 884 QTLs associated with cotton fiber quality traits from 12 studies were used for meta-QTL analysis based on reference genome TM-1, as a result, 74 meta-QTLs were identified, including 19 meta-QTLs for fiber length; 18 meta-QTLs for fiber strength; 11 meta-QTLs for fiber uniformity; 11 meta-QTLs for fiber elongation; and 15 meta-QTLs for micronaire. Combined with 8 589 significant single nucleotide polymorphisms associated with fiber quality traits collected from 15 studies, 297 candidate genes were identified in the meta-QTL intervals, 20 of which showed high expression levels specifically in the developing fibers. According to the function annotations, some of the 20 key candidate genes are associated with the fiber development. This study provides not only stable QTLs used for marker-assisted selection, but also candidate genes to uncover the molecular mechanisms for cotton fiber development.


Background
As a natural and renewable resource, cotton fiber has been the most important raw material in the textile and processing industry all over the world. With the improvement of people's living standard and advancements in techniques and diversified methods of spinning, demand for high quality cotton fiber is increasing. Cotton fibers are derived from ovule epidermal cells − 2 to~0 day post anthesis (DPA), and ultimately reach 2.5∼3.5 cm in the mature period (Stewart 1975). The development consists of four stages: fiber initiation, cell elongation, secondary cell wall (SCW) biosynthesis, and maturation (Li et al. 2018a, b). The development mechanism of cotton fiber contributed to the fiber quality improvement. Cotton fiber quality traits are complex quantitative traits, which are influenced by environments and the fiber development, and controlled by many quantitative trait loci (QTLs), including fiber length (FL), fiber strength (FS), fiber uniformity (FU), fiber elongation (FE), and micronaire (MIC), etc. (Ademe et al. 2017;Wang et al. 2016). FL and FS are considered as the most important traits affecting yarn quality, and FS is important for advanced spinning technologies in the textile industry (Yang et al. 2016). The MIC is a measure of fiber fineness and fiber maturity, which influences the fiber processing and dyeing consistency (Rodgers et al. 2017). Hundreds of QTLs contributing to fiber quality traits have been previously mapped for cotton using a variety of populations, and were found evenly distributed throughout the cotton genome (Qin et al. 2008;Shen et al. 2007;Wang et al. 2020;Yu et al. 2013). A total of 104 QTLs for fiber quality traits were detected by using 180 recombinant inbred lines (RIL) derived from Herein and Yumian 1, and 25 QTLs were detected in all three environments (Tan et al. 2018). A total of 134 QTLs for fiber quality traits were detected using 231 F 6:8 RILs, which were derived from an intraspecific cross between Xinlu-zao24 and Lumianyan 28 (Liu et al. 2018b). Seventy-four QTLs were detected to be associated with five fiber quality traits (30 QTLs) and eight yield traits (44 QTLs) using 107 introgression lines, which were developed with an interspecific cross using G. hirsutum acc. 4105 as the recurrent parent and G. tomentosum as the donor parent (Keerio et al. 2018). One hundred and eighty-six additive QTLs were obtained for five fiber quality traits using 137 RILs (Jia et al. 2018).
Marker-assisted selection (MAS) has been successfully applied in genetic improvement of varieties of crops, especially for the major QTLs/genes, such as improvement of rice blast resistance by pyramiding three genes in rice (Xiao et al. 2019), improvement of drought adaptation in maize (Ribaut and Ragot 2007), yield traits in soybean (Reyna and Sneller 2001;Sebastian et al. 2010), Fusarium head blight resistance in wheat (Anderson 2007), and Verticillium wilt resistance in cotton (Zhang et al. 2014). QTL mapping with molecular markers provides a powerful approach to dissect the molecular mechanism underlying complex fiber quality traits (Ijaz et al. 2019). There were thousands of QTLs for cotton fiber quality traits identified in different mapping populations such as RILs, bi-parental segregating populations, and backcross populations (Said et al. 2015b), which provide the potential to be manipulated by MAS for the improvement of cotton fiber quality traits. However, only the stable QTLs for cotton fiber quality traits in various environments and populations can be used in the MAS breeding. In order to make these mapped QTLs more useful to plant breeding and gene cloning, a further analysis of all these loci has to be carried out. In this regard, metaanalysis of QTLs has been proven as an efficient approach to stablish the occurrence of QTL "hotspots" in a consensus map, which correspond to the more precise region where these loci represent under analysis (Goffinet and Gerber 2000;Salvi and Tuberosa 2015). More than three overlapped or location-similar QTLs reported in multiple documents of the same trait is considered as a meta-QTL. This approach was already applied to various crops and complex traits, such as Fusarium head blight resistance in bread wheat (Venske et al. 2019), grain weight in tetraploid wheat (Avni et al. 2018), cyst nematode resistance in soybean (Guo et al. 2006a), resistance to white mold in common bean (Vasconcellos et al. 2017), yield under drought in rice (Swamy et al. 2011), yield in maize (Martinez et al. 2016), and multiple traits in cotton (Said et al. 2013).
Several strategies were combined to identify candidate genes, such as combination of association mapping and linkage analysis (Cui et al. 2018;Mahuku et al. 2016;Zhang et al. 2019a) and combination of QTLs and transcriptome analysis (Chen et al. 2018;Shimono et al. 2016;Wang et al. 2020). A sucrose synthesis-related gene (Gh_D03G1338) associated with FL was identified by the combination of genome-wide association and linkage analyses (Zhang et al. 2019a). Three genes, Gh_ D05G1077 and Gh_D13G1571 for SY, and Gh_ A11G0775 for LY, were identified using genome-wide association mapping. Five candidate genes were identified by the combination of QTL mapping and transcriptome analysis, which regulated pericarp thickness in sweet corn (Wu et al. 2020). A peroxidase gene (GhPRXR1) required for oil content in upland cotton was identified by the combination of genome-wide association and transcriptome analysis (Ma et al. 2019). Two candidate genes for fiber elongation and developmental were identified by the combination of genome-wide association and transcriptome analysis (Ma et al. 2018a).
In this study, 884 QTLs associated with cotton fiber traits from 12 studies (Ali et al. 2018;Diouf et al. 2018;Huang et al. 2017;Jia et al. 2018;Keerio et al. 2018;Li et al. 2016a;Liu et al. 2018b;Ma et al. 2018a;Tan et al. 2018;Wang et al. 2015;Zhang et al. 2015b;Zou et al. 2014) were used for meta-QTL analysis based on upland cotton reference genome TM-1 (Zhang et al. 2015a), and 74 meta-QTLs were identified. Combined with 8 589 significant SNP loci associated to cotton fiber quality traits collected from 15 previous publications (Chandnani et al. 2018;Fang et al. 2017;Gapare et al. 2017;Handi et al. 2017;Huang et al. 2017;Islam et al. 2016;Li et al. 2017bLi et al. , 2018aLiu et al. 2018b;Ma et al. 2018a, b;Su et al. 2016Su et al. , 2018Sun et al. 2017;Wen et al. 2018), 297 candidate genes associated with cotton fiber quality traits were identified. Twenty genes showed high expression levels specifically in the developing fibers, some of which are associated with the fiber development. According to the results, the combination of meta-QTL, significant SNP by genome-wide association analysis (GWAS), and spatiotemporal expression analysis provides not only stable QTLs used for MAS, but also candidate genes to uncover the molecular mechanisms for cotton fiber development.

Meta-QTL analysis
Since SNPs are developed by genome sequencing, each marker has a fixed and unique location in the genome. By anchoring the SNPs on both sides of the QTLs to the TM-1 genome (Zhang et al. 2015a), the confidence interval of the QTLs can be determined. A stable meta-QTL region was obtained by manual organizing, and the stable meta-QTL intervals were illustrated in the form of Circos plot using Circos software (Krzywinski et al. 2009). The SNP loci significantly correlated with the same trait was compared with the meta-QTL intervals; thereby the most likely location of the candidate genes in the meta-QTL intervals was determined.

Candidate gene identification
Eight thousand five hundred and eighty-nine significant SNP loci associated to cotton fiber quality traits were collected from 15 GWAS studies and mapped to TM-1 genome (Chandnani et al. 2018;Fang et al. 2017;Gapare et al. 2017;Handi et al. 2017;Huang et al. 2017;Islam et al. 2016;Li et al. 2017bLi et al. , 2018aLiu et al. 2018b;Ma et al. 2018a, b;Su et al. 2016Su et al. , 2018Sun et al. 2017;Wen et al. 2018) (Table S1, Table S2). Then they were mapped to the 74 meta-QTLs, and the mapped SNPs are shown in Table S3. Lastly, the genes closely linked to the SNPs (SNPs within genes) were selected as candidate genes, which are shown in Table S4.

Gene expression patterns
The tissue expression levels of the candidate genes were obtained from previously reported transcriptome data (Zhang et al. 2015a).

Collection of QTLs and SNPs associated with fiber quality traits
A total of 12 QTL mapping studies for cotton fiber quality traits were used in this study, in which the mapping population size ranged from 107 to 503 lines (Table 1) (Table   1). As a result, a total of 884 initial QTLs related to cotton fiber quality traits were collected, which were unevenly distributed on each chromosome, and ranged from 12 to 57 (Fig. 1 (Table S5).

Meta-analysis of QTL for fiber quality traits
A meta-analysis was performed with 884 QTLs related to cotton fiber quality traits, and a total of 74 stable meta-QTLs related to FL, FS, FE, MIC, and FU were obtained, including 19 for FL, 18 for FS, 11 for FU, 11 for FE, and 15 for MIC, which covered 26 upland cotton chromosomes. There were 33 meta-QTLs in the A subgenome and 41 in the D sub-genome. The confident intervals (CI) of all meta-QTLs were smaller than their respective initial QTLs, which ranged from 2.4 Mb to 13.4 Mb, with an average of 8.5 Mb (Fig. 2, Table S6). Among the 74 meta-QTLs, 19 were obtained from multiple QTLs coincident regions in 4 or more studies (Table  S6), indicating that these regions had high correlation with cotton fiber quality traits.

Meta-QTLs for FE
A total of 11 meta-QTLs related to FE were obtained, covering 11 chromosomes, including 4 meta-QTLs on the A sub-genome (A05, A10, A11, and A13), and 7 meta-QTLs on the D sub-genome (D01, D04, D07, D11, and D12) (Fig. 2, Table S6). Among these meta-QTLs, Fig. 2 Locations of Meta-QTLs and significant SNPs. Each color represents one trait, red for FL, gray for FS, blue for FU, green for FE, and yellow for MIC. There are two tracks for each trait, the inner tracks represent the meta-QTLs, and the outer tracks represent significant SNP loci associated with corresponding traits. FL, fiber length; FS, fiber strength; MIC, micronaire; FU, fiber uniformity; FE, fiber elongation meta-QTL-53 on chromosome D01 was identified in four studies.

Candidate genes identification combined and meta-QTL intervals and significant SNPs
Eight thousand five hundred eighty-nine significant SNP loci associated to cotton fiber quality traits were collected from 15 GWAS studies and mapped to the TM-1 genome ( (Table S1, Table S2, Fig. 2), 4 343 of which were mapped in the 74 meta-QTL regions (Table S3). Two hundred and ninety-seven candidate genes were identified closely linked to the 4 343 SNPs, including 126 genes for FL, 93 for FS, 40 for FU, 20 for FE, 18 for MIC (Table S4).
To further understand the enriched pathways of the candidate genes, KEGG pathway analysis was performed, and 234 annotated genes were assigned to 4 KEGG pathways (P < 0.05), including pentose and glucuronate interconversions, acarbose and validamycin biosynthesis, vitamin digestion and absorption, and membrane trafficking ( Table 2). Some of the pathways have been reported to be associated with fiber development, such as

Key candidate genes identified from expression patterns
To better understand the molecular function of the candidate genes, the expression in 10 tissues (root, stem, leaf, petal, anther, stigma, fibers at four developmental stages) obtained from the transcriptome datasets of the upland cotton genetic standard TM-1 were used for spatiotemporal expression analysis (Zhang et al. 2015a). Among the 126 candidate genes for FL, nine of which (Gh_D08G1950, Gh_D06G0479, Gh_D11G1626, Gh_ D13G1900, Gh_D10G0833, Gh_D13G1965, Gh_ A09G1231, Gh_D08G1970, and Gh_D04G1574) showed high expression levels specifically during the development of cotton fiber. Gh_D08G1950, Gh_D06G0479, Gh_D11G1626, Gh_D13G1900, Gh_D10G0833, and Gh_ D13G1965 showed high expression levels specifically at fiber SCW biosynthesis stages (20 and 25 DPA); Gh_ A09G1231, Gh_D08G1970 and Gh_D04G1574 showed high expression levels specifically at fiber elongation stages (5 and 10 DPA). Among the 93 candidate genes for FS, 5 of which (Gh_A01G1474, Gh_A05G2203, Gh_ D08G2110, Gh_A10G2036, and Gh_A07G1801) showed high expression levels specifically during the development of cotton fiber. Gh_A01G1474, Gh_A05G2203 and Gh_D08G2110 showed specifically high expression levels at fiber SCW biosynthesis stages (20 and 25 DPA); Gh_A10G2036, and Gh_A07G1801 showed high expression specifically at fiber elongation stages (5 and 10 DPA). Among the 40 candidate genes for FU, 2 of which (Gh_A11G2663 and Gh_D11G2059) showed high expression levels specifically during the development of cotton fiber. Gh_A11G2663 showed high expression level specifically at fiber SCW biosynthesis stages (20 and 25 DPA); Gh_D11G2059 showed high expression level specifically at fiber elongation stages (5 and 10 DPA). Among the 20 candidate genes for FE, 3 of which (Gh_A13G0282, Gh_A13G0354 and Gh_ A11G1313) showed high expression levels specifically at fiber elongation stages (5 and 10 DPA). Among the 20 candidate genes for FE, Gh_D11G1416 showed high expression level specifically at fiber SCW biosynthesis stages (20 and 25 DPA) (Table S8).
To better understand the molecular function of the key candidate genes identified from expression patterns, they were annotated in CottonFGD (https://cottonfgd. org/). As a result, 4 genes (Gh_D10G0833, Gh_ A05G2203, Gh_D11G2059, and Gh_A13G0354) have no protein annotation, and others encode a variety of proteins (Table 3). The annotations of the key candidate genes were listed in Table 3.

SNP-based meta-QTL analysis for cotton fiber quality traits supports complete and accurate genetic information
In previous studies, genetic maps were constructed with SSR markers and other low throughput molecular markers in cotton genetic studies (Guo et al. 2013;Nie et al. 2016;Sun et al. 2012). The low molecular markers density in genetic research could not only reduce the number and accuracy of QTL identification, but also result in large confidence intervals of the QTL in the genome (Guo et al. 2006b;Liu et al. 2012;Su et al. 2016).  With the development of sequencing technology, the application of SNP markers are used in cotton genetic research, such as HDGM construction (Ali et al. 2018;Diouf et al. 2018) and GWAS study (Cai et al. 2017;Li et al. 2017a;Sun et al. 2018), which results in the identification of a large number of QTL for cotton fiber quality traits. Though the meta-analysis of QTL for cotton fiber quality traits were already reported, they were based on the low throughput molecular markers (Said et al. 2013(Said et al. , 2015a, which could result in the loss of genetic information, as well as the reduction of accuracy of meta-QTLs. Previous studies have indicated that a region containing multiple QTLs of same traits can be used as a hotspot with a region size of approximately 20 cM (centiMorgan) (Said et al. 2013) or a physical distance of approximately 10 Mb in upland cotton (Keerio et al. 2018). Therefore, in order to ensure the credibility and inclusiveness of the meta-QTLs, the genome region of about 10 Mb was used as the confidence interval of the meta-QTLs. In this study, 884 QTLs for cotton fiber quality traits were collected from high-density genetic maps, which were constructed with large numbers of SNP markers. Seventy-four meta-QTLs were identified. In addition, the application of unique SNP markers makes it easier to map the QTLs from different studies to TM-1 genome of upland cotton.
The meta-QTLs contribute to cotton fiber quality improvement by MAS MAS needs to be enabled through the identification of robust QTLs, the design of reliable marker systems to select for these QTLs, and the delivery of these QTLs into elite genomic backgrounds to enable their use without associated genetic drag (Cobb et al. 2019). In the study, though 884 QTLs for cotton fiber quality traits were collected for meta-analysis, only 74 meta-QTLs were identified, among which, 19 were obtained from multiple QTL coincident regions of 4 or more studies, including 11 for FL, 4 for FS, 1 for FU, 2 for FE, and 2 for MIC. So the flanking SNP markers of the meta-QTLs identified in the study, especially the 19 meta-QTLs, can be used for MAS to improve cotton fiber quality.
The combination of meta-QTL intervals and significant SNP provide reliable information to identify candidate genes Due to the challenging detection of rare variants in GWAS and high false-positive rates in QTL mapping, the combination of association mapping and linkage analysis has been wildly used for revealing the genetic architecture of complex quantitative traits (Andersen et al. 2005;Li et al. 2016b;Visscher 2008). Seventeen candidate genes for kernel test weight were identified in maize by the combination of association mapping and linkage analysis (Zhang et al. 2019c). Nineteen candidate genes for plant and ear height were identified in maize by combining association mapping and linkage analysis (Li et al. 2016b). Twenty-five candidate genes for soybean seed protein and oil content were identified by combining association mapping and linkage analysis (Zhang et al. 2019b). In our study, the combination of meta-QTL intervals and significant SNP identified by association mapping was used for candidate gene identification coving the whole cotton genome, which provides more comprehensive and reliable information. As a result, 297 candidate genes associated with cotton fiber quality traits were identified.

Candidate genes are probably involved in the development of cotton fibers
Due to the difficulty of forward genetics research in cotton, transcriptome analysis combined with QTLs has been widely used to identify candidate genes for fiber development Shi et al. 2006;Tu et al. 2007;Yoo and Wendel 2014). In our study, transcriptome analysis of ten cotton tissues was used for 297 candidate genes expression pattern analysis, as a result, 20 genes showed high expression levels specifically in the developing fibers. In addition, the encoded proteins and functions of the 20 genes were annotated, and many were associated with fiber development (Table 4). Gh_ D11G1626 encoded a COBRA-like protein 4, and expressed in the fiber SCW biosynthesis stages; the function of COBRA-like protein has been reported in sorghum and rice, which were involved in SCW cellulose biosynthesis (Dai et al. 2011;Li et al. 2019;Sato et al. 2010), so Gh_D11G1626 is probably involved in the SWC biosynthesis in cotton. Gh_D13G1900 and Gh_ A01G1474 encoded WAT1-related proteins, and showed specifically high expression level at the fiber SCW biosynthesis stages; the WAT1-related protein was probably related to high fiber yield in cotton (Liu et al. 2018a).
Gh_D13G1965 and Gh_A11G2663 encoded protein WVD2-like 1, and showed specifically high expression level at the fiber SCW biosynthesis stages; in Arabidopsis, WVD2 was involved in the cell expansion (Yuen et al. 2003). Gh_D08G1970 encoded a probable aquaporin PIP1-2, and showed specifically high expression level at fiber elongation stages; aquaporin proteins were reported to be involved in the cotton fiber cell elongation and development Yang and Cui 2009).
Gh_A10G2036 encoded a Rop guanine nucleotide exchange factor 5, and showed specifically high expression level at fiber elongation stages; Rho of plants (ROP) was reported participated in the spatial patterning of SCWs and regulated the polarized cell growth (Kost 2008;Oda and Fukuda 2014;Yanagisawa et al. 2018), so Gh_ A10G2036 is probably involved in the cotton fiber elongation. According to the results, the candidate genes identified by the combination of meta-QTL intervals, significant SNPs, and transcriptome data, are reliable; however, there are still lots of work to do to study the function of the key candidate genes.

Conclusion
In this study, we identified 74 meta-QTLs and 297 candidate genes associated with cotton fiber quality traits, and 20 of which showed high expression levels specifically in the developing fibers and thus are assumed to be associated with the fiber development. The study provides not only stable QTLs used for marker-assisted selection (MAS), but also candidate genes to uncover the molecular mechanisms for cotton fiber development.
Additional file 1: Table S1 Initial QTL location on upland cotton TM-1 reference genome. Table S2. The information of the meta-QTL. Table  S3. SNPs associated with fiber quality traits from 15 papers. Table S4. Significant SNPs location on upland cotton TM-1 reference genome. Table S5. Significant SNPs in the mQTL. Table S6. Candidate genes closely linked SNPs in the meta-QTL. Table S7. GO enrichment of candidate genes. Table S8. The expression of candidate genes in ten tissues.

Gh_D13G1900 Gh_A01G1474
Fiber SCW biosynthesis stages Fiber development Liu et al. 2018a Gh_D13G1965 Gh_A11G2663 Fiber SCW biosynthesis stages Cell expansion Yuen et al. 2003 Gh_D08G1970 Fiber elongation stages Fiber cell elongation Li et al. 2013 Gh_A10G2036 Fiber elongation stages Polarized cell growth Oda and Fukuda 2014