QTL mapping of agronomic and economic traits for four F2 populations of upland cotton

Upland cotton (Gossypium hirsutum) accounts for more than 90% of the annual world cotton output because of its high yield potential. However, yield and fiber quality traits often show negative correlations. We constructed four F2 populations of upland cotton, using two normal lines (4133B and SGK9708) with high yield potential but moderate fiber quality and two introgression lines (Suyuan04–3 and J02–247) with superior fiber quality, and used them to investigate the genetic basis underlying complex traits such as yield and fiber quality in upland cotton. We also phenotyped eight agronomic and economic traits and mapped quantitative trait loci (QTLs). Extensive phenotypic variation and transgressive segregation were found across the segregating populations. We constructed four genetic maps of 585.97 centiMorgan (cM), 752.45 cM, 855.04 cM, and 1 163.66 cM, one for each of the four F2 populations. Fifty QTLs were identified across the four populations (7 for plant height, 27 for fiber quality and 16 for yield). The same QTLs were identified in different populations, including qBW4 and qBW2, which were linked to a common simple sequence repeat (SSR) marker, NAU1255. A QTL cluster containing eight QTLs for six different traits was characterized on linkage group 9 of the 4133B × Suyuan04–3 population. These findings will provide insights into the genetic basis of simultaneous improvement of yield and fiber quality in upland cotton breeding.


Background
Cotton represents the main source of natural textile fibers in the world and is the most prevalent raw material used in the textile industry. High yield and fine fiber quality are prerequisites to meet the ever-increasing demand. Upland cotton (Gossypium hirsutum) accounts for more than 90% of the global cotton production because of its high yield potential and broad adaptability, but it has moderate fiber quality, whereas G. barbadense produces exceptionally fine-quality fibers, but with lower fiber yield (Cai et al. 2014; Hu et al. 2019).
Most agronomic and economic traits, such as yield and fiber quality, are quantitative traits that are controlled by multiple loci/genes. Moreover, environmental influence is substantial in the control and expression of these traits. Significant negative correlations between fiber quality traits and yield traits have been reported (Liu et al. 2018; Zhang et al. 2020). Dissecting the genetic basis of yield and fiber quality is essential for simultaneous improvement of yield and fiber quality.
Molecular genetic methods, especially molecular markers, have been applied widely in cotton in the last decade. Recently, the development of molecular markers was accelerated by the release of assembled genome sequences of G. hirsutum (Li et al. 2015; Zhang et al. 2015; Wang et al. 2018; Yang et al. 2019) and G. barbadense (Liu et al. 2015; Yuan et al. 2015). Numerous genetic linkage maps, including the intraspecific map of G. hirsutum and the interspecific map between G. hirsutum and G. barbadense, have been constructed using restriction fragment length polymorphisms (RFLPs), simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). Thousands of quantitative trait loci (QTLs) for yield and fiber quality in cotton are documented in CottonQTLdb (Release 2.3, Said et al. 2013; Said et al. 2015). However, to date, few studies have simultaneously dissected the genetic basis underlying complex traits and their genetic correlations in multiple upland cotton populations by QTL mapping.
In the present study, we used four F2 populations derived from the hybridization between two G. hirsutum normal lines (4133B and SGK9708) and two introgression lines (Suyuan04-3 and J02-247). Four corresponding genetic linkage maps were constructed using SSR markers. QTL mapping was implemented by integrating the genotypic and phenotypic data of eight agronomic and economic traits, including yield and fiber quality. Our findings will not only contribute to dissecting the genetic basis underlying yield and fiber quality and their genetic correlations but also provide insights into the simultaneous improvement of yield and fiber quality in upland cotton breeding.

Plant materials and field experiments
Two G. hirsutum normal lines (4133B and SGK9708) with high yield potential but moderate fiber quality, and two introgression lines (Suyuan04-3 and J02-247) with superior fiber quality were used as the parents to generate four F2 populations in this study. SGK9708 was derived from CCRI41, which is a widely planted cultivar with wide adaptability; 4133B was derived from the hybridization of SGK9708 and the offspring of Gan4104 and CZA (70)33 and has high combining ability; Suyuan04-3 was derived from the distant hybridization of [83-811 × (86-1 × G. armourianum)]; and J02-247 was derived from the cross of Suyin45 × Sukang310, and has large cotton bolls as well as superior fiber length and strength. The cotton materials were provided by the National Mid-term Gene Bank for Cotton of China.

Trait measurements and statistical analysis
In mid-September, all of the plants in the four F2 populations were investigated for plant height (PH). During the harvesting season, all of the seed cotton was collected and boll weight (BW) and lint percentage (LP) were calculated after the seed cotton had been weighed and ginned. Fiber quality traits, namely fiber length (FL), fiber strength (FS), fiber length uniformity (FU), micronaire (MIC), and fiber elongation (FE), were tested using an HVI 1000 (Uster Technologies, Switzerland) in the Cotton Quality Supervision, Inspection and Testing Center, Ministry of Agriculture, Anyang, China.
The descriptive statistics, namely the maximum, minimum, and mean values, standard deviation, and coefficient of variation (CV), for the eight traits across the four populations, were processed using Microsoft Excel 2013. A correlation matrix was calculated and visualized using the corrplot package in R (Wei and Simko, 2016).
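The descriptive statistics listed above can be illustrated with a minimal Python sketch. The study itself used Microsoft Excel 2013; the function name `describe`, the sample values, and the choice of the sample standard deviation (n − 1 denominator) are assumptions made only for illustration.

```python
from statistics import mean, stdev

def describe(values):
    """Return max, min, mean, SD and CV (%) for one trait in one population.

    Assumption: the SD is the sample standard deviation (n - 1 denominator);
    the source does not state which form was used.
    """
    m = mean(values)
    sd = stdev(values)
    return {
        "max": max(values),
        "min": min(values),
        "mean": m,
        "sd": sd,
        "cv_percent": 100 * sd / m,  # CV = SD / mean, expressed in percent
    }

# Hypothetical lint percentage (LP) values, not data from the study
lp_values = [38.0, 40.0, 42.0, 36.0, 44.0]
lp_stats = describe(lp_values)
print(lp_stats)
```

Because the CV is scaled by the mean, it allows variability to be compared across traits measured in different units, such as PH in cm and LP in percent.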

SSR markers analysis
Young leaves were collected from each plant and stored at −80 °C. Genomic DNA of individuals from the F2 populations and their parents was extracted from young leaf tissues using a modified cetyltrimethylammonium bromide (CTAB) method (Paterson et al. 1993).
Polymorphism detection for the four pairs of parents was performed using 5 713 SSR primers. The primers that amplified stable polymorphic products were selected for genotyping the F2 populations. The SSR primer sequences were downloaded from CottonGen (https://www.cottongen.org; Yu et al. 2014). We used a local Basic Local Alignment Search Tool (BLAST) program (Altschul et al. 1990) to map the SSRs to a physical map. The SSR sequences were queried against the G. hirsutum genome sequences and the top BLAST hit was selected for further analysis. Separation and silver staining of the polymerase chain reaction (PCR) amplified products were performed as detailed by Feng et al. (2015).

Genetic linkage map construction
The genetic linkage map was constructed using JoinMap 4.0 with the regression mapping method and a logarithm of odds (LOD) threshold of 5.0. The Kosambi function was used to convert the recombination frequencies to map distances.
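For readers unfamiliar with the Kosambi mapping function, the conversion can be sketched in Python. This shows only the standard formula, d = 25 ln[(1 + 2r)/(1 − 2r)] in cM, and not JoinMap's internal implementation; the function names are hypothetical.

```python
import math

def kosambi_cm(r):
    """Convert a recombination frequency r (0 <= r < 0.5) to map distance in cM."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination frequency must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))

def kosambi_r(d_cm):
    """Inverse: convert a Kosambi map distance in cM back to a recombination frequency."""
    return 0.5 * math.tanh(2 * d_cm / 100.0)

# A recombination frequency of 10% corresponds to slightly more than 10 cM
print(round(kosambi_cm(0.10), 2))
```

At small r the map distance is close to 100r, and it grows faster than r as r approaches 0.5 because the function accounts for partial crossover interference.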

QTL mapping and analysis
WinQTL Cartographer 2.5 was applied to identify QTLs with the composite interval mapping (CIM) method. The parameters were set as 1.0 cM for the mapping step, 5 for control markers, and 1 000 for permutation tests. QTLs were considered significant if the corresponding LOD score was > 2.5. The additive effect, dominance effect, and R² (percent of phenotypic variance explained by a QTL) were estimated. QTLs detected for the eight traits were named in the form q-trait-linkage group number (McCouch et al. 1988). A graphic representation of the linkage groups and QTLs was created using MapChart 2.2 (Voorrips 2002).
The action mode of a QTL was represented as the dominance degree (|D/A|), i.e., the absolute value of the dominance effect (D) divided by the additive effect (A) (Stuber et al. 1987). The action mode was considered additive if the dominance degree was < 0.2, partial dominance for 0.2∼0.8, dominance for 0.81∼1.2, and overdominance for > 1.2.
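The classification rule above can be written as a short Python sketch. The function name `action_mode` is hypothetical, and two details are assumptions: values between 0.8 and 0.81 are assigned to partial dominance, and a zero additive effect is treated as an error because the ratio is undefined.

```python
def action_mode(additive, dominance):
    """Classify a QTL by its dominance degree |D/A| using the thresholds in the text."""
    if additive == 0:
        # Assumption: the source does not say how A = 0 was handled
        raise ValueError("dominance degree is undefined when the additive effect is zero")
    degree = abs(dominance / additive)
    if degree < 0.2:
        return "additive"
    if degree <= 0.8:
        return "partial dominance"
    if degree <= 1.2:
        return "dominance"
    return "overdominance"

# Hypothetical effect estimates, not values from Table 3
print(action_mode(additive=1.0, dominance=0.1))
print(action_mode(additive=0.5, dominance=4.705))
```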
The QTLs identified in this study were compared with those in CottonQTLdb (Said et al. 2015) to determine whether the identified QTLs were novel or had been detected previously. QTLs identified in the present study that shared the same or overlapping confidence intervals with QTLs in the CottonQTLdb based on common marker position were considered as QTLs that had been identified in previous studies.

Phenotypic variation of the four F2 populations
The phenotypes of eight agronomic and economic traits across four F2 populations were evaluated. Extensive phenotypic variation and transgressive segregation were observed (Table 1 and Fig. 1). Transgressive segregation means that the phenotypic values of some individuals were better than those of the superior parent or worse than those of the inferior parent (Reyes 2019). The CV values revealed differences in variability among the eight traits (Table 1). The CV value for LP was low (5.96-7.98%), whereas the CV values of PH and BW were both high and similar (PH: 16.9-21.95%; BW: 15.66-19.7%). Among FL, FS, FU, FE, and MIC, the CV value was lowest for FU (1.59-2.61%) and highest for MIC (13.87-22%). Frequency distribution analysis showed a normal distribution for seven of the traits, with MIC as the exception (Fig. 1), suggesting that these traits were quantitative traits controlled by multiple genes and suitable for QTL mapping.
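The definition of transgressive segregation given above can be sketched as a simple check in Python. The function name and all values are hypothetical; the sketch only compares numeric values with the two parental values and leaves open whether a higher or lower value is the "better" one for a given trait.

```python
def transgressive_individuals(f2_values, parent_1, parent_2):
    """Split F2 values into those beyond the higher parent and those below the lower parent."""
    high_parent = max(parent_1, parent_2)
    low_parent = min(parent_1, parent_2)
    above = [v for v in f2_values if v > high_parent]
    below = [v for v in f2_values if v < low_parent]
    return above, below

# Hypothetical fiber length values (mm), not data from the study
f2_fl = [28.0, 29.5, 31.0, 33.0, 27.0]
above, below = transgressive_individuals(f2_fl, parent_1=28.5, parent_2=31.5)
print(above, below)
```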

Correlation analysis
Correlation analysis between 32 sets of phenotypic data from the eight traits across the four F2 populations revealed significant correlations for different traits within and between populations (Fig. 2). BW and LP were significantly negatively correlated (−0.87 < r < −0.62) in three populations (4Su, 4J, Sg4), whereas BW had significant positive correlations with FS, FU, FE, and MIC (0.13 < r < 0.67) in the 4J and SgJ populations (Fig. 2).
Overall, within populations, most of the correlations were negative between the two yield traits (BW and LP), whereas most of the correlations were positive among the fiber quality traits, as well as between BW and the fiber quality traits (Fig. 2). Significant correlations were found between multiple traits among the 4Su, 4J, and Sg4 populations (Fig. 2), suggesting the influence of the common parent 4133B on the traits.

Genetic map construction
Five thousand seven hundred thirteen SSR primers were used to detect polymorphisms in the four pairs of parents. Seven hundred thirty-nine polymorphic SSR primers with clearly amplified bands were retained, including 203 polymorphic primers between 4133B and Suyuan04-3 (Additional file 5: Table S1a), 208 between 4133B and J02-247 (Additional file 5: Table S1b), 158 between SGK9708 and J02-247 (Additional file 5: Table S1c), and 170 between SGK9708 and 4133B (Additional file 5: Table S1d). The polymorphism rates of the primers for the four comparisons were 3.55, 3.64, 2.77, and 2.98%, respectively.
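The polymorphism rates follow directly from the primer counts, as the short Python check below shows. It uses only the counts reported above; the dictionary keys use the population abbreviations from this paper, on the assumption that they correspond to the four parental pairs in the order listed.

```python
TOTAL_PRIMERS = 5713  # SSR primers screened in each pair of parents

# Polymorphic primers retained per parental comparison
polymorphic = {"4Su": 203, "4J": 208, "SgJ": 158, "Sg4": 170}

# Polymorphism rate (%) = polymorphic primers / primers screened * 100
rates = {pop: round(100 * n / TOTAL_PRIMERS, 2) for pop, n in polymorphic.items()}

print(sum(polymorphic.values()))  # total number of polymorphic primers retained
print(rates)
```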
JoinMap 4.0 was employed to construct the genetic linkage maps. For the 4Su population, 71 markers were assigned to 10 linkage groups (LGs) with a total map length of 585.97 cM (Table 2, Additional file 1: Fig. S1, Additional file 6: Table S2a). The average length of the LGs was 58.6 cM, and the average distance between markers was 8.25 cM. The longest LG, LG9, contained the most markers (27), and half of the LGs contained only three markers.
For the 4J population, 61 markers were assigned to 10 linkage groups with a total map length of 752.45 cM (Table 2, Additional file 2: Fig. S2, Additional file 6: Table S2b). The average length of the LGs was 75.2 cM, and the average distance between markers was 12.34 cM.
For the SgJ population, 83 markers, approximately half of the 158 polymorphic markers, were assigned to 15 linkage groups with a total map length of 855.04 cM (Table 2, Additional file 3: Fig. S3, Additional file 6: Table S2c). The average length of the linkage groups was 57 cM. The longest average distance between markers was 21.46 cM on LG13 and the shortest was 1.06 cM on LG14.
For the Sg4 population, 52 markers, approximately one-third of the 170 polymorphic markers, were assigned to nine linkage groups with a total map length of 1 163.66 cM (Table 2, Additional file 4: Fig. S4, Additional file 6: Table S2d). The average length of the linkage groups was 129.3 cM, and the average distance between markers was 22.38 cM.
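The per-map averages can be recomputed from the marker counts, LG counts, and total lengths given above. The Python sketch below assumes that the reported average marker distance is the total map length divided by the number of markers, which reproduces the stated values; dividing by the number of marker intervals (markers minus LGs) would give larger figures.

```python
# (markers, linkage groups, total map length in cM) as reported in the text
maps = {
    "4Su": (71, 10, 585.97),
    "4J": (61, 10, 752.45),
    "SgJ": (83, 15, 855.04),
    "Sg4": (52, 9, 1163.66),
}

summary = {}
for pop, (n_markers, n_lgs, length_cm) in maps.items():
    summary[pop] = {
        "mean_lg_length_cm": round(length_cm / n_lgs, 1),
        # Assumption: average distance = total length / number of markers
        "mean_marker_distance_cm": round(length_cm / n_markers, 2),
    }

for pop, stats in summary.items():
    print(pop, stats)
```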
LG9 in the 4Su population harbored the highest number of QTLs (13), followed by LG6 (6) and LG1 (5) in the Sg4 population. Seven QTLs for PH were identified, but six of them in the 4Su population had only minor effects (0.11% < R² < 4.02%; Table 3, Fig. 3). The additive effects of QTLs qPH2-1 and qPH2-2, which had the highest R² values (2.66% and 4.02%), were positive, indicating that the favorable alleles were from Suyuan04-3. The action modes of qPH2-1 and qPH2-2 were overdominance according to the dominance degree values.
Eight QTLs for BW were identified with R² of 1.17%∼9.31% in the 4J (1), SgJ (1), and Sg4 (6) populations (Table 3, Fig. 3). It is noteworthy that the LGs harboring the QTLs in the 4J (qBW4) and SgJ (qBW2) populations were both anchored to chromosome A05, and the common SSR marker, NAU1255, was detected close to the QTL interval, implying that NAU1255 was closely linked to BW. Furthermore, the directions of the additive and dominance effects of these QTLs were the same.
Interestingly, both LG7 in the 4Su population and LG6 in the SgJ population were anchored to chromosome A13 (Table 2). The common SSR markers, BNL2449 and NAU1211, were detected near QTLs qFL7 (4Su) and qFL6 (SgJ), hinting that BNL2449 and NAU1211 may be closely linked to FL. In addition, the additive effect of QTL qFL2-2 was positive, suggesting that the favorable alleles come from the male parents, Suyuan04-3 and J02-247, which are endowed with superior fiber quality.
Five QTLs for FS were identified: four with R² of 2.95%∼7.15% in the 4Su population and one major QTL with R² of 15.10% in the Sg4 population (Table 3, Fig. 3). The additive effects of the four QTLs in the 4Su population were positive, whereas the additive effect of the one major QTL in the Sg4 population was negative, implying that the parent 4133B may not have conferred the favorable allele.
Only two QTLs for FU were identified, with R² of 0.10%∼1.21%, in the same LG of the 4Su population (Table 3, Fig. 3).
Four QTLs for FE were identified with R² of 0.16%∼5.62% in the 4Su, SgJ and Sg4 populations (Table 3, Fig. 3). The additive effect of one QTL, qFE8, was [...] Fig. 3). A major QTL, qMIC2, with R² of up to 59.24%, was located in LG2 of the Sg4 population; the other four QTLs were minor, with R² of 0.15%∼6.29%. The dominance degree values of all QTLs except qMIC9-2 ranged from 9.41 to 92.03, suggesting that their action modes were overdominance.
A hotspot region was detected in LG9 of the 4Su population (Fig. 3a). Three QTLs (qFL9-1, qFS9-1, qFE9) were identified only at the position of 96. and MIC (100.81 cM). Therefore, this hotspot region may be an important genome region that affects agronomic and economic traits in cotton. Two other QTLs, qFU9-1 and qMIC9-1, were identified in the same LG9 at 41.71 cM.

QTLs comparison and analysis
We compared all of the identified QTLs with the QTLs in the CottonQTLdb database. The results showed that one-fifth of our QTLs (10/50) overlapped with previously reported QTLs, illustrating the reliability of our QTL mapping and indicating that the other 40 QTLs were novel. The 10 common QTLs were reported to be associated with FL (4), FS (2), PH (1), BW (1), LP (1), and FE (1) traits. QTLs for FL were the most frequently identified QTLs in both the present study (11) and the CottonQTLdb database (494), which may have increased the probability of a hit.
QTLs for different traits that shared the same or overlapping confidence intervals were considered to be in QTL clusters. In the present study, a total of nine QTL clusters were identified in the 4Su (5), 4J (1), and Sg4 (3) populations. The QTL cluster harboring the most QTLs was the hotspot region described above, with eight QTLs for six traits. Another QTL cluster in the same LG (LG9 in the 4Su population) contained QTLs for FU and MIC (Fig. 3a).
BW and LP represent yield traits, and FL, FS, FU, FE and MIC represent fiber quality traits. On this basis, an analysis of paired trait QTLs was carried out. There were 19 paired trait QTLs within six paired traits (BW and FL, or FE; LP and FL, FS, FU, or FE) that had significant medium or high positive correlations (|r| > 0.3) in the F2 populations. Six of the 19 paired trait QTLs had the same direction of additive effects (Additional file 7: Table S3).
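The cluster criterion described above (QTLs for different traits with the same or overlapping confidence intervals) can be sketched in Python as an interval-merging step. The class, function name, QTL names, and positions below are hypothetical, and chaining overlaps transitively within a linkage group is an assumption; the source does not describe how the clusters were actually tabulated.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QTL:
    name: str
    trait: str
    lg: str        # linkage group
    start: float   # confidence interval start (cM)
    end: float     # confidence interval end (cM)

def find_clusters(qtls):
    """Group QTLs with overlapping intervals on the same LG; keep groups with >= 2 traits."""
    by_lg = {}
    for q in qtls:
        by_lg.setdefault(q.lg, []).append(q)

    clusters = []
    for items in by_lg.values():
        items.sort(key=lambda q: q.start)
        group, group_end = [], None
        for q in items:
            if group and q.start <= group_end:
                group.append(q)                     # overlaps the current group
                group_end = max(group_end, q.end)
            else:
                if len({m.trait for m in group}) >= 2:
                    clusters.append(group)
                group, group_end = [q], q.end       # start a new group
        if len({m.trait for m in group}) >= 2:
            clusters.append(group)
    return clusters

# Hypothetical QTLs, not values from Table 3
example = [
    QTL("qA", "FL", "LG9", 90.0, 100.0),
    QTL("qB", "FS", "LG9", 95.0, 105.0),
    QTL("qC", "FU", "LG9", 40.0, 45.0),
    QTL("qD", "MIC", "LG9", 41.0, 43.0),
    QTL("qE", "BW", "LG2", 10.0, 20.0),
    QTL("qF", "BW", "LG2", 15.0, 25.0),  # same trait as qE, so not a cluster
]
cluster_names = [[q.name for q in c] for c in find_clusters(example)]
print(cluster_names)
```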

Discussion
To dissect the genetic basis underlying yield and fiber quality as well as their genetic correlations, two upland cotton normal lines (4133B and SGK9708) and two introgression lines (Suyuan04-3 and J02-247) were selected as parents, and four F2 populations were constructed. Among these populations, the female parents of 4Su, 4J and SgJ were potential high yield lines, and the male parents were superior fiber quality lines. Thus, extensive phenotypic variation was observed in the cross combinations whose parents had a distant kinship. All of the targeted traits exhibited normal distribution patterns across the four F2 populations (Table 1, Fig. 1), suggesting that these traits were quantitative traits controlled by multiple genes.
Furthermore, all of the traits exhibited transgressive segregation and many individuals with transgressive phenotypes were found (Table 1, Fig. 1). For example, all of the median values of FL and FS in the 4Su, 4J, and SgJ populations were higher than or nearly 30; fiber with both quality values of 30 or more (FL ≥ 30 mm and FS ≥ 30 cN·tex⁻¹) is generally considered fine-quality fiber. In plant breeding, transgressive segregation provides an adaptive advantage for traits (Reyes 2019). To a certain extent, high yield and fine-quality fibers are the outcome of the adaptation of cotton. Therefore, it is not surprising that many instances of transgressive segregation were observed for these traits in the F2 populations. Furthermore, some of these transgressive lines can be used to breed for high-quality fiber. However, these characteristics imply that the favorable alleles of the fiber quality traits were generally from the introgression-line parents.