Genetic map construction and functional characterization of genes within the segregation distortion regions (SDRs) in the F2:3 populations derived from wild cotton species of the D genome

Segregation distortion (SD) is a common phenomenon among stable or segregating populations, and the principle behind it still puzzles many researchers. The F2:3 progenies developed from the wild cotton species of the D genomes were used to investigate the possible plant transcription factors within the segregation distortion regions (SDRs). A consensus map was developed between two maps from the four D genomes, map A derived from F2:3 progenies of Gossypium klotzschianum and G. davidsonii while Map B from G. thurberi and G. trilobum F2:3 generations. In each map, 188 individual plants were used. The consensus linkage map had 1 492 markers across the 13 linkage groups with a map size of 1 467.445 cM and an average marker distance of 1.037 0 cM. Chromosome D502 had the highest percentage of SD with 58.6%, followed by Chromosome D507 with 47.9%. Six thousand and thirty-eight genes were mined within the SDRs on chromosome D502 and D507 of the consensus map. Within chromosome D502 and D507, 2 308 and 3 730 genes were mined, respectively, and were found to belong to 1 117 gourp out of which 622 groups were common across the two chromosomes. Moreover, genes within the top 9 groups related to plant resistance genes (R genes), whereas 188 genes encoding protein kinase domain (PF00069) comprised the largest group. Further analysis of the dominant gene group revealed that 287 miRNAs were found to target various genes, such as the gra-miR398, gra-miR5207, miR164a, miR164b, miR164c among others, which have been found to target top-ranked stress-responsive transcription factors such as NAC genes. Moreover, some of the stress- responsive cis-regulatory elements were also detected. Furthermore, RNA profiling of the genes from the dominant family showed that higher numbers of genes were highly upregulated under salt and osmotic stress conditions, and also they were highly expressed at different stages of fiber development. The results indicated the critical role of the SDRs in the evolution of the key regulatory genes in plants.


Background
Segregation distortion (SD) is described as a deviation from the expected Mendelian ratio within a segregating population due to various segregating distorters (Anhalt et al. 2008). Some of the factors that may lead to SDs include gametic and zygotic selections, non-homologous chromosome recombination, gene transfer, environmental agents, mapping population, marker types and genetic transmission (Mello et al. 1991). During the construction of genetic maps, it has been observed that some alleles in chromosomal regions skew from the normal Mendelian ratio. These alleles tend to cluster at segments of the chromosome, and these regions are referred to as the segregation distortion region (SDR) (Lu et al. 2002).
Research has shown that SD could bring errors in the marker order and map distances in the linkage map and thus reduce the accuracy of the maps (Yuan et al. 2019). However genes of significance have been mined within the SDRs, for instance, the gene for crown rot resistance in wheat was identified within the SDR (Bovill et al. 2006), while the gene responsible for stem rust tolerance, was detected in the SDR on chromosome 2B in wheat (Tsilo et al. 2008). Moreover, SD has been observed in a variety of populations of organisms including insects (Sandler and Golic 1985), plants (Yuan et al. 2019), and mammals (Kumari et al. 1992).
Higher frequencies of occurrence of the SDR have been found in populations developed through interspecific as compared with intraspecific crosses (Dai et al. 2017), for example in rice more SDRs were detected in the double haploid compared to the F 2:3 populations developed from the same intraspecific cross (Xu et al. 1997;Wu et al. 2010), thirty-six SDRs were detected on 20 chromosomes in recombinant inbred lines in tetraploid cotton (Jamshed et al. 2016). Further evidence points out that the genes associated with zygotic and gametic selection could be responsible for SD (Manrique-Carpintero et al. 2016).
The use of molecular markers is preferred in the genotyping of populations because they are less influenced by phenotype and are significant in the study of SD . The most used molecular marker in the analysis of SD is the simple sequence repeat (SSR); it has been widely used in the study of SD in the majority of plants and animals (Cheng et al. 2016;Wang et al. 2019). Several studies on SDs have been conducted in several plant species, including rice (Reflinur et al. 2014;Yang et al. 2014), maize (Lu et al. 2002;Wang et al. 2012), wheat (Kumar et al. 2007), barley (Liu et al. 2011), soybean (Liu et al. 2000), rapeseed (Yang et al. 2006), cotton (Wu et al. 2003;Amudha et al. 2012), and other plants. In the analysis of SD in the F 2:3 population of Aegilops tauschii, it was observed that some regions had skewed ratios towards particular alleles in the chromosomes (Fans et al. 1998).
The studies conducted in cotton showed that the majority of the SDs were mainly skewed towards the male parent rather than the female population, as was observed on chromosome 18 (Dai et al. 2017). However, in all the studies conducted to unravel the mystery of SDs in cotton, no experiment has been undertaken to explore the SDs in the F 2:3 population derived from the diploid wild cotton parental lines. The latest attempt to explore the SDs in the wild cotton progenitors involved a backcross population developed between G. hirsutum as the recurrent parent and G. mustelinum as the donor cultivar (Chandnani et al. 2017). And therefore, to explore the phenomena of the SDs in wild cotton progenitors, an interspecific population between G. klotzschianum and G. davidsonii, and between G. thurberi and G. trilobum were developed. The four parental lines were primarily selected because of their diverse genetic traits and broader ecological niches. The four parental lines used in the construction of the genetic maps are known to have traits for resistance to bacterial blight (G. davidsonii) (Zhang et al. 2016), sucking pests such as aphids (G. klotzschianum) (Wei et al. 2017), Fusarium wilt, silver leaf whitefly and cotton bollworm resistance (G. thurberi) (Natwick 2006), Verticillium wilt (G. trilobum) (Dong et al. 2019). A total of 188 individuals were genotyped using SSR markers, primarily focusing on the exploitation of the genetic mechanism of the SD in severely distorted chromosome D 5 02 and chromosome D 5 07. The analysis of the SD from the genetic maps constructed from the diploid cotton of the D genome was conducted. The first map was then generated from two closely related parents, G. klotzschianum and G. davidsonii ) and the second map developed from G. thurberi and G. trilobum , in either of the maps, the F 2:3 population used, the genotypic data from the two maps were combined to generate the consensus map, and the consensus map was generated by using the two maps. The only available maps developed from the wild cotton species of the D genome. The focus was on chromosome D 5 02 and chromosome D 5 07 which showed severe distortions of markers from the two maps. Moreover, the marker segregation and genes within the SDRs were mined and analyzed. The genes mined within the SDR and understanding their roles will be significant in elucidating the role played by segregation distortion, and will help in improving the elite cultivated cotton germplasms with ever-shrinking genetic base and significantly lower adaptive mechanisms to various abiotic and biotic stress factors.

Parental materials
The two genetic maps were generated from an interspecific population obtained from the four parental lines. The first genetic map (Map A) was constructed from the F 2:3 population derived from the self-pollinating F 1 population of G. klotzschianum (female parent) and G. davidsonii (male parent). Similarly, the second genetic map (Map B) was constructed from F 2:3 populations derived from G. thurberi (female parent) and G. trilobum (male parent). A total of 188 progenies were used as the mapping population. The F 2:3 progenies from the four parental lines were developed and grown in the wild cotton nurseries, managed by the Institute of Cotton Research, Chinese Academy of Agricultural Sciences (ICR, CAAS), located in Sanya, Hainan province, China. The development of the F 2:3 progenies followed a similar pattern as described by Magwanga et al. (2020) in the development of the backcross progenies between G. tomentosum (donor male parental line) and G. hirsutum (recurrent female parental line).

Molecular markers genotyping
Total DNA was extracted from the F 2:3 progenies and their parental lines using the CTAB method (Zhang et al. 2000b). Polymerase chain reaction (PCR) was conducted. The amplified PCR products were electrophoresed on non-denaturing 10% polyacrylamide gel electrophoresis in the 1 × TBE buffer, and the gels were then visualized after silver staining (Huang et al. 2018). The primers used were the SWU markers which were developed by Southwest University in China, hence the acronym SWU. In the construction of the genetic map A, a total of 12 560 SWU markers were screened of which 1 000 markers were found to be polymorphic. Out of the 1 000 polymorphic markers, 728 markers were mapped and generated 13 linkage groups, designated as chromosome D 5 01 to D 5 13. In the second genetic map, map B, 12 560 SWU markers were screened, of which 996 markers were polymorphic, and only 849 polymorphic markers were mapped onto the 13 linkage groups. For the construction of consensus map, 1 492 polymorphic markers were applied to generate the genetic map, after removing the duplicated markers. The details of the markers and their sequences are shown in Supplementary Table S1.

Linkage map construction and determination of the segregation distortion of molecular markers
Markers with less than 5% missing data were used in the mapping of the linkage groups in the three maps (Coulton et al. 2020). The Joinmap 4.0 mapping tool was applied with a recombination frequency of 0.40, and a LOD (logarithm of odds) score of 3.0, for any LOD above 2.5 is known to be above the noise level (Faleiro et al. 2003). The Kosambi mapping function was used to convert the recombination frequencies to map distances. The linkage groups were then constructed using Mapchart 2.3 software (Voorrips 2002). The consensus map was constructed by merging the two individual data sets. Maps were drawn using MapChart 2.2 (Voorrips 2002).
Annotation of genes at the segregation distortion regions (SDRs) and the analysis of phylogenetic tree Sequences corresponding to the SSR markers were identified by BLASTN (Nucleotide basic local alignment search tool) to the cotton ESTs (Expressed sequence tag) with an E ≤ 1 e-15 and were annotated using BLASTX (Translated nucleotide sequence searched against protein sequences) (NCBI, Bethesda, MD, USA). The four genotypes, G. klotzschianum, G. davidsonii, G. thurberi and G. trilobum, have not been sequenced, the D 5 , G. raimondii was used as the reference genome. A similar method has been used to explore the genetic variation among the BC 2 F 2 genotypes developed from G. hirsutum as the recurrent parent and G. tomentosum as the donor parent (Magwanga et al. 2018b). The mined genes within this SDR that belonged to the two most abundant subfamilies, the probable protein types and the serine/threonine-protein kinase were then analyzed for their properties and function. A phylogenetic tree was constructed and, the multiple sequence alignments of all the proteins were done by Clustal Omega, MEGA 7.0 software (Kumar et al. 2016). The neighboring method (NJ) was used with a bootstrap value of 1 000 replications, and other parameters were applied as per the default setup, as previously used in the analysis of the phylogenetic relationships of the LEA proteins in cotton (Magwanga et al. 2018b). Transcriptional response elements of genes for the two major subfamilies were predicted using an online tool, the PLACE database (http://www.dna.affrc. go.jp/PLACE/signalscan.html) (Higo et al. 1999). The genes targeted by miRNAs were predicted by searching 5′ and 3′ untranslated regions (UTRs) and the coding sequences (CDS) of all the genes for their complementary sequences for the cotton miRNAs using the psRNATarget server (http://plantgrn.noble.org/psRNATarget/function).

Gene Ontology (GO) annotation
Analysis of GO annotation was conducted using Blas-t2GO PRO software version 4.1.1 (https://www.blast2go. com). The GO annotations described the hierarchal roles of the genes and their products, it entailed three independent ontological terms, the molecular function (MF), biological process (BP), and cellular component (CC) (Langfelder and Horvath 2008;Magwanga et al. 2018c). The protein sequences of the dominant gene domains were obtained within the SDRs and subsequently analyzed through Blast2GO as previously applied in the analysis of the LEA genes in cotton (Magwanga et al. 2018b).

RNA and RT-qPCR validation of key genes harbored within the SDRs
Based on the previous work by our research team, G. raimondii (D5), G. thurberi, and G. trilobum were profiled under biotic stress conditions, in which the plants were exposed to Verticillium dahliae infection (Dong et al. 2019). The genes which were harbored within the SDR were also prominently expressed, and majorities were members of the Probable Protein Types and the Serine/Threonine-Protein Kinase. Moreover, the de novo sequencing of the G. klotzschianum and G. davidsonii revealed a similar pattern (the data yet to be published). The highly upregulated genes were further validated under abiotic stress conditions, in which the seedlings of G. klotzschianum, G. davidsonii, G. thurberi, and G. trilobum at the third-leaf stage were exposed to drought and salt stress by exposing the seedings to 15% Polyethylene glycol 6000 (PEG6000) and 250 mmol•L -1 NaCl, respectively. The leaf tissues were then harvested for RNA extraction at 0 h, 1 h, 3 h, 6 h and 12 h of post-stress exposure. RNA extraction, purification, and RT-qPCR analysis were carried out as described by Lu et al. (2018). Cotton GrActin was applied as the reference gene.

Linkage map construction
The first map was developed from the F 2:3 population between G. klotzschianum and G. davidsonii, a total of 728 polymorphic markers were used. The total map length was 1 480.23 cM, with an average marker interval of 2.182 cM ). This map was designated as map A. The second map, designated as map B, was derived by genotyping the F 2:3 population developed between G. thurberi and G. trilobum, and 849 polymorphic markers were used in the linkage map construction. The map size was 1 012.46 cM with an average marker distance of 1.193 cM. In both maps, it was observed that chromosome number two also annotated as D 5 02 had the least map size of 82.908 cM and 28.665 cM in map A and map B, respectively. Interestingly, in both maps, chromosome D 5 02 had a smaller map size but with the highest percentage of SD (Table 1). Similar results have been observed in other linkage maps in cotton (Yu et al. 2011;Li et al. 2016).
The consensus map was constructed by merging two data sets from the two genetic maps. A total of 1 492 markers were mapped onto the 13 linkage groups encompassing the 13 chromosomes, and only 85 markers remained unlinked. The diploid cotton species has 13 chromosomes, while the tetraploid cotton species has 52 chromosomes (Mendoza et al. 2013;Magwanga et al. 2018a). This work was based on the diploid cotton species of the D genome. The consensus map size was 1 467.445 cM with an average marker distance of 1.037 Table 1 Mapping statistics for the two individual maps and the consensus genetic maps of diploid cotton in the D Genome. Map A represents genotyping of G. klotzschianum and G. davidsonii, Map B represents genotyping of G. thurberi and G trilobum while Consens Map represents the combination of genotypic data of map A and B. cM. Even though the map size was relatively smaller than map A, the marker interval was low, which improved the precision of the consensus map. From the consensus map, we observed that Chromosome D 5 02 had the highest percentage of SD with 58.6%, followed by Chromosome D 5 07 with 47.9%. Chromosome D 5 01 had the highest number of markers (143), while Chromosome D 5 02 had the least number of markers (58) ( Table  1). Most of the markers mapped on the consensus map were found to be contributed by map B rather than map A. A total of 797 markers from map B were mapped on the consensus map accounting for 53.4% while only 695 markers (46.6%) were from map A. The chromosome with the highest number of markers (143)was Chromosome D 5 01 while the chromosome with the least number of markers (58) was Chromosome D 5 02 (Fig. 1).

Segregation distortion (SD) analysis
In map A, out of the 728 markers mapped, 159 markers were distorted accounting for 22.2%, and the highest SD was observed in Chromosome D 5 02 with 76.1% followed by Chromosome D 5 07 with 40.7%. The SDRs were located on Chromosome D 5 02, D 5 05, D 5 07, and D 5 08. Chromosome D 5 02 had the largest SDR, while Chromosome D 5 07 had the highest number of SDR.
It was observed that the alleles in SDR were skewed towards a particular parental line, like that in Chromosome D 5 02 towards the female parent (G. klotzschianum), and in Chromosome D 5 07 towards the heterozygosity ). In the second genetic map B, there was a slightly lower number of distorted markers, with only 135 accounting for 15.8%, and the highest two segregation distortions were observed in Chromosome D 5 02 and Chromosome D 5 07 with 42.8% and 38.3%, respectively ( Table 1). Chromosomes that had the SDRs were D 5 01, D 5 02, D 5 06, D 5 07, D 5 09, D 5 10, and D 5 11. Moreover, the largest SDR was located on Chromosome D 5 02, while Chromosome D 5 07 had the highest number of SDR.
In the consensus map, the highest SDs were located on Chromosome D 5 02 and D 5 07, with distortion percentages of 58.6% and 47.9%, respectively. Similarly, the two chromosomes had the largest SDRs (Fig. 2). The largest SDR was located on Chromosome D 5 02-2 and was skewed toward the female parents while SDR located on Chromosome D 5 02-1 was skewed towards the heterozygous. Chromosome D 5 07 had the highest number of SDRs with a total of five SDRs, and all the SDR were skewed towards the heterozygotes except for the SDR located on Chromosome D 5 07-1, which was skewed towards the female parents. The majority of the SDRs were skewed towards the heterozygotes. Similar results were observed in the analysis of SDRs in tetraploid cotton, more specifically on Chromosome 18 (Dai et al. 2017), and rice (Wu et al. 2010), wheat (Fans et al. 1998). Based on the individual maps, the SDs were skewed towards the female compared with the male parent, the results obtained were in agreement with the study conducted on an interspecific F 2 population in which the segregated distorted markers were skewed towards the female parent (Li et al. 2007).

Annotation of genes at SDR
We conducted a BLAST search, and a total of 6 038 genes were mined within the SDR in Chromosome D 5 02 and D 5 07 (Supplementary Table S2), with 2 308 genes in Chromosome D 5 02 and 3 730 genes in Chromosome D 5 07. These genes were further divided into 1 117 groups were obtained. There are 622 groups which were shared between Chromosome D 5 02 and Chromosome D 5 07. The largest group consists of a total of 188 genes, among which the Pkinase (PF00069) was encoded; followed by a group with 132 genes, among which PF13855 (LRR_8, Leucine-rich repeat) is encoded. The third consists of 108 genes, among which PF07714 (Pkinase_Tyr, Protein tyrosine kinase) is encoded. The genes in the three main groups were highly correlated with abiotic stress response. The genes located within the largest 12 grous were analyzed. Out of the 12 groups, 9 were found containing 'members of the resistant genes (R group of genes), including that encoded Protein kinase, LRR_8, Protein tyrosine kinase , NB-ARC, LRRNT_2, Leucine-rich repeat Nterminal, PPR, PPR_2 family, Cytochromes P450 (CYPs), Myb-like DNA-binding, and RNA recognition motif (RRM, RBD, or RNP) (Tables 2 and 3). The abundant/enriched domain was the Protein kinase domain (PF00069). It has been widely studied; for instance, it was found to be the dominant domain in the analysis of the genes conserved between the two upland cotton, G. hirsutum and its wild relative G. tomentosum (Magwanga et al. 2018b). We, therefore, explored the genes which belonged to this domain. The physiochemical properties of these genes showed significant variations; the molecular weight ranged between 10.351 kDa and 134.232 kDa, the charge was between − 27 and 39.5; Isoelectric point (pI) values were between 4.375 and 10.382; the GRAVY values ranged from − 0.721 to −0.251 while their protein lengths ranged from 611 aa to 12 310 aa (Supplementary Table S3). The GRAVY values were all below zero, indicating that these genes were mainly hydrophilic. The group that encoded Protein kinase domain contained 28 different subfamilies. The subfamily with the highest number of genes was Probable types with a total of 64 genes, which included members such as the Probable inactive receptor kinase (4 genes); Probable leucine-rich repeat receptor-like serine (21), Probable L-type lectin-domain containing receptor kinase (3 genes); Probable receptor-like protein kinase (25 genes) among others (Supplementary Table S4).
The two most abundant subfamilies, the probable protein types and the Serine/threonine-protein kinase were further analyzed, by looking into their classification based on the phylogenetic tree analysis. The genes were found to be grouped into five clades, with clade 2 being the majority (Fig. 3). The most interesting concept is that the members within clade 3 had a percentage bootstrap similarity value of 100%. The majority of these genes have previously been found to be highly correlated to biotic stress tolerance; for instance, 11 genes, i.e., Gorai.007G335000, Gorai.002G039900, Gorai.002G040100, Gorai.002G041100, Gorai.002G041200, Gorai.002G041800, Gorai.002G042100, Gorai.002G047500, Gorai.002G047900, Gorai.007G182500 and Gorai.007G334900, are homologous to an Arabidopsis gene, At5g39020, which has a functional role in leaf senescence during viral infection in Arabidopsis (Espinoza et al. 2007). Moreover, the remaining genes were homologous to an Arabidopsis gene, At1g67000, which plays a more significant role in salt stress pathways. It was also was found to be highly upregulated in the roots under salt stress conditions ).
Cis-regulatory elements analyses of the major two subfamilies: the probable protein types and the serine/ threonine-protein kinase We examined the two major subfamilies to determine if there could be any of the regulatory elements related to either abiotic or biotic stress factors. Cis-regulatory elements are known to enhance the functions of the genes (Tümpel et al. 2006). In the analysis of the cis-elements, all the genes were found to be associated with either abiotic or biotic stress-responsive cis-regulatory elements; for instance, ARFAT with the sequence "TGTCTC" was found to be associated with 87 genes which function as ABA and auxin responsiveness. ABA is a plant phytohormone that is vital for plants' response towards stress (Trivedi et al. 2016). Other cis-regulatory elements predicted were CBFHV with a role in dehydrationresponsive element / cold acclimation, DRECRTCOR-EAT functioning as activators that function in drought-, high-salt-and cold-responsive gene, lastly ABRELA-TERD1 with a function in early responsive to dehydration, AGMOTIFNTMYB2 induced by various stress such as wounding or elicitor treatment among others ( Fig. 4 and Supplementary Table S5). The cis-regulatory elements detected such as ABRE have previously been found to associate with top-ranked plant stress-responsive transcription factors such as the NAC, MYB (Nakashima et al. 2009).
miRNA prediction for the major two subfamilies; the probable protein types and the serine/threonine-protein kinase In the prediction analysis of the miRNA targeting the various genes obtained for the two major subfamilies, a total of 287 miRNAs were found to target 91 genes (Supplementary Table S5). The high number of miRNA targets detected for these genes showed that the genes obtained from the SDR on chromosome D 5 02 and chromosome D 5 07 have a significant role in various biological processes. The highest level of miRNA target was observed for the following genes: Gorai.002G039900 (6 miRNAs), Gorai.002G041100 (9 miRNAs), Gorai.002G114100 (9 miRNAs), Gorai.002G133000 (7 miRNAs), Gorai.002G134400 (8 miRNAs), Gorai.007G244000 (9 miR-NAs), Gorai.007G271300 (10 miRNAs) among the rest. The miRNAs and the genes association revealed that higher level of miRNA targets , for instance, a single gene was targeted by 2∼10 miRNAs. Some of the miRNAs detected were gra-miR172a and gra-miR172b all found to target Gorai.007G059900 which is a member of the serine/threonine-protein kinase. The SAPK2 mined within the SDR located on chromosome D 5 07 has been found to have a function in fiber development in cotton (Abdurakhmonov et al. 2008). Moreover, miR398 has been extensively studied and found to have a role in enhancing abiotic stress tolerance in plants; for instance, gra-miR398 was found to be upregulated in plants exposed to water deficit conditions, and thus found to be responsible for enhancing tolerance towards oxidative stress, water deficit, salt stress, abscisic acid stress, ultraviolet stress, copper and phosphate deficiency, high sucrose and bacterial infection (Jia et al. 2009;Lu et al. 2010;Pashkovskii et al. 2010). The same miRNA was found to target Gorai.007G335000, a member of the probable receptor-like protein kinase mined within the SDR on chromosome D 5 07.
GO annotation of the major two subfamilies; the probable protein types and the serine/threonine-protein kinase of the dominant gene groups In the analysis of the GO terms, a total of 188 genes were found to have GO terms, in which a high number of genes were found to be involved in biological process (BP), with functions such as regulation of the biological process, response to stimulus, single-organism process, metabolic process, and cellular process, in relation to cellular component (CC), four major functions were detected, namely the cell, cell part, membrane part, and membrane while in molecular functions (MF), and only two functions were observed, binding and catalytic activity (Fig. 5). Some unique results were found in some of the genes within the SDRs; for instance, Gorai.002G14960 (BRASSINOSTEROID INSENSITIVE 1-  Table S6).

RNA sequence data analysis profiled under abiotic stress conditions and in different Fiber developmental stages
By the fact that the two major subfamilies were found to be targeted by stress-specific miRNAs and even found to be associated with some known cis-regulatory elements, we undertook to investigate if the genes would have any varying expression under drought, salt and even different stages of fiber development. Genes and their RNA sequences were then obtained from the de novo sequenced data. The raw data for the RNA sequencing were transformed into log 2 form and used in the construction of the heat map. The RNA expression analysis showed that the genes were categorized into three groups, with members in group 1 exhibiting higher expression levels at different fiber development stages (Fig. 7). The majority of the highly upregulated genes were obtained from the SDRs in chromosome D 5 07, such as Gorai.007G283900 (Serine/threonine-protein kinase Nek2), Gorai.007G186000 (Probable inactive receptor kinase At1g48480), Gorai.007G053000 (Serine/threonine-protein kinase SRK2I), Gorai.007G285300 (Serine/threonine-protein kinase WNK1), Gorai.007G235600 (Genome polyprotein), Gorai.007G247600 (Serine/threonine-protein kinase ppk15) and Gorai.007G308900. It is interesting to note that the gene which was highly upregulated in various stages of fiber development was also found to be targeted by gra-miR164a, and the same miRNA has been found to target the NAC transcription factor family (Xie et al. 2000). Moreover, mutant Arabidopsis lacking ath-miR164c was found to exhibit a slight defect in carpel fusion (Baker et al. 2005). In addition, miR164a,b,c has been found to have a regulatory role in the expression of CUP-SHAPED COTYLEDON1 (CUC1) and CUC2, which encode key transcriptional regulators involved in organ boundary specification (Huang et al. 2012). These previous findings showed that the gene found to be targeted by miR164a/b/c could play an essential role in fiber development.
RT-qPCR validation of the selected genes within the SDRs of chromosome D 5 02 and D 5 07 under drought and salt stress conditions Thirty genes were profiled on the leaf tissues of the four parental lines under drought and salt stress conditions. The genes exhibited three types of expressions across the four parental lines; however, more genes were found to be highly upregulated in the leaves of G. klotzschianum and G. thurberi compared with G. davidsonii and G. trilobum (Fig. 8a-d). The results obtained were in agreement to previous findings which have shown that G. thurberi is more tolerant to both biotic stress conditions, more so to Verticillium dahliae which is a fungal pathogen causing Verticillium wilt, a terminal disease to various crops (Dong et al. 2019). Moreover, Cai et al. (2019) revealed that G. thurberi was highly tolerant to cold stress compared with G.trilobum. Furthermore, Kirungu et al. (2018) found that G. klotzschianum harbored more beneficial traits compared with G. davidsonii.

Discussion
Genetic maps have become significantly important in understanding markers, breeding, association genetics, map-assisted gene cloning, gene mining, and mapping of quantitative trait loci (QTLs) (Golestan Hashemi et al. 2015). In our study, we integrated two genetic maps from the D genome of the diploid cotton with a mapping size of 188 F 2:3 population. The first genetic map (Map A) was composed of a genetic cross between G. klotzschianum (female parent) and G. davidsonii (male parent) while the second genetic map (Map B) was developed from G. thurberi (female parent) and G. trilobum (male parent). Map B had a higher number of markers linked and a smaller average distance as compared with map A. This map could play a fundamental role in the analysis of QTLs. In the construction of the consensus map, more markers were contributed by map B as compared with map A. Inconsistencies of marker order including the translocation or inversions between individual markers in consensus maps were observed especially on markers that were closely linked together in the SDR of Chromosome D 5 02-2. Similar results were observed in the consensus map of flax seed (Cloutier et al. 2012).
The segregation distortion among the three maps ranged from 15.8% (map A) to 22.2% (map B). Segregation distorted markers have previously been studied in various plants (Takumi et al. 2013). The study of segregation distortion is significant because distorted markers may be linked to genes or traits of interest, these genes may be beneficial or lethal to organisms. Therefore, it's important to include the segregation distortion markers in the construction of genetic maps since the exclusion of such markers could cause bias of the data and result in the loss of significant genetic information. In our study, we examined the trend of segregation distortion within Chromosome D 5 02 and D 5 07. We observed that both chromosomes had higher segregation distorted markers. Chromosome D 5 02 had the least mapped markers with a higher percentage of segregation distortion ranging from 42.9% to 76.1% in the three genetic maps. Similar results have been observed in cotton (Li et al. 2007;Khan et al. 2016;Shang et al. 2016). The two chromosomes also showed SDR which was skewed towards a specific allele. These SDRs may be due to preor post-zygotic selection and chromosome loss or rearrangements.
We observed that 29 genes were not disrupted by introns (intronless), intronless genes contain a single exon and do not contain introns from its beginning to the end neither in its UTR or CDS regions (Yan et al. 2016). The intronless genes are known to promote the efficiency of transcription initiation and elongation in spliced genes (Sakharkar et al. 2006). Their isoelectric point (pI) values ranged from both acidic to basic proteins. The pI values are known to affect the solubility of protein molecules; hence proteins are less soluble when the pH of the solution is at its isoelectric point (Dawes et al. 1994). All of the proteins were observed to have a GRAVY value less than zero, indicating that they were hydrophilic. Hydrophilic proteins have a high solubility, hence these proteins could be playing a significant role in desiccation tolerance (Hundertmark and Hincha 2008), and also aid in enzymatic activities involved in the biochemical processes.
The analysis of the genes mined within the SDR of chromosome D 5 02 and D 5 07 revealed that the dominant froup was the Pkinase gene family, with a Pfam number of PF00069. There were so many genes within this group, it was technically impossible to analyze all of them, and thus, we determined the dominant subfamily, and further analyzed two of them. The two major dominant subfamilies were the probable kinases and the serine/threonine kinases genes. These groups have been widely studied in both plants and animals (Jun et al. 2015). In the cotton plant, overexpression of GbRLK, a putative receptor-like kinase gene, has been found to confer tolerance to Verticillium wilt, a plant disease that is known to cause massive losses in cotton production (Jun et al. 2015).
Similarly, overexpression of the GbRLK gene isolated from G. barbadense has been found to confer drought and salt stress tolerance in transgenic Arabidopsis plants (Zhao et al. 2013). The detection of these genes within the SDRs demonstrates the significant role played by the SDRs in the evolution or synthesis of vital proteins with a profound role in enhancing tolerance levels of plants to various abiotic and biotic stress factors. The main genes found to be located within the SDR in the consensus map were the R genes. This group of genes is known to play an integral role in signaling during pathogen recognition, hence assist in the activation of plant defense mechanisms.
The R genes work in coordination with other groups to bring combinatorial variations in signal response specificity to pathogens. Moreover, the R genes are mainly associated with those encode proteins that identify specific pathogen effectors, known as avirulence proteins, which specific in terms of their activities. These genes are known to have a gene-to-gene interaction between an organism and its pathogens (Rouxel and Balesdent 2010). These genes were segregating within the SDR in synchrony intending to help in plant defense mechanisms, these mechanisms are involved in a series of enzymatic activities within the proteins. From the recent analysis, it has been observed that the proteins encoded by resistance genes (R genes) display modular domain structures and require several dynamic interactions between specific domains to perform their function , hence a very close interaction and coordination in terms of the activities of the genes located within the SDR. In a study conducted on determining significant QTLs for drought stress tolerance, the majority of the marker loci co-localized with known QTLs for blast tolerance or NBS-LRR disease resistance genes were located within the regions of significantly distortion levels (Dixit et al. 2014). Similar result was found on Bangladeshi rice landrace Capsule in relation to salt stress tolerance (Rahman et al. 2019). The four parental lines used in the construction of the genetic map are known to contain traits for resistance to bacterial blight (G. davidsonii), sucking pests such as aphids (G. klotzschianum), Fusarium wilt, silver leaf whitefly and cotton bollworm resistance (G. thurberi), Verticillium wilt (G. trilobum). This explains the reasons for a large number of plant resistant genes (R genes) detected within the SDRs in chromosome D 5 02 and D 5 07.
The carrying out insilico analysis of the genes obtained within the SDRs, the cis-regulatory elements, miRNA and GO analysis showed that the R genes could play a significant role within the plant. Recent evidence indicates that plant miRNAs play a role in biotic and abiotic stress responses (Sunkar et al. 2007). In the analysis of the genes obtained within the SDRs, several miRNAs were found to target several genes; for instance, miR157a and miR157b were found to target a single gene Gorai.007G063800, a member of the serine/threonine-protein kinase. The same miRNA family was found to be the most abundant, followed by miR156, miR166, and miR168, with variation within each family in Pomegranate. This fruit has enormous importance in human health mainly because of its antioxidant properties, it does accumulate a high amount of anthocyanins in skin and arils (Saminathan et al. 2016). The antioxidant enzymes are important to plants in reducing the deleterious effects of reactive oxygen species (ROS). When plants are exposed to stresses, the production and elimination of the ROS process altered leading to excessive accumulation of ROS within the cell resulting in oxidative stress. The association of miR157 to induction of antioxidant enzymes, showed that these genes within the SDR are critical for plants.
The various cis-regulatory elements (CREs) targeting the genes within the SDRs, were found to perform a myriad of CREs with diverse functions. More specifically, it is geared towards enhancing plants tolerance to various environmental stresses; for instance, ABREATCON-SENSUS targets not only the stress-responsive genes but also those involved in transportation such as the nitrate transporter (NRT) genes as evident in poplar plant (Aichi et al. 2006;Bai et al. 2013). The results obtained for the CREs were further augmented by GO annotation. The various genes obtained within the SDRs were found to play an integral in all the three GO functional annotations. In cellular component (CC), functions such as an integral component of membrane (GO: 0016021), cortical microtubule (GO: 0055028) among others were detected. The integrity of the cell membrane is important because the membrane is the communicating channel between intra and extracellular environments, and any damage to the cell membrane affects various biological processes such as osmosis, thus affecting cell water retention. The detection of these cellular component roles showed that the genes found in the SDRs have a function in maintaining cell membrane stability, and therefore enhancing the delicate osmotic balance within the cell. Moreover, an integral component of the membrane was a function found to be unanimous with the LEA genes (Magwanga et al. 2018b).
Several gametophytic and zygotic barriers causing deviation of allele frequencies from Mendelian ratios have been reported in several plants such as rice ). Therefore detection of SDRs in the two populations developed from the two wild parental lines is a common feature more so among the F 2:3 populations. It is assumed based on Mendelian law that there is an equal probability of transmission of alleles from either parent during sexual reproduction, but this has not been the case in several studies, being there tend to be phenomena referred to as the preferential transmission of alleles or genotypes known as segregation distortion (SD) (Nadeau 2017). The evolution of segregation distortion may have profound evolutionary implications. From previous studies the bulk pollen sequencing indicated a rapid evolution of segregation distortion (Corbett-Detig et al. 2019). SD has been described as powerful evolutionary tools that could lead to speciation (Liberman and Feldman 1982). SDR has been observed not only among the controlled population but also among the natural population (McLaughlin and Malik 2017). The results from the two maps and their consensus showed that SDs are a common feature in segregating population and could be used to mine genes of significance that could be introgressed into the already cultivated species.

Conclusions
The use of genetic map analysis has become increasingly significant in understanding markers-assisted selection, gene mining and cloning. However, intensive investigation of genes located within the SDR has not been widely studied. In our research we examined the only two interspecific maps developed in the D genome of the diploid cotton. We constructed a consensus map from the two genetic maps and noted that in all the three maps D 5 02 and D 5 07 had the highest of SD, and hence we mined the genes within the SDR of D 5 02 and D 5 07 to find out if there were genes of significance that could be segregating within this region. A total of 2 308 genes in D 5 02 and 3 730 genes in D 5 07 were mined within the SDR, these genes were divided into 1 117 groups of which 622 groups were shared between the two chromosomes. We further observed that the 12 largest domains had a significant role in the plant defense mechanism of which 9 belonged to the resistance genes (R group of genes), with 188 genes and a pfam number of PF00069. We analyzed for the properties of these genes, the largest subgroup encode the serine/threonine-protein kinase. The genes that performed similar roles clustered together within the SDR. These genes have similar feature being hydrophilic. The study of these genes will provide an understanding of the significance of genes within the SDR and the role of the consensus map in mining these genes.