Comparative studies on seed protein characteristics in eight lines of two Gossypium species

Background: In order to achieve the targets aiming at the improvement of protein quality, knowledge regarding seed protein fractions and polypeptides constituting them in different crops is essential. Besides having high nutritional value as animal feed and human food, the protein isolates from cottonseed meal have also proven promising as industrial raw materials for a number of applications. As far as Indian work on the characterization of cotton seed proteins is concerned, relatively few reports are available. Keeping in mind the importance of cotton seed proteins, lines belonging to Gossypium arboreum L. (Indian cotton) and G. hirsutum L. (American cotton), which are grown in all the major cotton growing states in India, were selected for analysing their seed protein characteristics. Results: Whereas G. arboreum (A-genome) lines revealed a lower range of seed protein content, i.e. 19.5~24.3%, a higher range (21.8~29.5%) was observed in lines of G. hirsutum (AD-genome). Globulins represented the dominant fraction in both species, followed by albumins, glutelins and prolamins. A significant positive correlation with seed protein content was observed for albumins in G. arboreum and for globulins in G. hirsutum. Intraspecific electrophoretic variation in seed protein extracts was observed in the molecular weight region of 22 kDa to 27 kDa in lines of both species; however, some lines with A-genome showed similarity in banding pattern with AD-genome. Four polypeptides with disulphide-linkages were also reported for the first time. Albumins showed the most variation in electrophoretic pattern between the lines of the two species, followed by globulins. Conclusion: On the basis of present and previous studies, screening the lines with low or high protein content will lead to the selection of lines with superior polypeptide fractions important for nutritional and industrial purposes.
On comparing the composition and behaviour of the four disulphide-linked polypeptides with those of other plant groups, these were suggested to be legumin-like in nature. The similarity in banding patterns between the lines of A-genome and AD-genome species points towards the close evolutionary relationship between the two. On the basis of our results, albumin fractions could be used for cultivar differentiation in cotton.


Introduction
Out of 50 species, there are four cultivated species of cotton, viz. Gossypium arboreum L., G. herbaceum L., G. hirsutum L. and G. barbadense L. (Wendel and Albert 1992). As the former two species, which have the A-genome (2n = 26), are mainly grown in Asia, they are termed Asiatic cotton or Indian cotton. The remaining two are allotetraploid (2n = 4x = 52) with the AD-genome; of these, G. hirsutum is known as American cotton and G. barbadense as Pima or Egyptian cotton (Grover et al. 2015). G. hirsutum alone contributes 90% of the total global production of cotton (Turley et al. 2007). Cotton is an important fibre crop cultivated in tropical and subtropical regions of the world. China, USA and India are the world's major cotton-producing countries, accounting for nearly 60% of world production (Yu et al. 2012; Fang 2015).
Cottonseeds in the form of whole cotton seed (WCS) and cottonseed meal (CSM) constitute a main source of oil, meal and protein for human consumption, livestock feed, and raw material for industrial applications (He et al. 2013; He et al. 2014a). According to Coppock et al. (1987), the nutritional protein degradability of CSM is similar to that of peanut meal, canola meal and soybean meal for lactating dairy cows, and to that of canola meal and soybean meal for young calves. Production performance of ruminant animals in terms of body weight gain, milk production, fat content of the milk and wool production has been shown to improve when their diet is supplemented with WCS and CSM (Osti and Pandey 2006). Cottonseed protein food products have also proven to be a healthy addition to the diets of children, college-age women and elderly people. Baked goods, snack food, and pet and livestock feed are just a few successful products developed utilizing cottonseed protein (Alford et al. 1996).
As far as industrial applications are concerned, promising results with cottonseed protein isolates (CSPI) as well as the protein-extracted insoluble residue (CSIR) from cottonseed meal have been reported for a number of value-added products, viz. bio-based wood adhesive (Cheng et al. 2016a, b), bioplastics and films (Yue et al. 2014), and superabsorbent hydrogel (Zhang et al. 2010). The studies showing the superiority of cottonseed protein-based wood adhesive over soy protein-based adhesive attributed this to the difference in the structural and functional properties of the proteins between these two crops (He et al. 2016a, b; Cheng et al. 2016a, b). Further, the effect of low and high seed protein content on the adhesive strength was shown by Pradyawong et al. (2018), who suggested the use of CSM with high protein content for good adhesive strength and spreadability. Martinez (1964) had divided cotton seed proteins into two categories: 1) water-soluble proteins having low molecular weight but high electrophoretic mobility, and 2) water-insoluble true storage proteins with high molecular weight but low electrophoretic mobility. However, these were not classified as albumins, globulins or other classes of storage proteins at that time. Later on, the classification of these proteins as water-soluble albumins followed by alkali-soluble glutelins, salt-soluble globulins and alcohol-soluble prolamins was proposed by Sammour et al. (1995). Youle and Huang (1981) recognized three major types of proteins in cotton, having sedimentation coefficients 2S, 5S and 9S, in equal amounts: 2S represented albumins, whereas 5S and 9S belonged to globulin proteins. In the same year, Dure and Chlan (1981), using SDS-PAGE and immuno-techniques, purified and characterized the principal storage proteins in cotton having molecular weights of 52 kDa and 48 kDa and designated these as α-globulins and β-globulins, respectively.
In contrast to the principal salt-soluble storage proteins (globulins) reported by Dure and Chlan (1981), King and Lefler (1979) concluded that alkali-soluble proteins were the major storage proteins in cotton seed. However, the molecular weights of the principal storage proteins were almost the same in both cases, i.e. 52 kDa and 48 kDa, and 52.7 kDa and 46 kDa, respectively. The fate of the other proteins again remained unexplored. Continuing this line of work, Marshall (1990) reported a 98 kDa polypeptide with two subunits of 54 kDa and 48 kDa as the 7S globulin of cotton; however, these two subunits were found to contain no covalent linkage between them. All the above-mentioned studies on seed proteins were mainly carried out in G. hirsutum L. cultivars, and concentrated on the major storage proteins, the globulins. Very little work has been done on the components of seed proteins in the other cultivated species of cotton. Working on another cultivated species, G. barbadense L., Sammour et al. (1995) reported a higher percentage of alkali-soluble proteins than salt-soluble proteins. Further, using two-dimensional gel electrophoresis, they revealed the presence of a disulphide-bonded polypeptide of molecular weight 45 kDa in total seed protein extract of cottonseeds but did not assign it to any protein fraction. Protein identification and genetic characterization of high abundance proteins in seeds of three Gossypium species, one each with AD-genome, A-genome and D-genome, revealed two major families of globulin seed storage proteins, i.e. vicilin and legumin, accounting for 60~70% of cotton seed proteins (Hu et al. 2011). A recent study on water- and alkali-soluble cottonseed proteins in G. hirsutum exhibited the presence of 6 and 12 major protein bands on SDS-PAGE belonging to the alkali-soluble (CSPa) and water-soluble (CSPw) fractions, respectively. Among these proteins, the most abundant peptides were shown to be legumin and vicilin types.
Cotton is a major crop of India after wheat and paddy, and all four cultivated species are grown in the country; nevertheless, relatively few Indian reports on the characterization of cotton seed proteins are available. A correlation study by Pandey and Thejappa (1975) on 97 varieties of cotton exhibited a negative correlation between protein and gossypol content. Goyal (1992) utilized cotton seed proteins as markers to generate electrophoregrams for identification of some cotton cultivars. Similarly, Kumar et al. (2007) used the SDS-PAGE technique for identification and genetic diversity estimation of six tetraploid and two diploid cotton cultivars using seed protein profiles. Zymograms generated from globulin protein fractions of three cotton hybrids and their parents were employed by Reddy et al. (2008) to test the genetic purity of the seed.
Keeping in view the importance of cottonseed meal (CSM) as animal feed, human food and industrial raw material, it has been suggested from time to time to evaluate the chemical composition of cottonseeds, especially for proteins and dietary fibres. In addition, to utilize a plant species successfully in a breeding programme for improving quality traits, genetic relationships among different species of the same genus as well as among different groups of plants have also been evaluated employing protein characterization studies. Most of the studies on cottonseed proteins have concentrated on G. hirsutum and explored mainly globulin fractions; very few reports exist on the characterization of other seed protein fractions, which could be helpful in screening and better utilization of the cotton germplasm for nutraceutical and industrial end use. In this regard, protein characterization studies on Indian cotton germplasm are very scanty. Varieties belonging to G. arboreum (Indian cotton) along with G. hirsutum (American cotton) are cultivated in all the major cotton growing states in India under varying environmental conditions. Therefore, it was planned to analyse the seed proteins of G. arboreum in relation to the most widely studied species, G. hirsutum, by studying variation in protein content, the proportion of four protein fractions and polypeptide patterns in eight lines of G. arboreum and G. hirsutum.

Materials and methods
Seeds of the following eight lines belonging to two Gossypium species, i.e. G. hirsutum (American cotton) and G. arboreum (Indian cotton), were used for the present study: 'F-2228', 'F-2383', 'LH-2076' and 'LH-2108' (G. hirsutum), and 'FDK-124', 'LD-327', 'LD-949' and 'LD-1019' (G. arboreum). The seeds were kindly supplied by Punjab Agricultural University Regional Station, Bathinda (Punjab), India.

Total seed protein extraction
Preparation of total seed protein extracts was based on the method employed by Singh and Matta (2011). Total seed protein extracts were prepared by mixing the defatted seed meal in Tris-HCl buffer solution (0.2 mol·L−1, pH 6.8) containing 2% SDS. Forty mg of seed meal was suspended in 400 μL of buffer solution and heated at 80°C in a water bath for 45 min. The contents were centrifuged at 2 000 g for 10 min and the supernatant was used for analysis. Glycerol was added to the extracts so that it amounted to 10% of the final volume. To run the proteins under reducing conditions, 2-mercaptoethanol was added to the total protein extracts to a final concentration of 2%.

Seed protein fractionation
Separation of four protein fractions was based on methods employed by Luthe (1983), and by Schaeffer and Sharpe (1990), with slight modifications. All aqueous extraction solvents were buffered with 10 mmol·L−1 Tris-HCl (pH 7.5). After extraction of albumins in water, the residue was used for separation of globulins with 0.5 mol·L−1 NaCl, followed by 55% n-propanol for prolamins and 0.5% SDS for the glutelin fraction. Each extraction was repeated twice and the supernatants were pooled.

Protein estimation
Protein content in the defatted seed meal was estimated by the semi-micro Kjeldahl method as suggested by Peach and Tracey (1956). Seed meal was digested with concentrated sulphuric acid in the presence of a catalytic mixture of copper sulphate, selenium dioxide and potassium dichromate. The digest was heated with 40% NaOH in Markham's distillation assembly, and the ammonia so evolved was titrated with N/40 HCl to determine the nitrogen present in the sample. The nitrogen so determined was multiplied by a conversion factor of 6.25 to obtain the seed protein content.
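The arithmetic behind this estimation can be sketched as follows. This is a minimal illustration that assumes the usual Kjeldahl relation (millimoles of acid consumed equal millimoles of nitrogen); only the N/40 acid strength and the factor 6.25 come from the text above, while the function name, the blank correction, the titre and the sample mass are hypothetical.

```python
N_ATOMIC_MASS = 14.007   # mg of nitrogen per mmol
HCL_NORMALITY = 1 / 40   # N/40 HCl, as stated in the method
PROTEIN_FACTOR = 6.25    # nitrogen-to-protein conversion factor from the text


def kjeldahl_protein_percent(titre_ml, blank_ml, sample_mg,
                             normality=HCL_NORMALITY,
                             factor=PROTEIN_FACTOR):
    """Return crude protein (% of sample mass) from a Kjeldahl titration.

    titre_ml and blank_ml are the acid volumes for the sample and a
    reagent blank; sample_mg is the mass of seed meal digested.
    """
    if sample_mg <= 0:
        raise ValueError("sample mass must be positive")
    # mL x (mmol/mL) gives mmol of acid, which equals mmol of nitrogen.
    nitrogen_mg = (titre_ml - blank_ml) * normality * N_ATOMIC_MASS
    nitrogen_percent = 100.0 * nitrogen_mg / sample_mg
    return nitrogen_percent * factor


# Hypothetical titration: 5.0 mL titre, 0.2 mL blank, 50 mg of seed meal.
print(round(kjeldahl_protein_percent(5.0, 0.2, 50.0), 2))
```

The blank correction is included because it is common practice; the source does not state whether one was applied.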
Protein concentration in the four separated fractions was determined using Bradford method (Bradford 1976). A volume of 100 μL of the given fraction representing an extract from 100 mg of seed meal was used and the proportion of each fraction calculated as 'g/100 g seed meal'.
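As an illustration of how per-fraction amounts translate into the relative distribution of the four fractions, the short sketch below normalizes 'g/100 g seed meal' values to percentages of their sum. It assumes that the reported proportions are relative to the total of the four fractions; the function name and the input values are hypothetical and are not data from this study.

```python
def relative_proportions(fraction_g_per_100g):
    """Convert fraction amounts (g/100 g seed meal) to % of their total."""
    total = sum(fraction_g_per_100g.values())
    if total <= 0:
        raise ValueError("total protein across fractions must be positive")
    return {name: 100.0 * amount / total
            for name, amount in fraction_g_per_100g.items()}


# Hypothetical amounts for one line (g/100 g seed meal).
example = {"albumins": 4.0, "globulins": 10.0,
           "prolamins": 1.0, "glutelins": 5.0}
for name, percent in relative_proportions(example).items():
    print(f"{name}: {percent:.1f}%")
```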

SDS-polyacrylamide gel electrophoresis
SDS-polyacrylamide gel electrophoresis was carried out on 14% gels following the method of Laemmli (1970). For gel electrophoresis under reducing conditions, 2% 2-mercaptoethanol was added to the seed protein extracts and the samples were heated in an oven at 90°C for 10 min before being loaded onto the gels. The gel was run at 17 mA and, after the tracking dye had moved down into the separation gel, the current was increased to 25 mA. The gel was stained with Coomassie Brilliant Blue (0.05%) dissolved in a solvent containing methanol, acetic acid and distilled water in the ratio 50:7:43 (v/v), and destained in the same solvent mixture without the dye.

Two-dimensional gel electrophoresis
Two-dimensional gel electrophoresis of total seed protein extract was carried out following the method described by Singh and Matta (2008). The 1.5 mm thick gel strip with polypeptides separated under non-reducing conditions (1D, −2ME) was equilibrated for 2 h with gentle shaking in 0.2 mol·L−1 Tris-HCl buffer (pH 6.8) containing 2% SDS and 2% 2-mercaptoethanol, and loaded onto another gel of 2 mm thickness for electrophoresis in the second dimension (2D, +2ME).

Molecular weight determination
To calculate the molecular weight of the bands that appeared on the SDS-gel, a standard curve was drawn from the molecular weight protein markers and their pixel positions on the gel using Total lab TL software.
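A standard curve of this kind is commonly built by regressing log10(molecular weight) of the markers on their migration position. The sketch below shows that generic log-linear approach in plain Python; it is not a description of how the software named above fits its curve, and the marker sizes and pixel positions are hypothetical.

```python
import math


def fit_log_mw_curve(markers):
    """Least-squares fit of log10(MW in kDa) against pixel position.

    markers is a list of (mw_kda, position_px) pairs; returns
    (slope, intercept) of the fitted straight line.
    """
    n = len(markers)
    if n < 2:
        raise ValueError("at least two markers are required")
    xs = [pos for _, pos in markers]
    ys = [math.log10(mw) for mw, _ in markers]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("marker positions must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(position_px, slope, intercept):
    """Read a band's molecular weight (kDa) off the fitted curve."""
    return 10 ** (intercept + slope * position_px)


# Hypothetical marker ladder: (kDa, pixel position from the top of the gel).
MARKERS = [(97.4, 40.0), (66.0, 95.0), (43.0, 160.0),
           (29.0, 230.0), (20.1, 300.0), (14.3, 370.0)]
slope, intercept = fit_log_mw_curve(MARKERS)
print(round(estimate_mw(200.0, slope, intercept), 1))
```

A straight line in log10(MW) is only an approximation over the range covered by the markers, so bands outside that range would be extrapolated rather than interpolated.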

Statistical methods
Mean and coefficient of correlation were calculated by using SPSS 18.0.
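For readers without access to SPSS, the correlation coefficient and its significance test can be sketched in plain Python. The example assumes Pearson's coefficient with one observation per line and four lines per species (so two degrees of freedom); the source does not state which coefficient was used, and the data values below are hypothetical.

```python
import math

# Two-tailed critical t value at P = 0.05 for 2 degrees of freedom (n = 4).
T_CRIT_DF2_P05 = 4.303


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("each variable must vary")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical values for four lines: protein content (%) and albumins (%).
protein = [20.0, 21.0, 23.0, 24.0]
albumins = [21.0, 24.0, 29.0, 32.0]
r = pearson_r(protein, albumins)
t = t_statistic(r, len(protein))
print(f"r = {r:.3f}, significant at P < 0.05: {abs(t) > T_CRIT_DF2_P05}")
```

With only four observations the test is demanding: |r| has to exceed roughly 0.95 before the coefficient is significant at P < 0.05.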

Seed protein content
The semi-micro Kjeldahl method was employed to determine the total seed protein content of different lines. The four cotton lines belonging to G. arboreum revealed protein content in the range of 19.5% in line 'LD-1019' to 24.3% in line 'LD-327'. On the other hand, the cotton lines belonging to G. hirsutum showed protein content in the range of 21.8% in line 'F-2228' to 23.4% in line 'LH-2076' (Table 1).

Proportion of four protein fractions
The relative distribution of four protein fractions, viz. albumins, globulins, prolamins and glutelins, in the seeds of different lines is given in Tables 2 and 3. In all the cotton lines of G. arboreum, globulins represented the major fraction, with their proportion varying from 36.2% in line 'FDK-124' to 63.7% in line 'LD-1019'. These were followed by albumins, which were present in the range of 20.8% in line 'LD-1019' to 32.2% in line 'LD-327'. The remaining two fractions, glutelins and prolamins, varied from 9.1% in line 'LD-1019' to 24.5% in line 'FDK-124', and from 5.4% in line 'LD-949' to 7.8% in line 'FDK-124', respectively.
Similarly, the lines belonging to G. hirsutum revealed globulins to be the major protein fraction, in the range of 33.0% in line 'F-2228' to 40.7% in line 'LH-2076'. These were followed by albumins, glutelins and prolamins, as in G. arboreum. The proportion of water-soluble albumins varied from 23.3% in line 'F-2383' to 29.7% in line 'F-2228'. Prolamins, the lowest in proportion in all the lines, were present in the range of 5.5% in line 'F-2383' to 12.0% in line 'LH-2108'.

Electrophoretic variation studies

Total seed protein extracts
The polypeptide patterns of total seed protein extracts of each set of four cotton lines belonging to G. hirsutum and G. arboreum as analysed on SDS-gels under reducing conditions can be seen in Fig. 1a, b.
G. hirsutum lines: A large number of polypeptides with molecular weight ranging from 10 kDa to 122 kDa were observed and can be seen in Table 4. The major polypeptides of 57 kDa, 55 kDa, 50 kDa, 47.5 kDa, 18 kDa, 17 kDa, 15 kDa, 14 kDa, 13 kDa and 12 kDa were intense and darkly stained; those of molecular weight 49 kDa, 46 kDa, 40.5 kDa, 38 kDa, 36 kDa, 32 kDa, 27 kDa, 26 kDa, 25 kDa, 24.5 kDa, 24 kDa, 23.5 kDa, 22 kDa, 14.5 kDa, 11.5 kDa and 10 kDa were prominent but of relatively lower intensity. Some other polypeptides of molecular weight 120 kDa, 115 kDa, 80 kDa, 75 kDa and 60 kDa were represented by lightly stained bands. On further comparing the polypeptide patterns of G. hirsutum lines on SDS-gels, the polypeptide region of molecular weight 22 kDa to 27 kDa showed three different patterns (designated 'P' to 'R'), as can be seen in Table 6. The pattern 'P', with molecular weight 27 kDa, 26 kDa, 25 kDa, 24 kDa and 22 kDa, occurred in lines 'F-2228' and 'LH-2076', and the pattern 'Q', with molecular weight 25.5 kDa, 24.5 kDa and 23.5 kDa, in line 'F-2383'. The third pattern 'R', with molecular weight 24.5 kDa and 23.5 kDa, was seen in line 'LH-2108'.

Polypeptides of four protein fractions
To assign the various bands seen in the total seed protein extracts to different protein fractions, the four protein fractions of two cotton lines, one each from G. hirsutum (AD-genome) and G. arboreum (A-genome), were analysed for their polypeptide composition using SDS-PAGE under reducing conditions (Fig. 3). The albumins and globulins, which contributed 75% of the four protein fractions, were represented by a large number of dark and light intensity bands, whereas glutelins possessed fewer bands, mainly light in intensity. Prolamins, which represented just 5~10% of the total protein fractions, did not appear on the SDS-gel. In both the lines, polypeptides belonging to albumin and globulin fractions were seen in the molecular weight range of 12 kDa to 120 kDa and 11.5 kDa to 122 kDa, respectively (Table 7). The comparative analysis of the polypeptide pattern of the salt-soluble fraction (globulins) revealed some bands (27 kDa, 22 kDa, 20 kDa and 17 kDa) only in the line of the A-genome species, which could not be spotted in the line of the AD-genome species. Similarly, the bands of molecular weight 46 kDa, 27.5 kDa, 26 kDa and 25 kDa were seen only in the line of the latter but not in the former species. Whereas water-soluble albumins of the A-genome species exhibited bands of molecular weight 34 kDa, 30 kDa, 26 kDa and 19 kDa in their polypeptide profile, the bands of molecular weight 45 kDa, 31 kDa, 25 kDa, 15 kDa, 14 kDa and 13 kDa were specific to the AD-genome species. The polypeptide pattern of the alkali-soluble fraction (glutelins), with molecular weight ranging from 10 kDa to 54 kDa, was found to be similar in the lines of both A- and AD-genome species.

Correlation studies
The coefficients of correlation calculated between protein content and the four protein fractions in lines of both Gossypium species are given in Tables 8 and 9. A significant positive correlation was found between protein content and the albumin fraction in G. arboreum lines. On the other hand, globulins and glutelins were observed to be negatively correlated in this species. In lines of G. hirsutum, a significant positive correlation was seen between protein content and salt-soluble globulins. However, unlike in G. arboreum lines, glutelins exhibited a negative correlation with the albumins.

Discussion
Besides fibre, the major product obtained from the cotton plant, by-products like cottonseed meal (CSM), cottonseed hull (CSH) and cottonseed oil also possess good nutritional value. Whereas CSH is a conventional feed for cattle and rich in cellulose, CSM is an important protein source for ruminants (Osti and Pandey 2006). The major limitation associated with cottonseed protein is the presence of a toxic polyphenolic compound, gossypol, which can form a covalent linkage with the epsilon group of lysine and arginine, thus reducing the quality of the protein (Price et al. 1993). In order to achieve the targets aiming at improving the seed protein quality, knowledge regarding its protein fractions and the polypeptides constituting them in different crops is essential. Information on seed storage proteins and nutritional quality in cotton is available mainly through work carried out by foreign scientists, though a few reports on cotton proteins from Indian labs are also available (Goyal 1992; Kumar et al. 2007; Reddy et al. 2008).
The present work involved the analysis of eight lines of two Gossypium species, G. hirsutum and G. arboreum, for variation in their seed protein content, four protein fractions and polypeptide patterns on SDS-gels. Seed protein content was estimated at 19.5 to 24.3% and 21.8 to 29.5% in G. arboreum and G. hirsutum lines, respectively, in contrast with other workers who reported it in the range of 30 to 45% (Church 1991; Mujahid et al. 1999). The variation in protein content between the lines of the two Gossypium species seems obvious due to the difference in their ploidy levels; the lines belonging to a particular species also displayed a fair variation in seed protein content among themselves. The mature seeds of these lines, procured from Punjab Agricultural University in January 2019, were the harvest of the previous crop (May to November, 2018). So, the variation in seed protein content within the lines of the same species may be attributed to genotypic differences among the varieties, which may lead to varied expression of the seed protein genes under the same growing conditions. Moreover, the physiological and morphological changes during seed development, like fibre development and nutrient mobilization (mainly nitrogen) from leaves (source) to seed (sink) in the cotton plant, have also been shown to affect the final seed protein content in mature seeds (Bellaloui et al. 2015). Different crop management practices like rate and time of fertilization treatments, use of cover crops, plant density and plant growth regulators could be determining factors for the accumulation of seed protein (He et al. 2014; Yang et al. 2016). So, it may be concluded that the variation in protein content in the germplasm of a crop is the result of the cumulative effect of the genotypes used, growing/environmental conditions, cultural practices, and the methods used for its estimation.
Among the four protein fractions, separated on the basis of solubility criteria, globulins represented the major fraction, with the proportion varying from 36.2 to 63.7% and 33.0 to 51.5% in G. arboreum and G. hirsutum lines, respectively. These were followed by albumins (20.8~32.2% and 23.3~29.7%) and glutelins (9.1~24.5% and 19.6~28.0%), respectively, in G. arboreum and G. hirsutum lines, prolamins being the lowest in both. The pattern of distribution of albumin and globulin fractions in the seed protein contradicts the results of Gandhi et al. (2017) and Sammour et al. (1995), who reported albumins, with 40 to 50%, as the dominant fraction followed by globulins with 21 to 42% in lines of G. herbaceum and G. barbadense. Like seed protein content, as discussed earlier, the proportion of protein fractions and their individual polypeptides could also be altered by differences in genetic make-up, plant nutrients and other growth conditions. The time period over which, and the efficiency with which, the genes for these protein fractions are expressed in the developing seed are important factors for variation in the ratio of different protein fractions in mature seeds. The difference in the proportion of seed protein fractions between the present study and other workers might be the result of the different species involved, and also of the different extraction solvents and protocols followed.
Note to the correlation tables (Tables 8 and 9): * indicates that the coefficient is statistically significant at P < 0.05. Pairs with positive correlation coefficients and P < 0.05 tend to increase together. For pairs with negative correlation coefficients and P < 0.05, one variable tends to decrease while the other increases. For pairs with P > 0.05, there is no significant relationship between the two variables.
Correlation studies in biological research are helpful in understanding the relationship between the genes governing two characters, i.e. whether they are linked or not. In this way, selection and screening for one character would indirectly work for the selection of another character. In the present study, a significant positive correlation was found between protein content and albumin fractions in G. arboreum lines, and similarly between protein content and salt-soluble globulins in G. hirsutum lines. On the other hand, the proportion of glutelins had a significant negative correlation with the proportion of albumins and globulins in Indian cotton and American cotton, respectively. Similarly, a negative correlation between seed protein content and gossypol content, a limiting factor in nutritional quality as mentioned earlier, has been shown by Pandey and Thejappa (1975). Recently, a study has demonstrated a high positive correlation of seed protein content with lint yield and fineness but, at the same time, a negative correlation with seed oil content in cotton (Cambell et al. 2016). Considering all these correlation studies, including the present one, it could be recommended to select cotton lines with higher protein content, which may prove better in terms of improved agronomic (high fibre quality) and nutritional (reduced gossypol content) qualities. Further, lines with low protein content could also be selected as a possible source for producing value-added products like bio-based wood adhesives, as these lines will have a high percentage of the water-insoluble/alkali-soluble fraction (WIF), which has been suggested as the protein fraction imparting this quality to CSM (He et al. 2014).
In order to understand the genetic structure for ultimate applications in breeding programmes, studies on genetic diversity and relationships of crop cultivars within a species, and with other species, have been routinely emphasized. In line with this, the polypeptide patterns of the lines of both Gossypium species on SDS-gels were compared for variation and similarities. The occurrence of intraspecific variation in the molecular weight region of 22 kDa to 27 kDa, with three banding patterns 'P', 'Q' and 'R' (as described in the results) in G. hirsutum lines and two patterns 'P' and 'Q' in G. arboreum, points towards different rates of evolutionary change in the genes concerned, these changes in the genes for polypeptides of one region being independent of the genes for polypeptides of the other regions. However, no other regions on the SDS-gel showed variation in the polypeptide patterns. On further comparing the polypeptide patterns between lines of the two species, the presence of a similar banding pattern in some A-genome lines and AD-genome lines points towards the close evolutionary relationship between these two species. A similar study showing interrelationships among 18 Oryza species using seed proteins as markers has also been published by the author, emphasizing the importance of these proteins in establishing evolutionary relationships in other crop species as well (Singh et al. 2018).
Further analysis of the polypeptide patterns of the four seed protein fractions from the lines of these two species showed that some polypeptide bands of a fraction were present in the line of one genome group (A-genome) but absent in the other (AD-genome). As G. arboreum (diploid A-genome species) has been suggested as one of the progenitors of the allotetraploid AD-genome species, i.e. G. hirsutum (Hu et al. 2011), the interspecific variation in expression patterns of some of the seed protein fractions between lines of these two species might be the result of interaction between the contributing genomes. The rapid adjustment to duplicated genome dosage is most probably achieved through the control of gene expression, which may be due to gene silencing or gene activation. A transcriptome study has shown the absence of a storage protein subunit in hexaploid wheat which was otherwise present in its tetraploid species, and this was explained as the result of inter-genomic suppression of the transcript for that particular seed protein subunit after introgression of a new diploid genome into the allotetraploid genome (Kashkush et al. 2002). Similarly, the differences in the level of gene expression between diploid and allotetraploid Gossypium species, manifested in terms of developmental and biochemical changes, etc., could be due to unpredictable gene interactions at the time of genome merging, genome duplication and duplicate gene evolution. These phenomena may further lead to one of the genomes (A- or D-genome) being favoured while the other is suppressed, or to equivalent expression of homeologous genes from both parental species in allotetraploids, as explained by a number of workers (Yang et al. 2006; Flagel et al. 2008; Flagel and Wendel 2010).
In this way, in the present study, the absence of some of the polypeptide bands of the albumin and globulin fractions of the diploid A-genome from the allotetraploid AD-genome species may be attributed to genome bias during the process of allotetraploidization, which favoured the selection of D-genome genes over A-genome genes, or to suppression of the A-genome genes for these polypeptides by the D-genome. The similar polypeptide pattern for the alkali-soluble protein fraction (glutelins) in lines of both diploid (A-genome) and tetraploid (AD-genome) species points towards equal expression of the A- and D-genomes or the dominance of the A-genome over the D-genome for this protein fraction. So, it will be of interest to carry out further studies examining protein profiles and transcript levels of individual seed protein fractions from geographically varied accessions of both the diploid (A- and D-genome) and allotetraploid (AD-genome) species for a better understanding of gene expression.
Two-dimensional gel electrophoresis, in which polypeptides are separated according to their apparent molecular weight under non-reducing conditions in the first dimension, followed by reducing conditions in the second, has proven a valuable tool for analysing the occurrence of disulphide-linked polypeptides in seed protein extracts. In the present study, a large number of bands with molecular weight 122 kDa, 120 kDa, 115 kDa, 80 kDa, 75 kDa, 60 kDa, 48 kDa, 40.5 kDa, 38 kDa, 27 kDa, 26 kDa, 25 kDa, 18 kDa, 16 kDa, 15 kDa, 14.5 kDa, 14 kDa, 12 kDa, 11.5 kDa and 10 kDa resolved as spots along the diagonal on 2-D gels, at the same molecular weight positions as under non-reducing conditions on the SDS-gel, and thus lacked any disulphide linkages. Meanwhile, spots occupying positions below the diagonal could also be seen, arising from the reduction of different kinds of disulphide-linkages in some other polypeptides. The bands with molecular weight 52 kDa, 40 kDa, 36 kDa and 32 kDa were observed to have inter-polypeptide disulphide-linkages. Like the major 11S globulin sub-fraction in the pea family as well as in Cucumis (Matta et al. 1981; Singh and Matta 2008), two polypeptides with molecular weight 52 kDa and 40 kDa were shown to consist of heterodimeric subunit pairs with heterogeneity in their respective subunits. Previously, Sammour et al. (1995) reported only one such polypeptide band, of molecular weight 45 kDa, with disulphide-linkages in total seed protein extracts, without assigning it to any protein fraction. Similarly, recent protein characterization studies on Gossypium species have reported 60~70% of the total seed proteins as belonging to the vicilin and legumin families, the latter contributing more. These workers have also shown more heterogeneity in the molecular weights of the legumin A subunits (30 kDa, 17~20 kDa and 11~12 kDa) and less in legumin B subunits (11~13 kDa) (Hu et al. 2011; He et al. 2018).
Further analysis indicated the polypeptide of molecular weight 58 kDa to be a precursor of legumin A subunits. Our results using 2-D gel electrophoresis also showed the generation of peptide subunits in the molecular weight range of 17 to 32 kDa (equivalent to the range of legumin A type subunits) from the precursors of 52 kDa, 40 kDa, 36 kDa and 32 kDa, as spots below the diagonal. So, these legumin-like polypeptides in the previous studies and the present study may be suggested to be components of the globulin fraction of seed storage proteins in Gossypium.
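The reading of a diagonal 2-D gel described above can be summarized in a small sketch: a spot whose molecular weight is unchanged between the non-reducing and reducing dimensions lies on the diagonal, whereas a smaller spot released on reduction lies below it. The function name, the tolerance and the example spot pairings are hypothetical and do not reproduce the actual spot assignments of this study.

```python
def classify_spot(nonreduced_kda, reduced_kda, tolerance_kda=1.0):
    """Classify a 2-D gel spot relative to the diagonal.

    nonreduced_kda is the apparent size in the first dimension (-2ME),
    reduced_kda the apparent size in the second dimension (+2ME).
    """
    if abs(nonreduced_kda - reduced_kda) <= tolerance_kda:
        return "on diagonal"       # no inter-polypeptide disulphide linkage
    if reduced_kda < nonreduced_kda:
        return "below diagonal"    # subunit released by reduction
    return "above diagonal"        # migrates more slowly after reduction


# Hypothetical (first dimension, second dimension) sizes in kDa.
spots = [(48.0, 48.0), (52.0, 32.0), (52.0, 20.0), (40.0, 23.0)]
for first, second in spots:
    print(first, second, classify_spot(first, second))
```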

Conclusion
The work was carried out to explore the seed protein characteristics in diploid and allotetraploid Gossypium species. The end use of cottonseed, for nutritional and industrial purposes, depends upon its seed protein quality. On the basis of our studies and those of other workers, it may be stated that the selection of cotton lines with low and high protein content will ultimately help in selecting better lines with improved seed protein fractions important for industrial and nutritional uses, respectively. A combined approach of proteomics and transcriptomics, involving accessions of both diploid (A- and D-genome) and allotetraploid (AD-genome) species representing wide geographical areas, could be applied to fully understand the mechanism of differential gene expression for each seed protein fraction in diploids and their allotetraploid species. The percent homology of the four legumin-like subunits reported in our study with the legumin subunits could further be confirmed by purification, sequencing and comparison of their peptides against databases. The albumin fractions, exhibiting maximum variation between lines of the two species, could be used for diversity analysis in cotton cultivars.