Characterization and analysis of GC-TOF/MS data
Metabolomic analysis was used to explore the chemical basis of differences among species of the genus Gossypium. Three representative cultivars, Shixiya1 (G. arboreum), Hai7124 (G. barbadense), and TM-1 (G. hirsutum), were subjected to untargeted metabolome analysis by GC-TOF/MS, each with six biological replicates. A typical total ion chromatogram (TIC) of the cottonseed samples is shown in Additional file 1: Fig. S1. The TIC represents the summed intensity of all mass spectral peaks at each time point in the analysis. Obvious differences in some TIC peaks were already apparent among samples at this stage. A total of 705 peaks were extracted (Additional file 2: Table S1). Based on the local metabolite database, 263 metabolites were identified and classified into 15 categories (Additional file 3: Fig. S2, Additional file 4: Table S2): 53 organic acids, 44 carbohydrates, 43 amino acid derivatives, 34 other metabolites, 25 alcohols and polyols, 14 lipids, 10 benzene and substituted derivatives, 9 nucleotides and their derivatives, 7 amines, 6 phenylpropanoids, 6 flavonoids, 6 alkaloids, 3 sphingolipids, 2 quinates and their derivatives, and 1 vitamin. These results show that a chemically diverse set of metabolites was detected, which is consistent with the characteristics of GC-TOF/MS-based metabolomics data.
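The class counts above can be tallied as a quick arithmetic check. The Python sketch below uses only numbers reported in this section. The names `CLASS_COUNTS`, `TOTAL_PEAKS`, and `summarize_classes` are illustrative and are not part of the original workflow.

```python
# Metabolite class counts as reported in this section (Additional file 4: Table S2).
CLASS_COUNTS = {
    "organic acids": 53,
    "carbohydrates": 44,
    "amino acid derivatives": 43,
    "other metabolites": 34,
    "alcohols and polyols": 25,
    "lipids": 14,
    "benzene and substituted derivatives": 10,
    "nucleotides and derivatives": 9,
    "amines": 7,
    "phenylpropanoids": 6,
    "flavonoids": 6,
    "alkaloids": 6,
    "sphingolipids": 3,
    "quinates and derivatives": 2,
    "vitamins": 1,
}
TOTAL_PEAKS = 705  # peaks extracted from the GC-TOF/MS data


def summarize_classes(counts, total_peaks):
    """Return (identified, % of peaks identified, [(class, n, % of identified)])."""
    identified = sum(counts.values())
    # Sort by count (descending), then alphabetically to break ties.
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    shares = [(name, n, 100.0 * n / identified) for name, n in rows]
    return identified, 100.0 * identified / total_peaks, shares


if __name__ == "__main__":
    identified, pct, shares = summarize_classes(CLASS_COUNTS, TOTAL_PEAKS)
    print(f"{identified} metabolites in {len(CLASS_COUNTS)} classes "
          f"({pct:.1f}% of {TOTAL_PEAKS} peaks)")
    for name, n, share in shares:
        print(f"{name:40s} {n:3d} {share:5.1f}%")
```

With the reported values, the script recovers 263 metabolites in 15 classes. That means about 37.3% of the 705 extracted peaks received an identification.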
Identification of differential metabolites from three cotton cultivars
Principal component analysis (PCA) is one of the most widely used dimensionality reduction methods. It transforms a large number of variables into a smaller set of principal components (PCs) that retain most of the information in the original data. PCA is unsupervised, so it reflects the distribution of the original data without using group labels. It can therefore be used to assess overall differences between groups. The PCA score plot (Additional file 5: Fig. S3) showed clear separation among the three cotton cultivars. All samples fell within the 95% confidence region (Hotelling's T-squared ellipse). PC1 and PC2 explained 29.1% and 14.7% of the total variance, respectively.
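The explained-variance figures quoted for PC1 and PC2 come from the covariance matrix. Each figure is that component's eigenvalue divided by the total variance, which is the trace of the matrix. The Hotelling's T-squared ellipse bounds each sample's T² value, which is computed from its scores.

Below is a minimal, dependency-free sketch of that calculation. It makes several simplifying assumptions:

- It applies mean-centring only. It does not reproduce any log transformation or scaling the authors may have applied.
- It uses power iteration with deflation in place of the SVD or NIPALS routines that statistical packages typically use.
- It does not compute the F-distribution threshold that sets the 95% ellipse.

All function names are hypothetical.

```python
import math


def center(data):
    """Subtract each variable's (column's) mean from every sample."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]


def covariance(x):
    """Sample covariance matrix of already-centred data (n - 1 denominator)."""
    n, p = len(x), len(x[0])
    return [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def power_iteration(c, iters=500):
    """Leading eigenvalue and eigenvector of a symmetric matrix."""
    p = len(c)
    v = [1.0 / math.sqrt(p)] * p  # deterministic start vector
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged vector.
    lam = sum(v[i] * sum(c[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v


def pca(data, n_components=2):
    """Return (scores, explained_variance_ratio) for the leading components."""
    x = center(data)
    c = covariance(x)
    total = sum(c[i][i] for i in range(len(c)))  # total variance = trace
    loadings, ratios = [], []
    for _ in range(n_components):
        lam, v = power_iteration(c)
        loadings.append(v)
        ratios.append(lam / total)
        # Deflate: remove the component just found so the next one is orthogonal.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(len(c))]
             for i in range(len(c))]
    scores = [[sum(row[j] * v[j] for j in range(len(row))) for v in loadings]
              for row in x]
    return scores, ratios


def hotelling_t2(scores):
    """Hotelling's T^2 of each sample in the retained score space."""
    n, a = len(scores), len(scores[0])
    var = [sum(s[k] ** 2 for s in scores) / (n - 1) for k in range(a)]
    return [sum(s[k] ** 2 / var[k] for k in range(a)) for s in scores]
```

The sketch was checked on a toy data set whose variances along two orthogonal axes are 8/3 and 2/3. On that data, `pca` returns explained-variance ratios of 0.8 and 0.2. For a real peak table, a numerical linear-algebra library would be the practical choice.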
Orthogonal partial least squares-discriminant analysis (OPLS-DA), a supervised method, was then used to sharpen the separation between groups and to assess the relationships among samples. Fig. 1A shows that OPLS-DA separated the metabolic phenotypes of the three Gossypium cultivars more clearly than PCA. Sevenfold cross-validation and permutation tests were used to evaluate the model. R2 was close to 1, and the intercept of the Q2 regression line in the permutation test was negative. This indicates that the model was reliable and the risk of overfitting was low. These results suggest that the metabolite profiles of the three cultivars differ substantially.
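For reference, Q2 summarizes cross-validated predictive ability. The permutation test compares the real model with models refitted on randomly permuted class labels.

A minimal sketch of the two summary statistics is shown below. It assumes Q2 = 1 - PRESS/SS_total. It also assumes the permutation intercept comes from an ordinary least-squares line of Q2 against the correlation between permuted and original labels, which is one common convention. The software used for the OPLS-DA is not named in this section. The function names and example values are hypothetical.

```python
def q2(y_true, y_pred_cv):
    """Predictive ability: Q2 = 1 - PRESS / total sum of squares.

    y_pred_cv must hold cross-validated predictions: each sample is predicted
    by a model fitted without that sample's fold (sevenfold in the study).
    """
    mean_y = sum(y_true) / len(y_true)
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_pred_cv))
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - press / ss_total


def permutation_q2_intercept(points):
    """Least-squares intercept of Q2 regressed on label correlation.

    points: (corr, q2) pairs, where corr is the correlation between permuted
    and original class labels; include the unpermuted model as (1.0, q2_model).
    A negative intercept is commonly read as little sign of overfitting.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return my - (sxy / sxx) * mx


if __name__ == "__main__":
    # Hypothetical values for illustration only; not data from this study.
    print(round(q2([1, 1, 0, 0], [0.9, 0.8, 0.1, 0.2]), 3))
    print(round(permutation_q2_intercept(
        [(1.0, 0.9), (0.2, -0.3), (0.1, -0.4), (0.0, -0.5)]), 3))
```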
Differential metabolites (DMs) were then screened using two cutoffs: variable importance in projection (VIP) ≥ 1 from OPLS-DA, and P < 0.05. The DMs were visualized in volcano plots (Additional file 6: Fig. S4). We identified 119, 131, and 128 DMs in the Shixiya1/Hai7124, Shixiya1/TM-1, and Hai7124/TM-1 comparisons, respectively.

- Shixiya1/Hai7124: 64 upregulated (red points) and 55 downregulated (blue points) DMs (Additional file 6: Fig. S4A, Additional file 7: Table S3).
- Shixiya1/TM-1: 29 upregulated and 102 downregulated DMs (Additional file 6: Fig. S4B, Additional file 8: Table S4).
- Hai7124/TM-1: 26 upregulated and 102 downregulated DMs (Additional file 6: Fig. S4C, Additional file 9: Table S5).

In addition, one-way ANOVA was used to compare metabolite levels across all three cultivars. Metabolites with P < 0.05 were subjected to hierarchical cluster analysis (HCA) to group metabolites with similar patterns and to examine variation between groups (Fig. 1B). The relative contents of the metabolites represented by the colored segments are listed in Additional file 10: Table S6. Many of these metabolites accumulated to markedly higher levels in TM-1 than in Hai7124 or Shixiya1. When classified, they included carbohydrates, amino acids, lipids, organic acids, and flavonoids. These results indicate that the seeds of G. hirsutum TM-1 may contain high levels of these metabolites and could serve as raw materials for future processing and utilization.
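The screening rule (VIP ≥ 1 and P < 0.05) and the up/down split in the volcano plots take only a few lines to express. The sketch below rests on two assumptions:

- "Upregulated" means a higher mean in the first-named cultivar of each comparison, that is, log2 fold change > 0.
- The choice of univariate test is left open, because this section does not specify it.

The sketch also computes the one-way ANOVA F statistic used for the three-cultivar comparison. Converting F to a P value would require the F distribution, for example from a statistics library. The `Feature` record and all function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    vip: float      # VIP from the pairwise OPLS-DA model
    p_value: float  # univariate P value (test type is an assumption)
    mean_a: float   # mean intensity in the first-named cultivar (e.g. Shixiya1)
    mean_b: float   # mean intensity in the second-named cultivar (e.g. Hai7124)


def screen_dms(features, vip_min=1.0, p_max=0.05):
    """Split features passing VIP >= vip_min and P < p_max into up/down lists."""
    up, down = [], []
    for f in features:
        if f.vip >= vip_min and f.p_value < p_max:
            log2_fc = math.log2(f.mean_a / f.mean_b)  # > 0: higher in cultivar a
            if log2_fc > 0:
                up.append((f.name, log2_fc))
            elif log2_fc < 0:
                down.append((f.name, log2_fc))
    return up, down


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA across groups of replicate values."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

The two thresholds are combined with a logical AND. A feature with a tiny P value but VIP below 1 is therefore excluded, and so is a feature with a high VIP but P ≥ 0.05.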
The seeds of higher plants can be regarded as factories that convert photosynthetically derived sugars into a variety of storage compounds, such as oils, proteins, sugars, and secondary metabolites (Baud 2018). G. hirsutum is the most widely planted cotton species and accounts for the vast majority of cotton fiber and cottonseed by-products worldwide. Further nutritional evaluation of TM-1 cottonseed is therefore warranted. Primary metabolites in cottonseed have already been studied and evaluated extensively, so this study focuses on a class of secondary metabolites, the proanthocyanidins. Notably, both catechin and epicatechin were detected. Catechin accumulated significantly in TM-1, whereas epicatechin accumulated in Hai7124. Proanthocyanidins are polyphenols produced by the flavonoid pathway and have important biological functions. Numerous in vivo and in vitro studies have shown that proanthocyanidins benefit human and animal health (Dixon et al. 2005; Gonzalo-Diago et al. 2013). Proanthocyanidins are formed by the polymerization of flavan-3-ol units, the most common of which are catechin and epicatechin. Our results therefore suggest that the predominant monomeric units differ between TM-1 and Hai7124. This may in turn lead to differences in the oligomers formed from varying amounts of catechin or epicatechin.
Correlation analysis of catechin and oil-related traits in core collections of Gossypium
To verify the reliability of these data, we performed confirmatory experiments using a different analytical method and a larger sample size. Catechin content was quantified with a conventional targeted HPLC method. The catechin content of TM-1 was significantly higher than that of the other two cultivars, consistent with the above analysis (Fig. 2A). Interestingly, however, Shixiya1 had slightly higher values than Hai7124, in contrast to the GC-TOF/MS results. Similar results were obtained in core collections of Gossypium (47 accessions of G. arboreum, 37 accessions of G. barbadense, and 144 accessions of G. hirsutum) (Fig. 2B) (Du et al. 2018; He et al. 2021). A comparison of the two detection methods suggests a likely explanation for this discrepancy. The acid used in HPLC sample pretreatment may release bound catechins as free catechin.

In Arabidopsis, the regulatory network of proanthocyanidins (PAs) in the seed coat is controlled mainly by the TT2 gene. Previous studies have shown that TT2 suppresses fatty acid (FA) biosynthesis. They have also shown that the tt2 mutation significantly increases seed oil content and alters FA composition (Chen et al. 2012; Wang et al. 2014). We therefore analyzed the correlation between catechin content and oil-related traits in cottonseed. As shown in Fig. 2C, catechin content in the G. hirsutum core collection was negatively correlated (Pearson, P < 0.001) with:

- myristic acid (C14:0)
- palmitic acid (C16:0)
- stearic acid (C18:0)
- oleic acid (C18:1)
- linoleic acid (C18:2)
- arachidic acid (C20:0)
- total fatty acid content
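The Pearson coefficient behind Fig. 2C can be computed for each trait as shown below. This is a hedged sketch. It assumes that accessions are listed in the same order for catechin and for each fatty-acid trait. The helper names and the example numbers are hypothetical and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlate_with_traits(catechin, traits):
    """Map each trait name to its Pearson r with catechin (same accession order)."""
    return {name: pearson_r(catechin, values) for name, values in traits.items()}


if __name__ == "__main__":
    # Hypothetical values for five accessions; not data from this study.
    catechin = [0.8, 1.1, 1.5, 2.0, 2.4]
    traits = {
        "C16:0": [24.1, 23.5, 22.9, 22.0, 21.6],
        "C18:2": [55.0, 54.2, 53.8, 52.1, 51.9],
    }
    for name, r in correlate_with_traits(catechin, traits).items():
        print(f"{name}: r = {r:.3f}")
```

A negative r, as reported for the fatty acids above, means accessions with more catechin tend to have less of that fatty acid.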
In addition, catechin content was not significantly correlated with cyclopropene fatty acid (C19:1), an anti-nutritional factor in cottonseed used as feed (Pearson, r = 0.095, P > 0.05). Taken together, this work advances our understanding of the role of PAs in seed FA accumulation. It also provides technical support for the future development of specialty cotton varieties rich in catechins or high-quality fatty acids. Further research on other nutrient components is needed to enable a comprehensive evaluation of cottonseed.
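The reported r = 0.095 can be checked against P > 0.05 in two steps:

1. Convert r to t = r·√((n − 2)/(1 − r²)).
2. Compute a two-sided P value from Student's t distribution with n − 2 degrees of freedom.

The sketch below assumes n = 144, i.e. that all G. hirsutum core-collection accessions entered the analysis; the text does not state n for this test. It integrates the t density numerically (Simpson's rule) only to stay dependency-free. In practice, a statistics library such as SciPy's `scipy.stats.pearsonr` returns r and P directly. The function names are hypothetical.

```python
import math


def t_from_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def t_pdf(t, df):
    """Density of Student's t distribution."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided P value: 1 - 2 * (area under the density on [0, |t|])."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0


if __name__ == "__main__":
    n = 144  # assumption: all G. hirsutum core-collection accessions were used
    t = t_from_r(0.095, n)
    print(f"t = {t:.3f}, P = {two_sided_p(t, n - 2):.3f}")
```

Under the n = 144 assumption, this gives t ≈ 1.14 and P of roughly 0.26, consistent with the reported P > 0.05.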