
Feasibility study on the use of near-infrared spectroscopy for rapid and nondestructive determination of gossypol content in intact cottonseeds



Gossypol found in cottonseeds is toxic to human beings and monogastric animals and is a primary parameter for the integrated utilization of cottonseed products. It is usually determined by techniques that rely on complex pretreatment procedures, and the samples cannot be used in a breeding program after determination. It is therefore of great importance to predict the gossypol content in cottonseeds rapidly and nondestructively as a substitute for the traditional analytical methods.


Gossypol content in cottonseeds was investigated by near-infrared spectroscopy (NIRS) and high-performance liquid chromatography (HPLC). Partial least squares regression, combined with spectral pretreatment methods including Savitzky-Golay smoothing, standard normal variate, multiplicative scatter correction, and first derivative, was tested for optimizing the calibration models. The NIRS technique was efficient in predicting gossypol content in intact cottonseeds, as revealed by the root-mean-square error of cross-validation (RMSECV), root-mean-square error of prediction (RMSEP), coefficient of determination of prediction (Rp2), and residual predictive deviation (RPD) values for all models, which were 0.05–0.07, 0.04–0.06, 0.82–0.92, and 2.3–3.4, respectively. The optimized model, pretreated by Savitzky-Golay smoothing + standard normal variate + first derivative, gave a good determination of gossypol content in intact cottonseeds.


Near-infrared spectroscopy coupled with different spectral pretreatments and partial least squares (PLS) regression proved feasible for predicting gossypol content in intact cottonseeds rapidly and nondestructively. It could be used as an alternative to the traditional method for determining the gossypol content in intact cottonseeds.


Cotton (Gossypium spp.) is an important industrial and economic crop (Sunilkumar et al. 2006). Cottonseed, the main by-product of cotton production, can be used to produce food, animal feed, and other products. Cottonseed contains many kinds of nutrients, including proteins, oils, fatty acids, and amino acids, making it a potential food resource for human beings given the rapid growth of the global population (Sawan et al. 2006). However, Gossypium species are characterized by the presence of gossypol, which is toxic to human beings and monogastric animals (Lordelo et al. 2005), so the utilization of cottonseed products is limited.

Gossypol, 1, 1′, 6, 6′, 7, 7′-hexahydroxy-5, 5′-diisopropyl-3, 3′-dimethyl-(2, 2′ binaphthalene)-8, 8′-dicarbaldehyde, is a terpenoid compound that helps cotton defend against biotic stresses (Kong et al. 2010; Blanco et al. 1983). Due to the toxicity of gossypol, breeding for either lower gossypol content in cottonseeds or higher gossypol content in cotton plants has been practiced in many cotton-planting countries. Cottonseed breeding often requires analyzing a large number of cottonseed samples to measure gossypol content. Conventionally, gossypol content is assayed by ultraviolet (UV) spectrophotometry, which not only involves highly toxic reagents but is also inaccurate and unreliable. High-performance liquid chromatography (HPLC) offers a high level of accuracy and sensitivity but is usually costly and time-consuming. In addition, both classical analytical methods destroy the test samples, which frequently need to be planted in a cotton breeding program. A rapid and nondestructive method for gossypol determination is therefore required.

Near-infrared (NIR) spectroscopy combined with chemometrics is a rapid, convenient, and environmentally friendly analytical technique for the quality analysis of crops (Sohn et al. 2008; Huang et al. 2013; Weinstock et al. 2006; Rosales et al. 2011; Bellato et al. 2011; Bala and Singh 2013; Hacisalihoglu et al. 2010; Mendoza et al. 2018; Lee et al. 2017; Tierno et al. 2016; Yang and Ren 2008; Lin et al. 2013a, 2013b; Kovalenko et al. 2006; Fassio and Cozzolino 2004). Although an NIR calibration model for determining gossypol content in cotton powder has been developed (Li et al. 2017), it cannot be used to analyze gossypol content in intact cottonseeds nondestructively, especially in breeding programs where the genetic materials from genetic modification or cross-breeding have limited availability. Determining gossypol content in intact cottonseeds by NIR is a challenge because (i) cottonseeds are bigger than other crop seeds, so large voids are left between packed samples in sample cells; (ii) some immature and wizened cottonseeds can be mixed in the samples, which can introduce irrelevant information into the spectral data; and (iii) the tough and thick shell of the cottonseed can impede the penetration of NIR light and result in a lower S/N ratio and poorer information. Because of these factors, the spectral data of intact cottonseeds are far more complex than those of other crop seeds and may contain a large amount of useless and uncorrelated information such as noise and background. To overcome these difficulties, sophisticated chemometric methods are applied to extract useful information from NIR spectra and calibrate robust models for gossypol content in intact cottonseeds. Essentially, these include regression methods such as principal component regression (PCR) (Xie and Kalivas 1997), partial least squares (PLS) (Haaland and Thomas 1988), support vector machines (SVM) (Nie et al. 2008), least squares support vector machines (LS-SVM) (Shao et al. 2012), and artificial neural networks (ANN) (Makinoa et al. 2010), coupled with spectral pretreatments such as standard normal variate (SNV) (Barnes et al. 1989), Savitzky-Golay (SG) smoothing (Savitzky and Golay 1964), multiplicative scatter correction (MSC) (Hopke 2003), and first derivative (Rinnan et al. 2009).

Because they require destruction of the test sample, previous NIR models for the detection of gossypol in cottonseed meal can barely be applied in breeding trials (Li et al. 2017). In the present study, the feasibility of analyzing gossypol in intact cottonseeds with an NIR spectrometer was investigated. The main aim was to establish an optimal model that could provide technical support for cotton breeders and others who work on cottonseeds.

Materials and methods

Samples and preparation

A total of 268 samples of cottonseeds were collected from different growing areas, including Hangzhou (Zhejiang, China), Xiaoshan (Zhejiang, China), Sanmen (Zhejiang, China), Sanya (Hainan, China), Wuhu (Anhui, China), and Yancheng (Jiangsu, China), in 2012, 2013, and 2014. The cottonseed samples were delinted and dried at 30 °C to constant weight. After spectral acquisition by NIR spectroscopy, the intact cottonseed samples were dehulled and then ground to cottonseed kernel powder for HPLC analysis. All preparations were carried out under the same experimental conditions in order to reduce the influence of other physical factors.

Gossypol extraction

A sample of 0.1 g of cottonseed kernel powder was suspended in 5 mL acetone and sonicated in an ultrasonic bath for 45 min. The suspension was then filtered through quantitative filter paper, followed by filtration with a 0.45 μm syringe filter (Agela, Newark, USA). The sediment was washed three times with acetone. After this procedure, the extract was made up to 25 mL with acetone.

HPLC analysis

HPLC analysis was performed on an Agilent 1100 HPLC system (Agilent, Santa Clara, USA), equipped with an auto-sampler and UV detection. A C18 column (250 mm × 4.6 mm, 5 μm, Dikma, Richmond Hill, USA) was employed as the stationary phase. The mobile phase consisted of methanol/0.2% H3PO4 (80/20, V/V). The injection volume was 10 μL and the flow rate was 1.0 mL·min−1. The UV detector was set at 238 nm and the column temperature was 25 °C. Each sample was measured three times. The limit of detection (LOD) was obtained at a signal-to-noise (S/N) ratio of three and the limit of quantification (LOQ) at an S/N ratio. To assess the stability of gossypol at room temperature, three randomly chosen samples were used to determine the changes in peak area within 36 h. HPLC-grade gossypol was purchased from Sigma (Sigma-Aldrich, St. Louis, USA). Methanol (HPLC grade) was procured from Tianjin Chemical Reagent Company (Tianjin, China). Double deionized water was prepared using a Milli-Q water purification system (Millipore, Molsheim, France).

NIR spectra acquisition

The NIR spectra of intact cottonseed samples were scanned with a Büchi Flex-N500 NIR spectrometer (Büchi, Flawil, Switzerland), equipped with a solid sample module, as follows. The NIR spectra were collected across the range of 4 000–10 000 cm−1 and were recorded with a spectral resolution of 4 cm−1. Samples were measured three times on a rotating cylinder device at 25 ± 0.5 °C and 60% relative air humidity. All the spectra were transformed into absorbance (lg (1/R)).

Spectral pretreatment

Before calibration, the spectral data were pretreated for optimal performance. Eight pretreatment strategies, each consisting of one or a combination of Savitzky-Golay smoothing, SNV, MSC, and first derivative (Norris gap), were compared with the raw spectra.
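The four pretreatments named above can be sketched in a few lines of NumPy. This is a minimal illustration and not the MATLAB code used in the study: the window length (11), polynomial order (2), and gap size (1) are assumptions, since the text does not report them, and the plain gap difference below is a simplified stand-in for the Norris gap derivative. Each function expects a 2-D array with one spectrum per row.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum by its own mean and SD."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        # Fit row ~ intercept + slope * ref, then remove both effects
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected

def sg_smooth(spectra, window=11, polyorder=2):
    """Savitzky-Golay smoothing that keeps only fully covered points."""
    half = window // 2
    x = np.arange(-half, half + 1)
    # Least-squares polynomial fit evaluated at the window centre
    design = np.vander(x, polyorder + 1, increasing=True)
    weights = np.linalg.pinv(design)[0]
    return np.array([np.convolve(row, weights[::-1], mode="valid")
                     for row in spectra])

def first_derivative(spectra, gap=1):
    """Gap-difference first derivative along the wavenumber axis (assumed form)."""
    return spectra[:, gap:] - spectra[:, :-gap]
```

A combined strategy such as SG smoothing + SNV + first derivative would then be `first_derivative(snv(sg_smooth(X)))`, assuming the steps are applied in the order they are named.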

Sampling design

Samples were assigned to calibration and prediction sets using Kennard-Stone (KS) selection (Kennard and Stone 1969). The calibration models were established with the calibration set, and the prediction set was used to validate the predictive capabilities and analytical features of the calibration models.
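As a rough sketch of how Kennard-Stone selection works, the function below starts from the two most distant samples and then repeatedly adds the sample that lies farthest from its nearest already-selected neighbour. It uses Euclidean distance on the rows of `X`; the distance metric and the function name `kennard_stone` are assumptions for illustration, not details taken from the study.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return the row indices of X chosen by Kennard-Stone selection."""
    X = np.asarray(X, dtype=float)
    if not 2 <= n_select <= len(X):
        raise ValueError("n_select must be between 2 and the number of samples")
    # Squared Euclidean distances between all pairs of samples
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Start with the two most distant samples
    i, j = np.unravel_index(np.argmax(d2), d2.shape)
    selected = [int(i), int(j)]
    # For every sample, squared distance to its nearest selected sample
    min_d2 = np.minimum(d2[i], d2[j])
    min_d2[selected] = -1.0  # exclude samples that are already selected
    while len(selected) < n_select:
        pick = int(np.argmax(min_d2))  # farthest from the selected set
        selected.append(pick)
        min_d2 = np.minimum(min_d2, d2[pick])
        min_d2[selected] = -1.0
    return selected
```

The selected indices would form the calibration set and the remaining samples the prediction set, for example `cal = kennard_stone(X, 218)` for a 218/50 split of 268 spectra.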

PLS regression

PLS regression has been widely used as a calibration method to investigate the relationship between the spectral and the corresponding reference data. Before calibration of the PLS models, the data sets (spectral and reference data) were analyzed using 4-fold cross-validation to develop a full-spectra calibration model. The aim of the cross-validation was to find the optimum number of latent variables (LV) for PLS. The root-mean-square error of cross-validation (RMSECV) served as a measure to adjust the parameters, and the number of LV which provides the lowest RMSECV was selected as the best.
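The selection of latent variables by cross-validation can be illustrated with a small PLS1 implementation (NIPALS) in NumPy. This is a sketch under assumptions: the study does not say how samples were assigned to the four folds, so a seeded random split is used here, and all function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit a single-response PLS model by NIPALS with n_lv latent variables."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)        # weight vector
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        q = yr @ t / tt               # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - q * t               # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # regression vector in X space
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

def rmsecv(X, y, n_lv, n_folds=4, seed=0):
    """Root-mean-square error of k-fold cross-validation (random folds assumed)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        model = pls1_fit(X[train], y[train], n_lv)
        sq_err += ((pls1_predict(model, X[test]) - y[test]) ** 2).sum()
    return np.sqrt(sq_err / len(y))

def select_n_lv(X, y, max_lv=15):
    """Return the number of LVs with the lowest RMSECV and all RMSECV values."""
    scores = [rmsecv(X, y, k) for k in range(1, max_lv + 1)]
    return int(np.argmin(scores)) + 1, scores
```

Choosing the global minimum of RMSECV follows the description above; a more conservative rule, such as taking the first local minimum, tends to give fewer latent variables and a lower risk of overfitting.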

Model evaluation

The calibration models were evaluated using the following quality parameters:

$$ R^2 = 1 - \frac{\sum_{i=1}^{n} \left( Y_{nirs,i} - Y_{ref,i} \right)^2}{\sum_{i=1}^{n} \left( Y_{ref,i} - \overline{Y_{ref}} \right)^2} $$
$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Y_{nirs,i} - Y_{ref,i} \right)^2} $$
$$ RPD = \frac{SD_{Y_{ref}}}{RMSEP} $$

where n is the total number of samples, Ynirs is the predicted value by calibration models, Yref is the reference value by HPLC, and SD is the standard deviation.

The coefficient of determination of prediction (Rp2), the root-mean-square error of prediction (RMSEP), the coefficient of determination of calibration (Rc2), the root-mean-square error of cross-validation (RMSECV), and the residual predictive deviation (RPD) were used as criteria to evaluate model performance. An acceptable model should have high Rc2 and Rp2 values and low RMSECV and RMSEP values. In addition, a model is considered robust if its RPD is higher than 2.5.
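The three quality parameters translate directly into code. The sketch below assumes the sample standard deviation (n − 1 in the denominator) for the RPD, which the text does not specify, and the reference and predicted values in the usage lines are made-up numbers for illustration only.

```python
import numpy as np

def r2(y_ref, y_nirs):
    """Coefficient of determination between reference and predicted values."""
    ss_res = np.sum((y_nirs - y_ref) ** 2)
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_ref, y_nirs):
    """Root-mean-square error (RMSECV or RMSEP, depending on the data passed in)."""
    return np.sqrt(np.mean((y_nirs - y_ref) ** 2))

def rpd(y_ref, y_nirs):
    """Residual predictive deviation: SD of the reference values over the RMSE."""
    return np.std(y_ref, ddof=1) / rmse(y_ref, y_nirs)

# Illustrative values only, not data from the study
y_ref = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_nirs = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
print(round(r2(y_ref, y_nirs), 3), round(rmse(y_ref, y_nirs), 3))
```

Passing the prediction-set values gives Rp2 and RMSEP, while the same functions applied to cross-validated predictions of the calibration set give RMSECV.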


NIR spectroscopic data (268 samples × 1 501 variables) were exported in text format, organized in Microsoft Excel spreadsheets, and then transferred into MATLAB R2011a (Math Works, Natick, USA) for chemometric analysis. All the algorithms in spectral pretreatments, sampling design and regressions were implemented with MATLAB R2011a.
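The figure of 1 501 variables per spectrum is consistent with the stated range of 4 000–10 000 cm−1 if the data points are evenly spaced at 4 cm−1. That spacing is an assumption (the text gives 4 cm−1 as the spectral resolution, not explicitly as the data interval), and the short check below only verifies the arithmetic.

```python
def wavenumber_axis(start=4000.0, stop=10000.0, step=4.0):
    """Evenly spaced wavenumber axis in cm^-1, both end points included."""
    n_points = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n_points)]

axis = wavenumber_axis()
print(len(axis), axis[0], axis[-1])  # 1501 4000.0 10000.0
```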


HPLC analysis

The regression equation, correlation coefficient (r2), limit of detection (LOD), limit of quantification (LOQ), and average recovery of gossypol are shown in Table 1. The retention times of the gossypol standard and the gossypol extract were 9.91 and 9.60 min, respectively (Fig. 1). Table 2 shows the stability of the gossypol peak area determined by HPLC over 24 h. All the results indicated that the improved HPLC method could be used to detect gossypol content, and that the cottonseed extract should be analyzed within 24 h.

Table 1 HPLC-UV results

Fig. 1 Chromatograms of (a) a gossypol standard and (b) a gossypol extract from intact cottonseeds

Table 2 Stability of gossypol determined by HPLC over 24 h

NIR spectra analysis

Across the spectral range of 4 000–10 000 cm−1, the absorbance values are mainly associated with the combination and overtone bands of the C-H, N-H, O-H, and S-H bonds (Macho and Larrechi 2002), which are quite sensitive to compositional variations in complex samples. Figure 2a shows the raw spectra of intact cottonseeds in the NIR region. The spectra showed six broad absorption peaks at around 4 200, 4 700, 5 150, 5 580, 6 900, and 8 400 cm−1. The small peak at 4 200 cm−1 fell within the region associated with the C-H combination bands. The peaks at 5 150 and 6 900 cm−1 could be attributed to the combination and first overtone bands of O-H, respectively, and were identified as water absorption. The gentle peaks at 5 580 and 8 400 cm−1 overlapped with the first and second C-H overtone regions, respectively. Notably, the peak at 4 700 cm−1 was attributed to the C-H combination bands of alkenes and aromatic hydrocarbons, which could be identified as the absorption of polyphenolic terpenes, including gossypol and its derivatives.

Fig. 2 The NIR spectra of intact cottonseeds: (a) the raw spectra, (b) the spectra pretreated by MSC, (c) the spectra pretreated by SNV + first derivative, and (d) the spectra pretreated by SG smoothing + SNV + first derivative

The raw spectra were homogeneous, so the presence of noise could not be directly identified. Consistent baseline offsets and biases were present in the spectra, which are common features of NIR spectra. Hence, eight pretreatment strategies were applied to the raw spectra before the calibration models were established. The spectra pretreated by several representative strategies are shown in Fig. 2b, c, and d. To different degrees, all these pretreatments could reduce the physical variation among samples due to scattering and remove both additive and multiplicative effects in the spectra. Ten variables were lost after SG smoothing, so 1 491 variables were used for calibration in the models that included SG smoothing.
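The loss of ten variables is what one would expect if the smoothing window drops the points at each end that it cannot fully cover. Under that assumption, which the text does not confirm, ten lost points correspond to an 11-point window; the helper names below are hypothetical and the snippet only checks the bookkeeping.

```python
def points_after_valid_smoothing(n_points, window):
    """Number of points left when a moving window drops the uncovered edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2  # points lost at each end of the spectrum
    return n_points - 2 * half

def window_from_lost_points(n_lost):
    """Odd window length that loses n_lost points in total."""
    if n_lost < 0 or n_lost % 2:
        raise ValueError("n_lost must be a non-negative even integer")
    return n_lost + 1

print(points_after_valid_smoothing(1501, 11))  # 1491
print(window_from_lost_points(10))             # 11
```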

Kennard-Stone sampling design

The Kennard-Stone algorithm is an effective method for extracting a sample subset in multidimensional space: it picks the most diverse samples and thus enables the selection of a representative subset. It has been reported that a calibration set extracted using KS selection has a better predictive capability than a set built randomly or constructed by other data selection methods such as Kohonen self-organized mapping (Kohonen 1982) or D-optimal designs (de Aguiar et al. 1995). In this study, the 268 intact cottonseed samples were divided into calibration and prediction sets using the KS algorithm, with the former consisting of 218 samples and the latter of 50 samples. The statistical values of gossypol content for the calibration and prediction sets are shown in Table 3, which indicates that the range of variation in gossypol content was broad enough to develop NIR calibration models.

Table 3 Statistical values of gossypol content for calibration and prediction set samples

PLS regression

The calibration models of gossypol content in intact cottonseeds based on PLS regression were established in the NIR spectral range of 4 000–10 000 cm−1, and the results are summarized in Table 4. The number of LV was selected with the aid of cross-validation using the first minimum RMSECV for all models. The RMSECV and RMSEP values for all the calibration models ranged from 0.05 to 0.07 and from 0.04 to 0.06, respectively. The values of Rp2 and Rc2 ranged from 0.82 to 0.93 and from 0.87 to 0.97, respectively. The RPD values ranged from 2.3 to 3.4.

Table 4 Performance comparison results for calibration models using different spectral pretreatment strategies


Since the NIR spectra of intact cottonseeds were complex and overlapped, suitable spectral pretreatments should be used to optimize the spectra and extract the effective information. In this work, the raw spectra were transformed using eight pretreatment strategies, including single-pretreatment strategies (SG smoothing, SNV, MSC, and first derivative), two-pretreatment strategies (SNV + first derivative and MSC + first derivative), and three-pretreatment strategies (SG smoothing + SNV + first derivative and SG smoothing + MSC + first derivative). Among the single-pretreatment strategies, the PLS model using eight latent variables after MSC produced the better results, with low values of RMSECV and RMSEP (0.06 and 0.05, respectively), and its RPD value was 20.36% higher than that of the direct regression model based on raw spectra (Fig. 3). Figure 4b shows the correlation for the model using MSC, presented by plotting predicted against reference values for gossypol content in intact cottonseeds. Samples near the diagonal line had predicted values closer to the reference ones, and vice versa. Among the two-pretreatment strategies, the calibration model based on SNV + first derivative showed a better predictive ability than that based on MSC + first derivative, with Rc2 and Rp2 values of 0.962 and 0.887, respectively. The RPD value of that model was 3.0, an increase of 28.14% compared with the model using raw spectra. Of all the calibration models established, the best was the one pretreated with SG + SNV + first derivative: it had the highest Rc2 (0.97) and Rp2 (0.93), and its RPD (3.4) was 46.28% higher than that of the raw spectral model. Furthermore, its RMSECV (0.05) and RMSEP (0.04) were the lowest among all the models. In the correlation plot, the predicted and reference values were concentrated along the diagonal line (Fig. 4d). These results indicated that the model using SG + SNV + first derivative and PLS was accurate and robust enough to substitute for the conventional gossypol analysis method (HPLC) in measuring gossypol in intact cottonseeds.
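The percentage gains in RPD quoted above are relative to the raw-spectra model. As a rough consistency check, the sketch below back-calculates the raw-spectra RPD implied by each reported pair of values. Because the reported RPDs are rounded to one decimal place, the results are only approximate, and the function names are hypothetical.

```python
def percent_increase(baseline, value):
    """Relative increase of value over baseline, in percent."""
    return (value - baseline) / baseline * 100.0

def implied_baseline(value, percent):
    """Baseline that would give `value` after an increase of `percent` percent."""
    return value / (1.0 + percent / 100.0)

# Reported RPD values and percentage increases from the text
print(round(implied_baseline(3.4, 46.28), 2))  # SG + SNV + first derivative
print(round(implied_baseline(3.0, 28.14), 2))  # SNV + first derivative
```

Both back-calculated baselines round to about 2.3, which agrees with the lower end of the reported RPD range.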

Fig. 3 The residual predictive deviation (RPD) for PLS models based on different pretreatment strategies compared with the model using raw spectra

Fig. 4 The correlation between predicted and reference values for models of intact cottonseeds: (a) the PLS model based on raw spectra, (b) the PLS model based on MSC pretreatment, (c) the PLS model based on SNV + first derivative pretreatment, and (d) the PLS model based on SG smoothing + SNV + first derivative pretreatment

The NIR spectra of these intact seeds generally contained many undesirable features, including noise, overlapping peaks, baseline effects, and some systematic behaviors caused by the seed size, the shell, and other physical factors. Hence, a suitable pretreatment strategy is required for the widespread application of NIR technology in crop seed analysis. This work indicated that an appropriate pretreatment strategy before regression is important for extracting the effective information from spectral data and eliminating spectral deviation, so that an accurate and robust NIR model can be calibrated.

The calibration models reported here confirmed, for the first time, the feasibility of using NIR technology for the rapid and nondestructive determination of gossypol, an important parameter for cottonseed products, in intact cottonseeds. The high RPD value (3.4) suggested that this technology could be an effective method for measuring gossypol in intact cottonseeds. The optimal model could substitute for conventional gossypol analysis methods, including UV spectrophotometry and HPLC. Because of its potential for high sample throughput and low cost, as well as a significant reduction in the use of toxic chemicals, the NIR method could be promoted and extended to other similar agricultural products.


The calibration and validation statistics obtained in the current work showed the potential of NIRS to predict gossypol content in intact cottonseeds. The optimized model was the one pretreated by Savitzky-Golay smoothing + standard normal variate + first derivative, with RMSECV, RMSEP, Rp2, and RPD of 0.05, 0.04, 0.92, and 3.4, respectively, which provides a feasible method for determining gossypol content in intact cottonseeds.

Availability of data and materials

All relevant data are within this article.




We are grateful to Mrs. Yu Liu for her technical assistance.


The research work was funded by The National Key Technology R&D Program of China (2016YFD0101404), China Agriculture Research System (CARS-18-25), and Jiangsu Collaborative Innovation Center for Modern Crop Production.

Author information




Li C (Cheng) and Zhu SJ designed the experiments and wrote the manuscript. Li C (Cheng), Zhao TL, and Su BS analyzed the data. Li C (Cheng), Su BS, and Li C (Cong) participated in the experiment. Chen JH assisted in editing the article. Zhu SJ and Chen JH conducted and supervised the experiments. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to ZHU Shuijin.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

All co-authors have consented to the submission of the manuscript.

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

LI, C., SU, B., ZHAO, T. et al. Feasibility study on the use of near-infrared spectroscopy for rapid and nondestructive determination of gossypol content in intact cottonseeds. J Cotton Res 4, 13 (2021).
