Samples and preparation
A total of 268 cottonseed samples were collected from different growing areas, including Hangzhou (Zhejiang, China), Xiaoshan (Zhejiang, China), Sanmen (Zhejiang, China), Sanya (Hainan, China), Wuhu (Anhui, China), and Yancheng (Jiangsu, China), in 2012, 2013, and 2014. The cottonseed samples were delinted and dried at 30 °C to constant weight. After spectral acquisition by NIR spectroscopy, the intact cottonseed samples were dehulled and then ground to cottonseed kernel powder for HPLC analysis. All samples were prepared under the same experimental conditions to reduce the influence of other physical factors.
Gossypol extraction
A 0.1 g sample of cottonseed kernel powder was suspended in 5 mL of acetone and sonicated in an ultrasonic bath for 45 min. The suspension was then filtered through quantitative filter paper and subsequently through a 0.45 μm syringe filter (Agela, Newark, USA). The sediment was washed three times with acetone, and the extract was then made up to 25 mL with acetone.
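As a worked illustration of how a concentration measured in the extract maps back to the kernel powder, the sketch below converts a gossypol concentration in the 25 mL extract into a content per gram of the 0.1 g sample. The function name and the example concentration of 20 μg/mL are hypothetical and are not taken from the study.

```python
def gossypol_content_mg_per_g(conc_ug_per_ml, extract_volume_ml=25.0, sample_mass_g=0.1):
    """Convert an extract concentration (ug/mL) to content in the powder (mg/g)."""
    if sample_mass_g <= 0:
        raise ValueError("sample mass must be positive")
    total_ug = conc_ug_per_ml * extract_volume_ml  # total gossypol in the extract
    return total_ug / 1000.0 / sample_mass_g       # ug -> mg, then per gram of powder


# Hypothetical concentration read from a calibration curve (illustrative value only)
content = gossypol_content_mg_per_g(20.0)
```

With these illustrative numbers, 20 μg/mL × 25 mL gives 500 μg of gossypol from 0.1 g of powder, that is, 5 mg/g.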
HPLC analysis
HPLC analysis was performed on an Agilent 1100 HPLC system (Agilent, Santa Clara, USA) equipped with an auto-sampler and a UV detector. A C18 column (250 mm × 4.6 mm, 5 μm, Dikma, Richmond Hill, USA) was employed as the stationary phase. The mobile phase consisted of methanol/0.2% H3PO4 (80/20, V/V). The injection volume was 10 μL and the flow rate was 1.0 mL·min−1. The UV detector was set at 238 nm and the column temperature was 25 °C. Each sample was measured three times. The limit of detection (LOD) was obtained at a signal-to-noise (S/N) ratio of three and the limit of quantification (LOQ) at an S/N ratio. To assess the stability of gossypol at room temperature, three randomly chosen samples were used to monitor changes in peak area over 36 h. HPLC-grade gossypol was purchased from Sigma (Sigma-Aldrich, St. Louis, USA). Methanol (HPLC grade) was procured from Tianjin Chemical Reagent Company (Tianjin, China). Double deionized water was prepared using a Milli-Q water purification system (Millipore, Molsheim, France).
NIR spectra acquisition
The NIR spectra of intact cottonseed samples were scanned with a Büchi Flex-N500 NIR spectrometer (Büchi, Flawil, Switzerland) equipped with a solid sample module, as follows. The NIR spectra were collected across the range of 4 000 to 10 000 cm−1 and were recorded with a spectral resolution of 4 cm−1. Samples were measured three times on a rotating cylinder device at 25 ± 0.5 °C and 60% relative air humidity. All the spectra were transformed into absorbance (lg(1/R)).
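The acquisition settings can be sketched in a few lines. The example below builds a wavenumber axis, applies the lg(1/R) transform, and averages replicate scans. The 4 cm−1 data-point spacing is an assumption (it reproduces the 1 501 variables reported under Software), and averaging the three scans is also an assumption, since the text does not say how replicates were combined. All function names are illustrative.

```python
import math


def wavenumber_axis(start=4000, stop=10000, step=4):
    """Wavenumbers (cm^-1) from start to stop inclusive, at a fixed step (assumed)."""
    return list(range(start, stop + 1, step))


def reflectance_to_absorbance(reflectance):
    """Apply A = lg(1/R) to each reflectance value (requires 0 < R)."""
    return [math.log10(1.0 / r) for r in reflectance]


def mean_spectrum(replicates):
    """Average replicate spectra point by point (assumed handling of replicates)."""
    n = len(replicates)
    return [sum(values) / n for values in zip(*replicates)]


axis = wavenumber_axis()
absorbance = reflectance_to_absorbance([1.0, 0.1, 0.01])
averaged = mean_spectrum([[0.2, 0.4], [0.4, 0.6], [0.6, 0.8]])
```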
Spectral pretreatment
Before calibration, the spectral data were pretreated for optimal performance. Eight pretreatment strategies, each consisting of one or a combination of Savitzky-Golay smoothing, standard normal variate (SNV), multiplicative scatter correction (MSC), and first derivative (Norris gap), were compared with the raw spectra.
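To make the scatter corrections concrete, the following minimal sketch implements SNV and MSC in their textbook forms, plus a plain gap difference as a simplified stand-in for the Norris gap derivative (it omits segment smoothing). It is not the authors' MATLAB code; the function names, the use of the set mean as the MSC reference, and the gap size are assumptions.

```python
import math


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    n_var = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_var)]
    ref_mean = sum(ref) / n_var
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_var
        # Least-squares fit of s = intercept + slope * ref
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


def gap_first_derivative(spectrum, gap=1):
    """Gap difference x[j + gap] - x[j - gap]; simplified, without segment smoothing."""
    return [spectrum[j + gap] - spectrum[j - gap]
            for j in range(gap, len(spectrum) - gap)]


# Two synthetic spectra that differ only by an offset and a scale factor
base = [1.0, 2.0, 4.0, 3.0]
scaled = [2.0 * v + 1.0 for v in base]
snv_base, snv_scaled = snv(base), snv(scaled)
msc_out = msc([base, scaled])
```

Both corrections remove additive and multiplicative differences between spectra: in this synthetic case the two SNV outputs coincide, and both MSC outputs collapse onto the mean spectrum.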
Sampling design
Samples were assigned to calibration and prediction sets using Kennard-Stone (KS) selection (Kennard and Stone 1969). The calibration models were established with the calibration set, and the prediction set was used to validate the predictive capabilities and analytical features of the calibration models.
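The selection rule can be sketched as follows: start with the two samples that are farthest apart, then repeatedly add the candidate whose distance to its nearest already-selected sample is largest. This is a minimal illustration using Euclidean distance on the raw variables; the function name is hypothetical, and the size of the calibration set is left as a parameter because the split ratio is not stated here.

```python
import math


def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    # min_dist[i]: distance from candidate i to its nearest selected sample
    min_dist = {i: min(dist[i][first], dist[i][second]) for i in remaining}
    while len(selected) < n_select:
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], dist[i][nxt])
    return selected


# Toy one-dimensional data; the calibration set takes 3 of the 5 points
points = [[0.0], [1.0], [2.0], [9.0], [10.0]]
calibration_idx = kennard_stone(points, 3)
prediction_idx = [i for i in range(len(points)) if i not in calibration_idx]
```

The samples not selected form the prediction set, so the calibration set spans the extremes of the spectral space while the prediction set falls inside it.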
PLS regression
PLS regression has been widely used as a calibration method to investigate the relationship between spectral data and the corresponding reference data. Before calibration of the PLS models, the data sets (spectral and reference data) were analyzed using 4-fold cross-validation to develop a full-spectrum calibration model. The aim of the cross-validation was to find the optimum number of latent variables (LVs) for PLS. The root-mean-square error of cross-validation (RMSECV) served as the criterion for tuning, and the number of LVs giving the lowest RMSECV was selected.
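The selection of latent variables by cross-validation can be illustrated with a compact single-response PLS (PLS1, NIPALS form). This is a sketch under stated assumptions, not the MATLAB routine used in the study: the fold assignment (interleaved by sample index), the mean-centering, and all function names are illustrative choices.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_lv):
    """Fit PLS1 by NIPALS; returns the means and per-component (w, p, q)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X residual
    f = [v - y_mean for v in y]                                # centered y residual
    comps = []
    for _ in range(n_lv):
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:  # no covariance left between X and y
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]                        # scores
        tt = _dot(t, t)
        p = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]
        q = _dot(f, t) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by deflating it component by component."""
    x_mean, y_mean, comps = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = _dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


def rmsecv(X, y, n_lv, k=4):
    """RMSE over k interleaved cross-validation folds (fold layout is an assumption)."""
    n = len(X)
    sq_err = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_lv)
        sq_err += sum((pls1_predict(model, X[i]) - y[i]) ** 2 for i in test)
    return math.sqrt(sq_err / n)


def best_n_lv(X, y, max_lv, k=4):
    """Return the LV count with the lowest RMSECV, plus all scores."""
    scores = {a: rmsecv(X, y, a, k) for a in range(1, max_lv + 1)}
    return min(scores, key=scores.get), scores


# Synthetic, noise-free data: y is an exact linear function of three variables
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0],
     [0, 1, 1], [1, 0, 1], [1, 1, 1], [2, 0, 1]]
y = [2.0 * a - 1.0 * b + 0.5 * c + 3.0 for a, b, c in X]
best, scores = best_n_lv(X, y, max_lv=3)
```

On real spectra the RMSECV curve typically falls and then flattens or rises as LVs are added; on this noise-free toy set it simply reaches zero once the number of LVs equals the number of variables.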
Model evaluation
The calibration models were evaluated using the following quality parameters:
$$ R^2 = 1 - \frac{\sum_{i=1}^{n}\left(Y_{nirs,i}-Y_{ref,i}\right)^2}{\sum_{i=1}^{n}\left(Y_{ref,i}-\overline{Y_{ref}}\right)^2} $$
(1)
$$ RMSE = \sqrt{\frac{\sum_{i=1}^{n}\left(Y_{nirs,i}-Y_{ref,i}\right)^2}{n}} $$
(2)
$$ RPD = \frac{SD_{Y_{ref}}}{RMSEP} $$
(3)
where n is the total number of samples, Ynirs is the value predicted by the calibration model, Yref is the reference value determined by HPLC, and SD is the standard deviation of the reference values.
The coefficient of determination of prediction (Rp²), the root-mean-square error of prediction (RMSEP), the coefficient of determination of calibration (Rc²), the root-mean-square error of cross-validation (RMSECV), and the residual predictive deviation (RPD) were used as criteria to evaluate model performance. An acceptable model should have high Rc² and Rp² values and low RMSECV and RMSEP values. In addition, a model is considered robust if its RPD is higher than 2.5.
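Equations (1) to (3) translate directly into code. The sketch below is illustrative only: the function names and the example values are made up, and the use of the sample standard deviation (n − 1 denominator) in the RPD is an assumption, since the text does not specify which form was used.

```python
import math


def r_squared(y_ref, y_nirs):
    """Eq. (1): 1 - residual sum of squares / total sum of squares."""
    mean_ref = sum(y_ref) / len(y_ref)
    ss_res = sum((p - r) ** 2 for p, r in zip(y_nirs, y_ref))
    ss_tot = sum((r - mean_ref) ** 2 for r in y_ref)
    return 1.0 - ss_res / ss_tot


def rmse(y_ref, y_nirs):
    """Eq. (2): root-mean-square error between predicted and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(y_nirs, y_ref)) / len(y_ref))


def rpd(y_ref, y_nirs):
    """Eq. (3): SD of reference values over RMSEP (sample SD is an assumption)."""
    n = len(y_ref)
    mean_ref = sum(y_ref) / n
    sd = math.sqrt(sum((r - mean_ref) ** 2 for r in y_ref) / (n - 1))
    return sd / rmse(y_ref, y_nirs)


# Made-up reference and predicted values for a small prediction set
y_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
y_nirs = [1.5, 1.5, 3.5, 3.5, 5.0]
r2_p = r_squared(y_ref, y_nirs)
rmsep = rmse(y_ref, y_nirs)
rpd_value = rpd(y_ref, y_nirs)
```

For these made-up values the residual sum of squares is 1.0 and the total sum of squares is 10.0, so R² is 0.9, RMSEP is √0.2 ≈ 0.447, and RPD is √12.5 ≈ 3.54, which would pass the 2.5 threshold.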
Software
NIR spectroscopic data (268 samples × 1 501 variables) were exported in text format, organized in Microsoft Excel spreadsheets, and then transferred into MATLAB R2011a (MathWorks, Natick, USA) for chemometric analysis. All algorithms for spectral pretreatment, sampling design, and regression were implemented in MATLAB R2011a.