1. The Principles and Characteristics of NIRS
Near-infrared spectroscopy (NIRS) covers the wavelength range of 780–2500 nm, which can be divided into short-wave NIR (780–1100 nm) and long-wave NIR (1100–2500 nm)
[1]. It is sometimes combined with visible light in the 350–780 nm range to form a Vis-NIR spectrum for detection. By analyzing the overtone and combination absorptions that arise from the vibrations of hydrogen-containing groups such as C-H, N-H, and O-H, NIRS can provide information on the state, composition, and structure of molecules
[2]. A typical near-infrared spectrometer consists of a light source, a beam splitter system (wavelength selector), a sample detector, and an optical detector; some are also equipped with a data processing/analysis system for convenience. These components should be chosen according to the intended application. NIR spectra can be acquired in transmission, diffuse reflection, or combined transmission-reflection mode, and the choice of detection mode likewise depends on the application.
After data acquisition with the spectrometer, the general steps for spectral analysis include: (1) spectral data preprocessing
[3]; (2) feature wavelength selection
[4]; (3) model establishment and evaluation
[5]. The main analysis steps are shown in
Figure 1.
Figure 1. The general analysis steps for NIRS data.
1.1. The Spectral Preprocessing Methods
The sample spectra collected by the spectrometer contain not only the chemical information of the sample itself but also irrelevant information and noise, such as electrical noise, sample background, and stray light
[3]. Therefore, when chemometric methods are applied to spectral analysis, the original spectral data must first be preprocessed to eliminate this irrelevant information and noise.
Smoothing, derivatives, multiplicative scatter correction (MSC, also called multiple scattering correction), baseline correction, standard normal variate (SNV) transformation, orthogonal signal correction (OSC), and combinations of these methods are common spectral preprocessing methods
[6][7][8].
Smoothing is one of the most widely used methods for removing spectral noise. Moving average smoothing (MS) and Savitzky-Golay (SG) smoothing are commonly used smoothing methods. Derivative preprocessing is used to eliminate baseline offset and drift, enhance spectral band features, and overcome spectral band overlap
[9]. The direct difference method and SG derivative method are the commonly used derivative preprocessing methods
[10]. Baseline correction removes the baseline drift and tilt caused by the instrument background and uneven sample surfaces by artificially pulling the baseline of the absorbance spectrum back to zero
[11]. Multiplicative scatter correction (MSC) is mainly used to eliminate the effect of scattering on the spectrum and to enhance the spectral information related to the content of sample components; it can also remove spectral errors caused by factors such as optical path changes or sample dilution
[2][12]. Standard normal variate (SNV) transformation centers and scales each spectrum so that it has a mean of 0 and a standard deviation of 1
[13]. Orthogonal signal correction (OSC) is a spectral preprocessing algorithm that makes use of the physical and chemical reference values of the samples
[14][15]. Information in the spectral data that is unrelated to these values is removed by orthogonal projection before the corresponding modeling method is applied, which improves the robustness and predictive ability of the model.
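As a rough illustration of the two simplest methods named above, moving average smoothing and the direct difference derivative, the following Python sketch uses NumPy only. The wavelength grid, the synthetic absorption band, and the function names are assumptions made for illustration and are not taken from the cited studies.

```python
import numpy as np

def moving_average(spectrum, window=5):
    """Smooth a 1-D spectrum with a centered moving average (odd window)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    # Repeat the edge values so the output keeps the input length.
    padded = np.pad(np.asarray(spectrum, dtype=float), half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

def first_difference(spectrum, wavelengths):
    """Approximate the first derivative by direct differences."""
    return np.diff(spectrum) / np.diff(wavelengths)

# Hypothetical data: one Gaussian band on a 10 nm grid plus a constant offset.
wavelengths = np.linspace(1100.0, 2500.0, 141)
band = np.exp(-((wavelengths - 1900.0) / 60.0) ** 2)
offset_spectrum = band + 0.3

smoothed = moving_average(offset_spectrum, window=5)
derivative = first_difference(offset_spectrum, wavelengths)
```

In this toy case the derivative of the offset spectrum matches the derivative of the band alone, which is the sense in which derivative preprocessing removes a constant baseline offset. SG smoothing and SG derivatives replace the plain average with a local polynomial fit and are generally preferred when band shapes must be preserved.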
In NIRS preprocessing, MSC and SNV are two well-known methods for reducing spectral distortion caused by scattering, and they have been shown to be effective in correcting for inhomogeneous particle distribution and refractive index variation in food applications
[16][17]. These preprocessing methods aim to reduce unmodeled variability in the spectral data so as to enhance the features sought in the spectra, which are usually linearly related to the phenomenon of interest. However, if an inappropriate preprocessing technique is used, essential information may be removed
[8].
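The SNV and MSC corrections described in this section can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: spectra are stored one per row, the MSC reference is taken to be the mean spectrum (a common but not universal choice), and the synthetic data and function names are hypothetical.

```python
import numpy as np

def snv(spectra):
    """Row-wise SNV: each spectrum is centered and scaled to unit std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """MSC: regress each spectrum on a reference, then undo slope and offset."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        # Assumption: use the mean spectrum of the set as the reference.
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Fit row = slope * reference + intercept by least squares.
        slope, intercept = np.polyfit(reference, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected

# Hypothetical data: one band shape distorted by multiplicative and
# additive scatter effects that differ from sample to sample.
wavelengths = np.linspace(1100.0, 2500.0, 141)
base = np.exp(-((wavelengths - 1900.0) / 80.0) ** 2)
scales = np.array([0.8, 1.0, 1.3])
offsets = np.array([0.05, 0.20, -0.10])
raw = scales[:, None] * base + offsets[:, None]

snv_spectra = snv(raw)
msc_spectra = msc(raw)
```

Because the three synthetic spectra differ only by a scale and an offset, both corrections collapse them onto a single curve. Real spectra also differ in chemical content, so only part of the between-sample variation is removed.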
1.2. The Feature Wavelength Selection Methods
Modeling with all wavelength variables can be computationally intensive and time consuming. In addition, NIR absorption bands are often weak and heavily overlapped, so the full spectrum contains redundant information, and uninformative variables that enter the model degrade its stability and prediction precision. It is therefore common practice to extract the feature wavelength variables from the full spectrum before modeling, eliminating irrelevant information and retaining only the most relevant variables. At present, the commonly used methods for selecting the characteristic wavelengths
[4][18][19] include principal component analysis (PCA), competitive adaptive reweighted sampling (CARS), the genetic algorithm (GA), the successive projections algorithm (SPA), and uninformative variable elimination (UVE).
PCA is a popular linear dimensionality reduction approach that maps high-dimensional data into a low-dimensional space by linear projection. The projection is chosen so that the variance along the projected dimensions is as large as possible; in this way, fewer dimensions are needed while most of the information in the original data is retained, which reduces dimensionality and eliminates redundant information
[2]. CARS is a variable selection method proposed to simulate the “survival of the fittest” principle in Darwin’s evolution theory
[20]. GA optimizes a partial least squares regression (PLSR) model through genetic iteration, based on the root mean square error of cross-validation (RMSECV) obtained with the selected variables
[21][22]. SPA improves modeling speed and prediction accuracy by minimizing the collinearity between variables and selecting the wavelengths that carry the least redundant information
[4][23]. UVE is a wavelength selection algorithm based on the PLSR regression coefficients; it eliminates wavelength variables whose stability is no greater than that of noise, thereby improving the predictive power of the model
[24]. A single feature wavelength selection algorithm sometimes does not give a satisfactory model, so it is often used in combination with other feature wavelength selection methods
[24].
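The variance-maximizing projection that PCA performs can be sketched with a singular value decomposition. The code below is an illustrative sketch, not the procedure of any cited study: the synthetic two-component spectra, the noise level, and the function name `pca_scores` are assumptions.

```python
import numpy as np

def pca_scores(spectra, n_components=2):
    """Project mean-centered spectra onto their leading principal components."""
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                    # center each wavelength variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]               # orthonormal projection directions
    explained = (s ** 2) / np.sum(s ** 2)      # fraction of variance per component
    return scores, loadings, explained[:n_components]

# Hypothetical data: 30 spectra built from two overlapping bands whose
# intensities vary independently, plus a small amount of noise.
rng = np.random.default_rng(0)
wavelengths = np.linspace(1100.0, 2500.0, 200)
band_a = np.exp(-((wavelengths - 1450.0) / 60.0) ** 2)
band_b = np.exp(-((wavelengths - 1940.0) / 60.0) ** 2)
concentrations = rng.uniform(0.0, 1.0, size=(30, 2))
spectra = concentrations @ np.vstack([band_a, band_b])
spectra += rng.normal(scale=1e-3, size=spectra.shape)

scores, loadings, explained = pca_scores(spectra, n_components=2)
```

Here 200 wavelength variables are reduced to two score columns that carry nearly all of the variance, because the synthetic spectra have only two underlying sources of variation. Note that PCA yields linear combinations of all wavelengths, whereas CARS, GA, SPA, and UVE select a subset of the original wavelength variables.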
1.3. Model Establishment and Evaluation
For NIR spectroscopy, after preprocessing and feature wavelength selection, a linear or nonlinear calibration model of the spectra is established for qualitative or quantitative analysis.
With the rapid development of statistics, mathematical analysis methods
[5] are increasingly used for more rigorous classification and quantitative detection; these methods can be linear or nonlinear, and supervised or unsupervised. Common qualitative and quantitative methods include k-nearest neighbors (KNN)
[25], linear discriminant analysis (LDA)
[9][26], partial least squares discriminant analysis (PLS-DA)
[27][28], extreme learning machine (ELM)
[25], support vector machine (SVM)
[10][29][30], back propagation neural network (BPNN)
[31], partial least squares regression (PLSR)
[32][33], and radial basis function neural network (RBFNN)
[25].
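To make the qualitative case concrete, the following is a minimal NumPy sketch of a KNN classifier, the simplest of the methods listed. The two-dimensional feature vectors, the class labels, and the function name `knn_predict` are hypothetical; in practice the inputs would be preprocessed spectra or selected feature wavelengths.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=3):
    """Assign each test sample the majority label of its k nearest neighbors."""
    train_X = np.asarray(train_X, dtype=float)
    test_X = np.asarray(test_X, dtype=float)
    train_y = np.asarray(train_y)
    predictions = []
    for x in test_X:
        # Euclidean distance from this sample to every training sample.
        distances = np.linalg.norm(train_X - x, axis=1)
        nearest = np.argsort(distances)[:k]
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        # On a tie, argmax keeps the first label in sorted order.
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Hypothetical features (for example, two PCA scores per sample).
train_X = [[0.10, 0.20], [0.12, 0.18], [0.11, 0.22],
           [0.80, 0.90], [0.82, 0.88], [0.79, 0.91]]
train_y = ["non-transgenic"] * 3 + ["transgenic"] * 3
test_X = [[0.09, 0.21], [0.81, 0.90]]

predicted = knn_predict(train_X, train_y, test_X, k=3)
```

With these well-separated toy clusters, the first test sample is assigned to the non-transgenic class and the second to the transgenic class.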
After the model is established, its stability and accuracy are evaluated and the best-performing calibration model is selected. Commonly used indicators include classification accuracy, the correlation coefficient, and the standard deviation of the calibration and prediction set samples.
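The indicators mentioned above might be computed as in the following sketch. Root mean square error (RMSE) is used here as a common stand-in for the error statistic of the calibration or prediction set; that substitution, the function names, and the example values are assumptions rather than details from the source.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def correlation_coefficient(y_true, y_pred):
    """Pearson correlation between reference and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def accuracy(labels_true, labels_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return float(np.mean(np.asarray(labels_true) == np.asarray(labels_pred)))

# Hypothetical prediction-set results for a quantitative model.
y_reference = [10.0, 12.0, 14.0, 16.0]
y_predicted = [10.5, 11.5, 14.5, 15.5]
rmsep = rmse(y_reference, y_predicted)               # 0.5
r_pred = correlation_coefficient(y_reference, y_predicted)

# Hypothetical prediction-set results for a qualitative model.
acc = accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])   # 0.75
```

Applying the same functions separately to the calibration set and the prediction set allows the two to be compared; a much lower error on the calibration set than on the prediction set is a common sign of overfitting.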
NIRS and chemometric methods are twin technologies that have developed in tandem. In recent years, deep learning algorithms, represented by convolutional neural networks (CNNs), have been used for quantitative and qualitative modeling of NIR spectra
[34][35]. Compared with traditional machine learning methods, a CNN can extract the features embedded in the spectral data step by step through multiple convolution and pooling layers, which to a certain extent reduces the need for spectral preprocessing and variable selection before modeling.
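The following sketch shows what one convolution and pooling stage does to a one-dimensional spectrum. It is a forward pass only, with a fixed, hand-picked kernel; in a real CNN the kernel weights are learned during training and many such stages are stacked. The toy signal, the kernel values, and the function names are assumptions for illustration.

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation, the operation in a CNN conv layer."""
    return np.correlate(signal, kernel, mode="valid")

def relu(x):
    """Rectified linear activation: negative responses are set to zero."""
    return np.maximum(x, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    usable = (len(x) // size) * size
    return x[:usable].reshape(-1, size).max(axis=1)

# Toy "spectrum" with two peaks of different width.
spectrum = np.array([0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0])
# Hypothetical fixed kernel that responds to local peaks.
kernel = np.array([-1.0, 2.0, -1.0])

features = max_pool1d(relu(conv1d(spectrum, kernel)), size=2)
# features is [0., 4., 0., 4.]: one response per peak, at half the resolution.
```

The kernel acts as a small feature detector that is slid along the wavelength axis, and pooling shortens the representation while keeping the strongest responses.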
Among popular deep learning-based models, the DeepSpectra model has been reported to outperform the other models with which it was compared
[36]. The combination of deep learning and spectral detection methods is a promising approach for the quality assessment of food and agricultural products, as well as for genetic modification detection
[36][37].
2. The Applications of NIRS for the Detection of Transgenic Agricultural Products and Foods
In the last few decades, NIRS has demonstrated its power in the detection of agricultural products and foods, with applications in meat detection
[38], safety control of agricultural materials and foods
[39][40][41], and fruit and vegetable detection
[42][43]. Taking maize testing as an example, NIRS has been applied to the assessment of variety purity
[25][44][45], vigor
[46], internal components such as moisture and protein
[47][48][49], fungal toxins
[50][51], and frost damage
[52]. Handheld and portable NIR spectrometers are now also available for the online monitoring of agricultural products and foods in industrial applications
[53][54].
As research on spectroscopic techniques has deepened, researchers have begun to apply NIRS to the identification of transgenic foods and agricultural products, as shown in Table 1.
Table 1. Studies on the detection of transgenic agricultural products and foods using near-infrared spectroscopy.
This entry is adapted from the peer-reviewed paper 10.3390/pr11030651