Figure. Illustration representing the analogy between human olfaction and artificial olfaction.
Pattern recognition constitutes an important part of the development of an E-nose instrument capable of detecting, identifying, and quantifying different complex volatile compounds. Pattern recognition methods are typically classified as statistical or intelligent [10]. The most widely applied pattern recognition techniques in E-nose applications include linear discriminant analysis (LDA) and discriminant function analysis (DFA), principal component analysis (PCA), multinomial logistic regression (MultiLR), partial least squares discriminant analysis (PLS-DA), partial least squares regression (PLSR), hierarchical cluster analysis (HCA), K-nearest neighbor (KNN), artificial neural networks (ANN), convolutional neural networks (CNN), decision trees (DT), random forest (RF), and support vector machines (SVM) [10][11]. PCA and HCA are unsupervised learning algorithms, while LDA, DFA, MultiLR, PLS-DA, PLSR, KNN, ANN, CNN, DT, RF, and SVM are supervised learning algorithms [12]. These algorithms can further be categorized as statistical or intelligent pattern recognition methods according to whether they take a linear or a nonlinear approach [9][10][13].
3.1. Statistical Pattern Recognition Methods
3.1.1. Linear Discriminant Analysis (LDA)
LDA, a dimensionality reduction technique, is a widely applied recognition method for E-nose devices. It identifies a linear combination of features that characterizes or distinguishes two or more classes of odors [14]. Suppose a set of samples contains two odor classes: LDA constructs a hyperplane and projects the data onto it in such a way that the separation between the classes is maximized. The hyperplane is positioned by minimizing the distance within each class while maximizing the distance between the classes [15]. The purpose of LDA is to reduce the original dataset to a lower-dimensional space with high sample discrimination, thereby reducing the risk of overfitting as well as the computational cost [16]. LDA requires the input independent variables to be continuous quantities for the given observations [14].
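As a minimal sketch of how LDA could be applied to E-nose readings, the following Python example uses scikit-learn on simulated two-class sensor data; the sensor count, sample sizes, and class separation are illustrative assumptions rather than values from the cited studies.

```python
# Minimal LDA sketch on hypothetical two-class E-nose data (all shapes
# and values are illustrative assumptions, not from the cited studies).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# 60 samples x 8 sensors: two odor classes with shifted mean responses
X = np.vstack([rng.normal(0.0, 1.0, (30, 8)),
               rng.normal(1.5, 1.0, (30, 8))])
y = np.array([0] * 30 + [1] * 30)

lda = LinearDiscriminantAnalysis(n_components=1)  # 2 classes -> 1 discriminant axis
X_proj = lda.fit_transform(X, y)   # projection that maximizes class separation
print(lda.score(X, y))             # training accuracy of the induced classifier
```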
3.1.2. Principal Component Analysis (PCA)
PCA is a statistical method belonging to the factorial analysis group. It aims to represent the variance of a dataset with a small number of factors [17]. Using an orthogonal transformation, PCA converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables known as principal components (PCs) [14]. The iteratively calculated PCs retain as much variance as possible from the original data, such that PC1 accounts for more of the data variance than PC2, PC2 accounts for more than PC3, and so on. As a result, a few PCs can account for most of the variability in a large original dataset. According to the Kaiser criterion, only PCs with eigenvalues greater than 1 are considered significant, while Bartlett's test of sphericity indicates whether the raw data are suitable for PCA [17].
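A brief sketch of PCA with the Kaiser criterion, using scikit-learn on simulated sensor data; the data matrix and its dimensions are assumptions made purely for illustration.

```python
# PCA sketch on hypothetical E-nose data; standardization makes the
# eigenvalues comparable to the Kaiser threshold of 1.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))            # 50 samples x 10 sensor features

X_std = StandardScaler().fit_transform(X)
pca = PCA().fit(X_std)

eigenvalues = pca.explained_variance_          # variance captured by each PC
kaiser_pcs = int(np.sum(eigenvalues > 1))      # Kaiser criterion: keep eigenvalues > 1
print(kaiser_pcs, pca.explained_variance_ratio_[:3])
```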
3.1.3. Multinomial Logistic Regression (MultiLR)
MultiLR is an extension of the (binary) logistic regression model. It is used when the dependent variable of a study has more than two categorical outcome levels. Like binary logistic regression, MultiLR employs maximum likelihood estimation to determine the probability of categorical membership [18]. MultiLR is a classification approach in which the outcome is a discrete random variable taking values 1, 2, 3, …, K, where K is the number of categories, and membership in each category is encoded as an indicator taking the value 0 or 1. Several studies have employed MultiLR to create classification models with good classification results [19]. MultiLR has gained much popularity because it does not require assumptions of normality, linearity, or homogeneity of variance of the independent variables, which makes it a more commonly used method of analysis than discriminant function analysis [20].
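As an illustration, scikit-learn's LogisticRegression fits multiclass targets as a single multinomial model with its default solver, estimating coefficients by (penalized) maximum likelihood; the three-category sensor data below are simulated assumptions, not data from the cited studies.

```python
# MultiLR sketch: three hypothetical aroma categories (K = 3), 6 sensors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(mu, 1.0, (20, 6)) for mu in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 20)

# Default lbfgs solver fits a multinomial model via maximum likelihood
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(X[:2]))   # probability of membership in each category
```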
3.1.4. Partial Least Squares Discriminant Analysis (PLS-DA)
The PLS algorithm was initially used for regression analysis and was later developed into the PLS-DA classification approach. PLS-DA has been used in practice for both predictive and descriptive modelling, as well as for the selection of discriminative variables [21]. PLS-DA is a dimensionality reduction technique often described as a supervised version of PCA, and it can be used for feature selection as well as classification. It seeks linear transformations that map the data to a lower-dimensional space with the least amount of error while optimizing the separation between samples of different groups [22]. Furthermore, PLS-DA does not require the data to fit a specific distribution, making it more flexible than other discriminant methods such as Fisher's LDA. Several researchers have reported a wide range of applications of PLS-DA modelling in food analysis and metabolomics [21].
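scikit-learn does not ship a dedicated PLS-DA estimator; a common workaround, sketched below under that assumption, is to regress a one-hot (dummy) class matrix with PLSRegression and assign each sample to the class with the largest predicted score. The data are simulated for illustration.

```python
# PLS-DA sketch: PLS regression against a dummy-coded class matrix.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(mu, 1.0, (20, 8)) for mu in (0.0, 1.5)])
labels = np.repeat([0, 1], 20)
Y = np.eye(2)[labels]                      # one-hot (dummy) class matrix

plsda = PLSRegression(n_components=2).fit(X, Y)
pred = plsda.predict(X).argmax(axis=1)     # class with the largest predicted score
print((pred == labels).mean())             # fraction of correctly assigned samples
```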
3.1.5. Partial Least Squares Regression (PLSR)
The PLSR technique combines and generalizes the features of PCA and multiple linear regression (MLR). Its aim is to predict a set of dependent variables from a set of independent variables. This prediction is accomplished by extracting from the predictors a set of orthogonal factors (known as latent variables) with the highest predictive ability. The technique is especially useful when a set of dependent variables must be predicted from a large set of independent variables [23]. The PLSR model finds the relationship between two data matrices, X and Y, and goes beyond traditional regression by also modelling the structure of X and Y. PLSR can analyze data in which both X and Y contain numerous, correlated, noisy, and even incomplete variables. The precision of the model parameters improves with the number of observations and relevant variables. As a result, PLSR enables a more realistic investigation of complex issues and data [24].
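A minimal PLSR sketch with scikit-learn, assuming a simulated matrix of correlated sensor features X and a hypothetical continuous response y (e.g., a taste score); none of these values come from the cited studies.

```python
# PLSR sketch: predict a continuous response from correlated predictors
# via a small number of orthogonal latent variables.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 12))                       # 12 sensor features
y = X[:, :3].sum(axis=1) + rng.normal(0, 0.1, 40)   # hypothetical taste score

pls = PLSRegression(n_components=3).fit(X, y)
print(pls.score(X, y))        # R^2 of the fitted latent-variable model
print(pls.x_scores_.shape)    # (40, 3): samples projected onto 3 latent variables
```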
3.1.6. Hierarchical Cluster Analysis (HCA)
HCA is a clustering algorithm that examines the organization of test samples within and among groups in the form of a hierarchy. HCA results are typically presented as a dendrogram, a tree-like plot that depicts the organization of the samples and their relationships. There are two main approaches to the grouping problem in HCA: agglomerative and divisive [17]. HCA essentially allows variables to be classified based on their similarities and differences while taking previously assigned characteristics into account [25]. Clustering is performed using an appropriate distance measure (Manhattan, Euclidean, or Mahalanobis distance) and linkage criterion (single, average, complete, or Ward's linkage) [17]. HCA has been widely applied to assess the multivariate relationship between bioactive substances and the bioactivity of beverages and foods [17].
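The sketch below shows agglomerative HCA with SciPy, assuming simulated three-group sensor data and the Euclidean distance with Ward's linkage mentioned above; all sample sizes are illustrative.

```python
# Agglomerative HCA sketch: Euclidean distance, Ward's linkage.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(mu, 0.5, (10, 6)) for mu in (0.0, 2.0, 4.0)])

Z = linkage(X, method="ward", metric="euclidean")    # build the hierarchy
clusters = fcluster(Z, t=3, criterion="maxclust")    # cut the tree into 3 groups
print(clusters)

# Compute the dendrogram structure (pass no_plot=False with matplotlib
# available to draw the tree-like plot described above)
tree = dendrogram(Z, no_plot=True)
```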
3.2. Intelligent Pattern Recognition Methods
3.2.1. K-Nearest Neighbor (KNN)
KNN is one of the algorithms most widely used in the food industry to solve classification problems. The "K" in KNN is the number of nearest neighbors included in the majority voting process [26]. KNN classifies samples by computing the distances between feature values: if the majority of the K most similar samples in the feature space belong to a particular category, then the sample is assigned to that category as well, where K is an integer no greater than 20. The selected neighbors in the KNN algorithm are all objects with known, correctly assigned class labels, and the algorithm decides the category of a new sample solely from the categories of its nearest neighbors [27]. The Manhattan or Euclidean distance is usually used as the distance metric [27][28].
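A short KNN sketch with scikit-learn on simulated two-class sensor data (an illustrative assumption); the Minkowski exponent p selects the metric, with p=2 giving the Euclidean and p=1 the Manhattan distance.

```python
# KNN sketch: majority vote among the K nearest labeled samples.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(mu, 1.0, (25, 6)) for mu in (0.0, 2.0)])
y = np.repeat([0, 1], 25)

# K = 5 neighbors, Euclidean distance (p=2); set p=1 for Manhattan
knn = KNeighborsClassifier(n_neighbors=5, p=2).fit(X, y)
print(knn.predict(X[:3]))     # class decided by the 5 nearest neighbors
```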
3.2.2. Artificial Neural Network (ANN)
An ANN is a supervised model inspired by networks of biological neurons and is commonly used for classification and regression problems. It comprises multiple layers of nodes: an input layer, one or more hidden layers, and an output layer [29]. The number of hidden layers depends largely on the task to be achieved. The activation of the hidden layers is determined by the input layer and the weights between the input and hidden layers; similarly, the activation of the output layer is determined by the hidden layers and the weights between them [30]. The functions of an ANN are determined by the neuron activation function, the structure of the neuron pattern, and the learning process [9]. There are three types of learning methods in ANNs: supervised, unsupervised, and reinforcement learning. Recently, ANN modelling has shown the potential to model nonlinear, complex food engineering processes that are difficult to solve using traditional approaches [31]. Several types of ANNs have been used to classify E-nose data and to model food processing, including learning vector quantization (LVQ), Kohonen networks, multi-layer perceptrons (MLP), feed-forward backpropagation neural networks (FFBPNN), convolutional neural networks (CNN), long short-term memory (LSTM) networks, recurrent neural networks (RNN), generative adversarial networks (GAN), restricted Boltzmann machines (RBM), and deep Boltzmann machines (DBM) [9][31].
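As a minimal sketch of the layered architecture described above, the example below trains a small multi-layer perceptron with scikit-learn (which learns the weights by backpropagation); the layer sizes and the three-class sensor data are illustrative assumptions.

```python
# MLP sketch: input layer (10 features), two hidden layers, output layer.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(mu, 1.0, (30, 10)) for mu in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 30)

mlp = MLPClassifier(hidden_layer_sizes=(16, 8),   # two hidden layers
                    activation="relu",            # neuron activation function
                    max_iter=2000, random_state=0).fit(X, y)
print(mlp.score(X, y))    # training accuracy
```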
3.2.3. Convolutional Neural Network (CNN)
A CNN is a type of deep neural network that, like the ANN, is a purely supervised learning algorithm; it is primarily applied to image recognition [9][31]. In brief, the image data (such as RGB values and intensity) pass through a series of convolutional layers that include filters (kernels, or neurons), pooling layers, and fully connected layers before the output is generated. The filters apply convolution operations to the input image data and extract high-level features such as edges. The pooling layers then reduce the size of the image using one of two common pooling methods, average pooling or maximum pooling. After these steps, the data are finally fed to the fully connected layers, i.e., an ANN, to perform classification [9]. In contrast to traditional feature-based pattern recognition methods, a CNN performs feature extraction and selection automatically, so preprocessing of the input data is not required. Recently, CNNs combined with E-noses have been identified as a useful tool in food and beverage analysis, especially for the classification of liquors [30].
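A minimal sketch of the convolution, pooling, and fully connected pipeline described above, assuming PyTorch and a hypothetical batch of multichannel E-nose time series (rather than images); channel counts, layer sizes, and the three output classes are assumptions made for illustration.

```python
# 1D-CNN sketch: convolution -> pooling -> fully connected classification.
import torch
import torch.nn as nn

# Hypothetical batch: 8 recordings, 6 sensor channels, 128 time steps
x = torch.randn(8, 6, 128)

model = nn.Sequential(
    nn.Conv1d(6, 16, kernel_size=5, padding=2),  # filters extract local features
    nn.ReLU(),
    nn.MaxPool1d(2),                             # maximum pooling halves the length
    nn.Conv1d(16, 32, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),                     # average pooling: one value per filter
    nn.Flatten(),
    nn.Linear(32, 3),                            # fully connected layer -> 3 classes
)
logits = model(x)
print(logits.shape)   # (8, 3): one score per class for each recording
```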
3.2.4. Decision Trees (DT) and Random Forest (RF)
A DT builds regression or classification models in the form of a tree-like architecture. It progressively organizes the dataset into smaller homogeneous subsets (sub-populations) while generating an associated tree graph. The internal nodes represent the dataset features, the branches represent decision rules, and the leaf nodes represent the classification outcome [27][29]. The most common learning algorithms of this kind are classification and regression trees, the iterative dichotomizer, and the chi-square automatic interaction detector [29].
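A short sketch of a DT fitted with scikit-learn on simulated two-class sensor data (an illustrative assumption); printing the tree makes the internal nodes, decision-rule branches, and class-outcome leaves described above visible.

```python
# Decision tree sketch: nodes test features, leaves carry class outcomes.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(mu, 1.0, (25, 4)) for mu in (0.0, 2.0)])
y = np.repeat([0, 1], 25)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=[f"sensor_{i}" for i in range(4)]))
```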
An RF is a combination of tree (decision) predictors that is widely used as a predictor and classifier in E-nose analysis. Each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [16][27]. The algorithm essentially applies rules to split the data into binary partitions. The main splitting rules used in classification problems are the twoing rule, the Gini index, and deviance, of which the Gini index is the most commonly used [32]. Aside from its high predictive performance, RF analysis can indicate feature importance, revealing the contribution of each feature to the predictions and thus allowing a quantifiable comparison of different structural features [33].
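The sketch below fits an RF with scikit-learn using the Gini splitting criterion and reads out the feature importances mentioned above; the two-class sensor data and forest size are illustrative assumptions.

```python
# Random forest sketch: an ensemble of randomized Gini-split trees.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(mu, 1.0, (30, 8)) for mu in (0.0, 1.5)])
y = np.repeat([0, 1], 30)

rf = RandomForestClassifier(n_estimators=200, criterion="gini",
                            random_state=0).fit(X, y)
print(rf.feature_importances_)   # contribution of each sensor feature
```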
3.2.5. Support Vector Machine (SVM)
An SVM is a supervised learning algorithm that is widely employed for statistical regression and classification analysis [34]. It is based on finding a particular type of linear model known as the maximum-margin hyperplane. To visualize the maximum-margin hyperplane, consider a two-class dataset whose classes are linearly separable, meaning that a hyperplane in the input space correctly classifies all training instances [35]. After the data are transformed by a nonlinear function, i.e., a kernel function, the SVM can map the n-dimensional feature space into a K-dimensional space (K > n) in which a separating hyperplane is sought [16]. SVM algorithms are frequently used in E-nose-related applications [30]. Commonly used SVM variants include the successive projection algorithm-support vector machine, support vector regression, and the least squares support vector machine [29].
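A minimal SVM sketch with scikit-learn, assuming simulated two-class sensor data; the RBF kernel plays the role of the nonlinear kernel function that implicitly maps the input space into a higher-dimensional space where the maximum-margin hyperplane is sought.

```python
# SVM sketch: RBF-kernel classifier seeking a maximum-margin hyperplane.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(mu, 1.0, (25, 6)) for mu in (0.0, 2.0)])
y = np.repeat([0, 1], 25)

svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(svm.score(X, y))                 # training accuracy
print(svm.support_vectors_.shape)      # samples defining the margin
```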
4. Applications of E-Nose in Tea Quality Evaluation
In the tea industry, tea quality management is considered a critical responsibility. Tea quality and nutrition must therefore be analyzed throughout processing to maintain the quality of marketed tea products. However, because of the high price of tea products, adulteration is common, flooding the market with tea products bearing false brand names and allowing unscrupulous vendors to profit from the fakes. As a result, distinguishing between genuine and counterfeit products is difficult [36]. According to numerous reports, the E-nose is a promising technology for monitoring the authenticity of food products [37]. Table 3 summarizes E-noses used in combination with various pattern recognition algorithms to assess the quality of different tea types over the last 10 years. E-nose devices have been employed to categorize and differentiate tea types according to their origins, quality grades, degree of adulteration based on mix ratios, and drying processes, and to monitor odor variation during fermentation (Table 3).
Table 3. Description of various E-nose configurations and pattern recognition methods used for tea quality evaluation.
Tea polyphenols, amino acids, and caffeine are responsible for the astringency and bitterness of tea. Although many methods have been developed to evaluate tea's taste, this task has always been challenging. In this regard, a rapid and feasible method was established using an E-nose and mathematical modelling to identify the bitterness and astringency of green tea samples. The findings revealed that the back-propagation neural network (BPNN) model was more reliable than the PLSR and MLR models for examining the bitterness and astringency of tea infusions [41].
Processing technology, including the withering, rolling, fermentation, and drying steps, is crucial in creating the distinctive flavor of black tea. Yang et al. [48] employed an E-nose to examine the volatile profile of Congou black tea and the changes in its aroma features across different variable-temperature final firing processes. The applied PLS-DA clearly differentiated the tea samples by drying condition.