Applications of E-Nose in Tea Quality Evaluation

Please note this is a comparison between Version 1 by Ho-Hsien Chen and Version 2 by Lindsay Dong.

Advances in sensor technology have made it possible to replace the human olfaction system with an artificial olfaction system, i.e., the electronic nose (E-nose), for differentiating the distinct aromas of teas in quality control. An E-nose system’s sensor array consists of several non-specific sensors, and an odor stimulus generates a fingerprint from this array. Fingerprints or patterns from known odors are used to train a pattern recognition model so that unknown odors can subsequently be classified and identified ^[3]. Recently, the E-nose has come to be regarded as a powerful tool for tea quality monitoring. Its applications in tea research include tea classification, tea fermentation methods, tea components, tea grade quality, and tea storage ^[3][4].

aroma
electronic nose
gas sensors
intelligent pattern recognition
tea quality

1. Introduction

Tea quality is usually assessed by human sensory analysis based on sensory descriptors, namely morphological characteristics, taste, aroma, texture, and color ^[1][2]. The analysis relies on professional panel evaluators, each with their own perception of the various sensory attributes of tea, which makes the results difficult for consumers to understand. In addition, sensory analysis requires a large group of educated and trained people (not always feasible) ^[3], long and constant training, harmonization of vocabulary, and special technical conditions (such as room and lighting). In recent years, numerous analytical techniques have been proposed for assessing tea quality, for example, ultra-performance liquid chromatography (UPLC), high-performance liquid chromatography (HPLC), gas chromatography-mass spectrometry (GC-MS), capillary electrophoresis (CE), and plasma atomic emission spectrometry ^[1][2]. These methods, however, impose high operating costs, require highly skilled analysts and time-consuming procedures, and cannot be applied to real-time monitoring of tea quality ^[2][4][5]. Higher expectations for product quality control have therefore increased the demand for rapid, reliable, and cost-effective analysis.

With the rapid advancement of multi-sensor and electronic technologies, fast and accurate food analysis has become possible. Electronic eye (E-eye), electronic nose (E-nose), and electronic tongue (E-tongue) systems are composed of color, gas, and liquid sensors that mimic the human visual, olfactory, and gustatory systems, respectively. Sample detection is fast and requires no sample pretreatment ^[6]. E-nose systems have been developed to identify and distinguish different odors ^[7]. An E-nose system’s sensor array consists of several non-specific sensors, and an odor stimulus generates a fingerprint from this array. Fingerprints or patterns from known odors are used to train a pattern recognition model so that unknown odors can subsequently be classified and identified ^[3]. Recently, the E-nose has come to be regarded as a powerful tool for tea quality monitoring. Its applications in tea research include tea classification, tea fermentation methods, tea components, tea grade quality, and tea storage ^[3][4].

2. E-Nose Instrumentation

The E-nose device is designed to identify and distinguish between a variety of complex odors. This non-destructive device is composed of a sensor array that reacts to the vapors and gases the sample generates. Typically, the sensor array comprises non-specific sensors that have been sensitized to various chemical substances; thus, each element measures a distinct property of the chemical perceived ^[8]. When odors or volatile molecules react with the sensor array, the electrical properties of the sensors, mainly conductivity, change. Pattern recognition algorithms are then used to characterize the detected changes in order to classify or discriminate samples ^[9]. As shown in Figure 1, the E-nose device consists of three main parts: (1) a sample handling system, (2) a detection system, and (3) a data processing system with pattern recognition algorithms. The basic analogies between human olfaction (biological olfaction) and an E-nose (artificial olfaction) are shown in Figure 2.

Figure 1. A schematic representation of the E-nose system.

Figure 2. Illustration representing the analogy between human olfaction and artificial olfaction.

3. Pattern Recognition Algorithms for E-Nose

Pattern recognition constitutes an important part in the development of an E-nose instrument capable of detection, identification, and quantification of different complex volatile compounds. These are typically classified as statistical and intelligent pattern recognition methods ^[10]. The most widely applied pattern recognition techniques in E-nose applications include linear discriminant analysis (LDA) and discriminant function analysis (DFA), principal component analysis (PCA), multinomial logistic regression (MultiLR), partial least squares discriminant analysis (PLS-DA), partial least squares regression (PLSR), hierarchical cluster analysis (HCA), K-nearest neighbor (KNN), artificial neural network (ANN), convolutional neural network (CNN), decision trees (DT), random forest (RF), and support vector machine (SVM) ^[10][11]. PCA and HCA are unsupervised learning algorithms, while LDA and DFA, MultiLR, PLS-DA, PLSR, KNN, ANN, CNN, DT, RF, and SVM are supervised learning algorithms ^[12]. These algorithms can be categorized into statistical or intelligent pattern recognition methods based on linear or nonlinear approaches ^[9][10][13].

3.1. Statistical Pattern Recognition Methods

3.1.1. Linear Discriminant Analysis (LDA)

LDA, a dimensionality reduction technique, is a widely applied recognition method for E-nose devices. This method identifies a linear combination of features that characterize or distinguish between two or more classes of odors ^[14]. Suppose there are two classes of odors in samples. LDA creates one hyperplane and projects the data in such a way that the separation of classes is maximized. The hyperplane is drawn based on minimizing the distance within the same class and maximizing the distance between the classes ^[15]. The purpose of LDA is to reduce the original dataset into a lower-dimensional space with high sample discrimination, thus reducing the risk of overfitting and computing costs ^[16]. The requirements of LDA include continuous quantities of the input independent variables for given observations ^[14].
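The projection step described above can be sketched for the two-class case. This is a minimal illustration in plain Python, assuming two-dimensional features; the sample values and the names `odor_a`, `odor_b`, and `fisher_direction` are hypothetical and do not come from any cited study. It computes the Fisher direction w = Sw^-1 (mu_a - mu_b) and projects both classes onto it.

```python
# Two-class Fisher LDA on toy 2-D "sensor" features (illustrative values only).

def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter(rows, mu):
    # 2x2 scatter matrix: sum of outer products of the centered samples.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Within-class scatter Sw = Sa + Sb.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    # w = Sw^-1 (mu_a - mu_b) maximizes between-class over within-class spread.
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(rows, w):
    return [r[0] * w[0] + r[1] * w[1] for r in rows]

odor_a = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]  # hypothetical class A
odor_b = [[4.0, 4.5], [4.5, 5.0], [5.0, 4.8], [4.2, 5.2]]  # hypothetical class B
w = fisher_direction(odor_a, odor_b)
proj_a, proj_b = project(odor_a, w), project(odor_b, w)
print(min(proj_a) > max(proj_b))  # the two classes do not overlap on the 1-D axis
```

After projection, each sample is described by a single number, which is the dimensionality reduction the paragraph refers to.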

3.1.2. Principal Component Analysis (PCA)

PCA is a statistical method belonging to the factorial analysis group. PCA aims to use a small number of factors to represent the variance in a dataset ^[17]. Using an orthogonal transformation, it converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables, known as principal components (PCs) ^[14]. The PCs are calculated iteratively so that each retains as much of the remaining variance as possible: PC1 accounts for more data variance than PC2, PC2 accounts for more than PC3, and so on. Because of this, a few PCs can account for most of the variability in a large number of original variables. According to the Kaiser criterion, only PCs with eigenvalues greater than 1 are considered significant. In addition, Bartlett’s test of sphericity indicates the suitability of the raw data for performing PCA ^[17].
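The eigenvalue ordering and the Kaiser criterion can be illustrated with the smallest possible case. The sketch below assumes only two sensors, so that the eigenvalues of the 2x2 correlation matrix have a closed form; the readings in `sensor_1` and `sensor_2` are made-up values, and a real analysis with more variables would rely on a numerical eigen-solver.

```python
import math

def standardize(col):
    # Center to zero mean and scale to unit sample standard deviation.
    n = len(col)
    mu = sum(col) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in col) / (n - 1))
    return [(x - mu) / sd for x in col]

def correlation(a, b):
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)

def eigenvalues_sym_2x2(a, b, d):
    # Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first.
    mid = (a + d) / 2.0
    rad = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return [mid + rad, mid - rad]

sensor_1 = [0.82, 0.91, 1.10, 1.25, 1.31, 1.48]  # hypothetical responses
sensor_2 = [0.40, 0.47, 0.52, 0.66, 0.63, 0.77]  # hypothetical responses

r = correlation(sensor_1, sensor_2)
eigvals = eigenvalues_sym_2x2(1.0, r, 1.0)        # correlation matrix [[1, r], [r, 1]]
explained = [v / sum(eigvals) for v in eigvals]   # share of variance per PC
kept = [i + 1 for i, v in enumerate(eigvals) if v > 1.0]  # Kaiser criterion
print(eigvals, explained, kept)
```

Because the two hypothetical sensors are strongly correlated, PC1 carries most of the variance and is the only component retained under the Kaiser criterion.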

3.1.3. Multinomial Logistic Regression (MultiLR)

MultiLR is an extension of the binary logistic regression model. It is used when the dependent (outcome) variable has more than two categorical levels. Like binary logistic regression, MultiLR employs maximum likelihood estimation to determine the probability of categorical membership ^[18]. MultiLR is a classification approach in which the outcome is a discrete random variable taking values in 1, 2, 3, …, K, where K is the number of categories. The independent variable is either 0 or 1. Several studies have used MultiLR to build classification models with good classification results ^[19]. MultiLR has gained much popularity because it does not require assumptions of normality, linearity, and homogeneity of variance of the independent variables, and for this reason it is used more commonly than discriminant function analysis ^[20].
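How a fitted MultiLR model turns features into category membership probabilities can be sketched with the softmax function. The coefficients in `weights` and `biases` below are hypothetical placeholders for values that maximum likelihood estimation would produce; the fitting step itself is not shown.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability; the result is unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(x, weights, biases):
    # One linear score per category, then softmax to obtain probabilities.
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(scores)

# Hypothetical coefficients for K = 3 categories and 2 features.
weights = [[1.2, -0.4], [-0.3, 0.9], [0.1, 0.1]]
biases = [0.0, 0.1, -0.2]

probs = predict_proba([2.0, 0.5], weights, biases)
predicted_class = probs.index(max(probs)) + 1  # categories numbered 1..K
print(probs, predicted_class)
```

The probabilities sum to 1 across the K categories, and the sample is assigned to the category with the highest probability.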

3.1.4. Partial Least Squares Discriminant Analysis (PLS-DA)

The PLS algorithm was originally used for regression analysis and was later developed into the PLS-DA classification approach. PLS-DA has been used in practice for both predictive and descriptive modelling, as well as for the selection of discriminative variables ^[21]. PLS-DA is a dimensionality reduction technique that is often described as a supervised version of PCA, and it can also be used for feature selection and classification. It seeks linear transformations that map the data to a lower-dimensional space with the least amount of error while optimizing the separation between samples of different groups ^[22]. Furthermore, PLS-DA does not require the data to fit a specific distribution, making it more flexible than other discriminant methods such as Fisher’s LDA. Several researchers have reported a wide range of applications of PLS-DA modelling in food analysis and metabolomics ^[21].

3.1.5. Partial Least Squares Regression (PLSR)

The PLSR technique combines and generalizes the features of PCA and MLR (multiple linear regression). Its aim is to predict a set of dependent variables from a set of independent variables. This prediction is accomplished by extracting a set of orthogonal factors (known as latent variables) from the predictors having the highest predictive ability. This technique is especially useful when a dependent variable set needs to be predicted from a large independent variable set ^[23]. The PLSR model finds the relationship between two data matrices, X and Y. In addition, it goes beyond the traditional regression by modelling the structure of X and Y. PLSR can analyze data in both X and Y with numerous, correlated, noisy, and even incomplete variables. The precision of the model parameters improves with the increase in the number of observations and relevant variables. As a result, PLSR enables more realistic investigation of complex issues and data analysis ^[24].

3.1.6. Hierarchical Cluster Analysis (HCA)

HCA is a clustering algorithm that examines the organization of test samples within and among groups in the form of a hierarchy. HCA results are typically presented as a dendrogram, a tree-like plot that depicts the organization of the samples and their relationships. There are two main approaches to the grouping problem in HCA: agglomerative and divisive ^[17]. HCA allows variables to be classified based on their similarities and differences, while taking previously assigned characteristics into account ^[25]. Clustering is performed using an appropriate distance measure (Manhattan, Euclidean, or Mahalanobis distance) and linkage criterion (single, average, complete, or Ward’s linkage). HCA has been widely applied to assess the multivariate relationship between bioactive substances and the bioactivity of beverages and foods ^[17].
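The agglomerative approach can be sketched as a loop that repeatedly merges the two closest clusters. The example below assumes Euclidean distance and single linkage, one of several possible choices; the points in `samples` are invented, and the recorded merge list is the information a dendrogram would display.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(points):
    # Start with every sample in its own cluster (agglomerative approach).
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

samples = [[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.1, 5.3], [9.0, 1.0]]  # hypothetical
merges = single_linkage(samples)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

Each printed row corresponds to one junction of the dendrogram, with the merge distance giving the height of that junction.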

3.2. Intelligent Pattern Recognition Methods

3.2.1. K-Nearest Neighbor (KNN)

KNN is one of the most widely used algorithms in the food industry for solving classification problems. The “K” in KNN represents the number of nearest neighbors included in the majority voting process ^[26]. KNN classifies a sample by computing the distances between feature values. The idea is that if the majority of the K most similar samples in the feature space belong to a particular category, then the sample also belongs to that category; K is usually an integer no greater than 20. The selected neighbors are all objects that have already been correctly classified. The algorithm therefore determines the category of a sample solely from the categories of its nearest neighbors ^[27]. The Manhattan or Euclidean distance is usually used as the distance metric ^[27][28].
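The majority voting process can be written in a few lines. This is a minimal sketch assuming Euclidean distance and K = 3; the training features and the grade labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    # Sort labeled samples by Euclidean distance to the query sample.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    # Majority vote among the k nearest neighbors.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature fingerprints with known grades.
train_x = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
           [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
train_y = ["grade_A", "grade_A", "grade_A",
           "grade_B", "grade_B", "grade_B"]

pred_low = knn_predict(train_x, train_y, [1.1, 1.0])
pred_high = knn_predict(train_x, train_y, [2.9, 3.1])
print(pred_low, pred_high)
```

Choosing an odd K for a two-class problem avoids tied votes; swapping `math.dist` for a sum of absolute differences would give the Manhattan variant.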

3.2.2. Artificial Neural Network (ANN)

An ANN is a supervised model inspired by networks of biological neurons and is commonly used in classification and regression problems. It comprises multiple layers of nodes: an input layer, one or more hidden layers, and an output layer ^[29]. The number of hidden layers depends largely on the task to be achieved. The activation of the hidden layers is determined by the input layer and the weights between the input and hidden layers. Similarly, the activation of the output layer is determined by the hidden layers and the weights between them ^[30]. The functions of an ANN are determined by the neuron activation function, the structure of the neuron pattern, and the learning process ^[9]. There are three types of learning methods in ANN: supervised, unsupervised, and reinforcement learning. Recently, ANN modelling has shown the potential to model nonlinear, complex food engineering processes that are difficult to solve using traditional approaches ^[31]. Several types of ANNs have been used to classify E-nose data and to model food processing. These include learning vector quantization (LVQ), Kohonen networks, multi-layer perceptron (MLP), feed-forward backpropagation neural network (FFBPNN), convolutional neural network (CNN), long short-term memory (LSTM) network, recurrent neural network (RNN), generative adversarial network (GAN), restricted Boltzmann machine (RBM), and deep Boltzmann machine (DBM) ^[9][31].
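The way activations propagate from the input layer through a hidden layer to the output layer can be sketched as a forward pass. The example assumes a sigmoid activation and a single hidden layer; all weights and biases are hypothetical, and the learning process that would adjust them is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    # Each node: weighted sum of the previous layer plus a bias, then activation.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    hidden = layer(x, w_hidden, b_hidden)   # input layer -> hidden layer
    return layer(hidden, w_out, b_out)      # hidden layer -> output layer

# Hypothetical network: 3 inputs, 2 hidden nodes, 2 outputs.
w_hidden = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
b_hidden = [0.0, 0.1]
w_out = [[1.0, -1.0], [-1.0, 1.0]]
b_out = [0.0, 0.0]

outputs = forward([0.9, 0.1, 0.4], w_hidden, b_hidden, w_out, b_out)
print(outputs)
```

With sigmoid nodes, every output lies strictly between 0 and 1, so the values can be read as class scores.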

3.2.3. Convolutional Neural Network (CNN)

CNN is a type of deep neural network that, like ANN, is a purely supervised learning algorithm and is primarily applied to image recognition ^[9][31]. In brief, the image data (such as RGB values and intensity) pass through a series of convolutional layers containing filters (kernels or neurons), pooling layers, and fully connected layers before the output is generated. The filters apply convolution operations to the input image and extract features such as edges. The pooling layers then reduce the size of the resulting feature maps using one of two common pooling methods, average pooling or maximum pooling. Finally, the data are fed to the fully connected layers, i.e., an ANN, to perform classification ^[9]. In contrast to traditional feature-based pattern recognition methods, CNN performs feature extraction and selection automatically, so manual feature preprocessing of the input data is not required. Recently, CNN combined with E-nose has been identified as a useful tool in food and beverage analysis, especially for the classification of liquors ^[30].
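The filter and pooling steps can be illustrated on a tiny image. The sketch below assumes a 5x5 single-channel input and a hand-written 2x2 edge filter (in a trained CNN the filter values are learned); the convolution is computed as a sliding dot product without flipping the kernel, which is the usual convention in deep learning.

```python
def conv2d_valid(image, kernel):
    # Slide the kernel over the image and take a dot product at each position.
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool(fmap, size=2):
    # Non-overlapping windows; keep the largest value in each window.
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# Hypothetical image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to a left-to-right intensity increase

feature_map = conv2d_valid(image, kernel)  # 4x4 map, peak where the edge is
pooled = max_pool(feature_map)             # 2x2 map after max pooling
print(feature_map)
print(pooled)
```

The pooled map keeps the edge response while halving each spatial dimension; flattening it would give the input to the fully connected layers.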

3.2.4. Decision Trees (DT) and Random Forest (RF)

A DT builds regression or classification models in the form of a tree-like architecture. It progressively divides the dataset into smaller, more homogeneous subsets (sub-populations) while generating an associated tree graph. The internal nodes represent the dataset features, the branches represent decision rules, and the leaf nodes represent the classification outcome ^[27][29]. The most common learning algorithms of this type are classification and regression trees (CART), the iterative dichotomiser 3 (ID3), and the chi-square automatic interaction detector (CHAID) ^[29].

An RF is a combination of decision tree predictors, widely used as a predictor and classifier in E-nose analysis. Each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest ^[16][27]. The trees are grown by applying rules that split the data into two parts (binary splits). The main splitting rules used in classification problems are the twoing rule, Gini index, and deviance, among which the Gini index is the most commonly used ^[32]. Aside from high predictive performance, RF analysis can indicate feature importance, revealing the contribution of each feature to the predictions and thus allowing a quantifiable comparison of different structural features ^[33].
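The Gini-based binary split can be sketched for a single feature. The example assumes candidate thresholds are taken from the observed values themselves (implementations often use midpoints instead); the feature values and the "fresh"/"aged" labels are hypothetical. An RF would repeat this search on bootstrap samples and random feature subsets to grow many trees.

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    # Try each observed value (except the largest) as a threshold and keep
    # the one with the lowest size-weighted impurity of the two children.
    best_threshold, best_score = None, float("inf")
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score

values = [0.2, 0.3, 0.4, 0.7, 0.8, 0.9]                       # hypothetical feature
labels = ["fresh", "fresh", "fresh", "aged", "aged", "aged"]  # hypothetical classes

parent_impurity = gini(labels)
threshold, child_impurity = best_split(values, labels)
print(parent_impurity, threshold, child_impurity)
```

The drop from the parent impurity to the weighted child impurity is the quantity that, accumulated over all trees, underlies Gini-based feature importance.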

3.2.5. Support Vector Machine (SVM)

An SVM is a supervised learning algorithm that is extensively employed for statistical regression and classification analysis ^[34]. It is based on a method for finding a particular type of linear model known as the maximum-margin hyperplane. To visualize the maximum-margin hyperplane, consider a two-class dataset whose classes are linearly separable, which means that a hyperplane in the input space correctly classifies all training instances ^[35]. After the data are transformed by a nonlinear function, i.e., a kernel function, the SVM can fit the n-dimensional feature space into a K-dimensional hyperplane (K > n) ^[16]. SVM algorithms are frequently used in E-nose-related applications ^[30]. Commonly used SVM algorithms include successive projection algorithm-support vector machine, support vector regression, and least squares support vector machine ^[29].
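The effect of moving from n to K > n dimensions can be shown with a toy dataset that no straight line separates in two dimensions. The sketch below assumes an explicit, hypothetical feature map from n = 2 to K = 3 dimensions and a hand-picked hyperplane; it does not run SVM training, and a kernel function would achieve the same mapping implicitly.

```python
def feature_map(x):
    # Hypothetical explicit map from n = 2 inputs to K = 3 features.
    return [x[0], x[1], x[0] * x[1]]

def decision(x, w, b):
    # Linear decision function evaluated in the K-dimensional space.
    z = feature_map(x)
    return sum(wi * zi for wi, zi in zip(w, z)) + b

# XOR-like data: not linearly separable in the original 2-D input space.
points = [[-1, -1], [1, 1], [-1, 1], [1, -1]]
labels = [-1, -1, 1, 1]

# Hand-picked hyperplane in the mapped space (not obtained by training).
w, b = [0.0, 0.0, -1.0], 0.0

margins = [y * decision(x, w, b) for x, y in zip(points, labels)]
predictions = [1 if decision(x, w, b) > 0 else -1 for x in points]
print(margins, predictions)
```

All four margins are positive, so a single hyperplane in the three-dimensional space classifies data that required a nonlinear boundary in the original two dimensions.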

4. Applications of E-Nose in Tea Quality Evaluation

In the tea industry, quality management is considered a critical responsibility. Tea quality and nutrition must therefore be analyzed throughout processing to maintain the quality of marketed tea products. However, because tea products command high prices, adulteration is common: the market is flooded with tea products bearing false brand names, and unscrupulous vendors profit from the fakes. Distinguishing between genuine and counterfeit products is therefore difficult ^[36]. According to numerous reports, the E-nose is a promising technology for monitoring the authenticity of food products ^[37].

Table 3 summarizes E-noses used in combination with various pattern recognition algorithms to assess the quality of various tea types over the last 10 years. E-nose devices were employed to categorize and differentiate tea types according to their origins, quality grades, degree of adulteration based on mix ratios, and drying processes, and to monitor the smell variation during fermentation (Table 3).

Table 3. Description of various E-nose configurations and pattern recognition methods used for tea quality evaluation.

S. No.	Tea Variety	Purpose of Analysis	E-Nose Configuration	Pattern Recognition Methods	References
1	Chaoqing Green Tea	To differentiate green teas according to their quality	E-nose system (developed by Agricultural Product Processing and Storage Laboratory, Jiangsu University, Zhenjiang, China) with 8 TGS gas sensors (Figaro Co., Ltd., Osaka, Japan)	PCA, SVM, KNN, and ANN	^[3]
2	Longjing Tea	To detect tea aroma for tea quality identification	PEN3 (Airsense Analytics, Schwerin, Germany) with 10 MOS sensors	PCA, KNN, SVM and MLR	^[19]
3	Longjing Tea	To develop a multi-level fusion framework for enhancing tea quality prediction accuracy	Fox 4000 (Alpha M.O.S., Co., Toulouse, France) with 18 MOS sensors	K(LDA), KNN	^[2]
4	Xihu-Longjing Tea	To classify the grades of tea based on the feature fusion method	Fox 4000 (Alpha MOS Company, Toulouse, France) with 18 MOS sensors	K(PCA), K(LDA), KNN	^[38]
5	Chinese Chrysanthemum Tea	To differentiate the aroma profiles of teas from different geographical origins	GC Flash E-nose (Alpha M.O.S. Heracles, Toulouse, France)	PCA	^[39]
6	Pu-erh Tea	To perform classification of two types of teas based on the volatile components	Fox-3000 (Alpha MOS, Toulouse, France) with 12 MOS sensors	PCA	^[40]
7	Green and Dark Tea	To assess the quality of tea grades	PEN3 (Airsense Analytics GmbH, Schwerin, Germany) with 10 gas sensors	PCA, LDA	^[1]
8	Black Tea	To investigate in situ discrimination of the quality of tea samples	Lab-made E-nose with 8 MOS sensors (Figaro Engineering Inc., Osaka, Japan)	PCA, LDA, QDA, SVM-linear, SVM-radial	^[5]
9	Xinyang Maojian Tea	To evaluate the different tastes of tea samples	PEN3 (Win Muster Airsense Analytics Inc., Schwerin, Germany) with 10 MOS sensors	MLR, PLSR, BPNN	^[41]
10	Black Tea, Yellow Tea, and Green Tea	To evaluate polyphenols of cross-category teas	PEN3 (Win Muster Air-sense Analytics Inc., Schwerin, Germany) with 10 MOS sensors	RF, Grid-SVR, XGBoost	^[4]
11	Pu-erh Tea	To discriminate between the aroma components of teas from varying storage years	PEN3 (Airsense, Schwerin, Germany) with 10 MOS sensors	LDA, PCA	^[42]
12	Herbal Tea	To investigate bio-inspired flavor evaluation of teas from different types and brands	PEN3 (Win Muster Airsense Analytics Inc., Schwerin, Germany) with 10 MOS sensors	LDA, SVM, KNN, and PNN	^[43]
13	Pu’er Tea	To devise a rapid method for determining the type, blended as well as mixed ratios of tea	PEN 3 (Airsense Inc., Schwerin, Germany) with 10 MOS sensors	LDA, CNN, PLSR	^[44]
14	Green Tea	To evaluate the quality grades of different teas	PEN3 (Airsense Analytics GmbH, Schwerin, Germany) with 10 MOS sensors	PCA, LDA, RF, SVM, PLSR, KRR, SVR, MBPNN	^[45]
15	Jasmine Tea	To examine the differences in aroma characteristics in different tea grades	ISENSO (Shanghai Ongshen Intelligent Technology Co., Ltd., Shanghai, China) with 10 MOS sensors	PCA, HCA	^[46]
16	Xihu Longjing Tea	To detect teas from different geographical indications	PEN3 (Airsense Analytics GmbH, Schwerin, Germany) with 10 MOS sensors	PCA, SVM, RF, XGBoost, LightGBM, TrLightGBM, BPNN	^[47]
17	Congou Black Tea	To investigate the aroma characteristics of tea during the variable-temperature final firing	Heracles II ultra-fast gas phase E-nose (Alpha M.O.S., Toulouse, France)	PLS-DA	^[48]
18	Longjing Tea	To determine the different quality grades of green teas	PEN2 (Airsense Company, Schwerin, Germany) with 10 MOS sensors	PCA, DFA, PLSR	^[49]
19	Pu-erh Tea	To rapidly characterize the volatile compounds in tea	Heracles II gas phase E-nose (Alpha M.O.S., Toulouse, France)	OPLS-DA	^[50]
20	Longjing Tea	To determine the tea quality of different grades	PEN3 (Airsense Corporation, Schwerin, Germany) with 10 MOS sensors	PCA, MDS, LDA, LR, SVM	^[51]
21	Mulberry Tea	To develop a rapid and non-destructive method for visualizing the volatile profiles of different leaf tea samples of various grades	Fox 4000 (Alpha M.O.S., Toulouse, France) with 18 MOS sensors	PCA, LDA	^[52]
22	Green Tea	To propose a multi-technology fusion system based on E-nose to evaluate pesticide residues in tea	Fox 4000 (ALPHA MOS, Toulouse, France) with 18 MOS sensors	PLS, SVM, ANN	^[53]
23	Fuyun 6 and Jinguanyin Black Tea	To investigate the aroma differences of tea produced from two different tea cultivars	E-nose (Shanghai Ongshen Intelligent Technology Co., Ltd., Shanghai, China) with 10 sensors	LDA, PCA, HCA, OPLS-DA	^[54]
24	Green Tea	To investigate the changes in volatile profiles of tea using different drying processes	Heracles II gas phase E-nose (Alpha M.O.S., Toulouse, France)	PLS-DA, PCA	^[55]
25	Dianhong Black Tea	To investigate the quality of tea infusions	Heracles II fast GC-E-Nose (Alpha M.O.S., Toulouse, France)	PLS-DA, FDA	^[56]
26	Oolong Tea	To discriminate between the smell of tea leaves during various stages of the manufacturing process	E-nose with 12 MOS sensors (Figaro USA, Inc., Arlington Heights, IL, USA and Nissha FIS, Inc., Osaka, Japan)	LDA	^[57]
27	Shucheng Xiaolanhua Tea	To enhance the performance of tea quality detection	PEN3 (Airsense Analytics, Schwerin, Germany) with 10 MOS sensors	K(PCA), KECA, SVM	^[58]
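Many of the studies in Table 3 follow the same broad pattern: a sensor-array fingerprint is recorded for each sample, the sensor responses are scaled, and a classifier such as KNN assigns a grade or origin. The following Python sketch illustrates that pattern only. It is not taken from any of the cited studies: the sensor values are synthetic, the class labels `grade_A` and `grade_B` are hypothetical, and the choices of z-score scaling and k = 3 are assumptions made for the illustration.

```python
import math
import random
from collections import Counter

N_SENSORS = 10  # e.g. a 10-sensor MOS array; all values below are synthetic


def zscore_fit(rows):
    """Per-sensor mean and standard deviation, estimated on training rows only."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0  # guard constant sensors
        for col, m in zip(columns, means)
    ]
    return means, stds


def zscore_apply(row, means, stds):
    """Scale one fingerprint with previously fitted statistics."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training fingerprints (Euclidean distance)."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def synthetic_fingerprints(rng, centre, n, noise=0.05):
    """Made-up sensor responses scattered around a class centre."""
    return [[c + rng.gauss(0.0, noise) for c in centre] for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
centres = {  # hypothetical mean responses for two hypothetical grades
    "grade_A": [1.0 + 0.1 * i for i in range(N_SENSORS)],
    "grade_B": [1.3 + 0.1 * i for i in range(N_SENSORS)],
}

train_x, train_y, test_x, test_y = [], [], [], []
for label, centre in centres.items():
    train_x += synthetic_fingerprints(rng, centre, 20)
    train_y += [label] * 20
    test_x += synthetic_fingerprints(rng, centre, 5)
    test_y += [label] * 5

# Fit the scaler on training data only, then apply it to both sets.
means, stds = zscore_fit(train_x)
train_z = [zscore_apply(r, means, stds) for r in train_x]
test_z = [zscore_apply(r, means, stds) for r in test_x]

predictions = [knn_predict(train_z, train_y, q) for q in test_z]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

Fitting the scaling statistics on the training fingerprints alone keeps the held-out samples from influencing the model, which matters when the number of tea samples is small.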

Tea polyphenols, amino acids, and caffeine are responsible for the astringency and bitterness of tea. Although many methods have been developed to evaluate the taste of tea, the task remains challenging. To address this, a rapid and feasible method combining E-nose measurements with mathematical modelling was established to assess the bitterness and astringency of green tea samples. The back-propagation neural network (BPNN) model proved more reliable than the partial least squares regression (PLSR) and multiple linear regression (MLR) models in evaluating the bitterness and astringency of tea infusions ^[41].
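Comparisons of this kind are usually made by scoring each model's predictions against reference taste scores on held-out samples. The sketch below shows one way such a comparison could be set up. The source does not state which metrics were used, so RMSE and R² are assumed here as common choices, and the reference scores, model names, and predictions are placeholders rather than results from the cited study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted scores."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rank_models(y_true, predictions):
    """Model names ordered from lowest to highest RMSE."""
    return sorted(predictions, key=lambda name: rmse(y_true, predictions[name]))


# Placeholder bitterness scores for five held-out samples (not real data).
reference = [2.0, 3.5, 4.0, 5.5, 6.0]
# In practice these would be the outputs of fitted MLR, PLSR, and BPNN models.
candidate_predictions = {
    "candidate_1": [2.5, 3.0, 4.5, 5.0, 6.5],
    "candidate_2": [2.1, 3.4, 4.1, 5.4, 6.1],
}

for name in rank_models(reference, candidate_predictions):
    preds = candidate_predictions[name]
    print(name, round(rmse(reference, preds), 3), round(r_squared(reference, preds), 3))
```

Reporting both metrics is a deliberate choice: RMSE is expressed in the units of the taste score, while R² indicates how much of the variation between samples the model captures.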

Processing technology, which includes withering, rolling, fermentation, and drying, is crucial to the distinctive flavor of black tea. Yang et al. ^[48] employed an E-nose to examine the volatile profile of Congou black tea and the changes in its aroma features across different variable-temperature final firing processes. PLS-DA clearly differentiated the tea samples according to their drying conditions.