Classical Gas Identification Algorithms for Gas Sensor Array

Electronic noses have been applied in various fields, such as food quality, environmental assessment, coal mine risk prediction, and disease diagnosis. Algorithms can enhance the performance of electronic noses, for example by correcting sensor drift and predicting the exact concentration of each gas in a mixture.

gas recognition algorithm
electronic noses
gas sensor array
PCA
SVM
ELM
RF
NBM
DT
KNN

1. Introduction

Over the years, a variety of classical gas recognition algorithms have been successfully applied in electronic nose systems ^[1]. This entry introduces these classical gas recognition algorithms and then compares and analyzes them.

The basic working mechanism of a classical gas recognition algorithm is to design features according to the waveform of the time series signal and to discover the hidden deep structure through those features. Feature extraction from gas sensing data can yield features that are more discriminative, abstract, and invariant. Generally, the classical gas recognition algorithms mainly include the following eight algorithms.

(1): Principal Component Analysis (PCA)

PCA is an unsupervised learning technique used to reduce the dimension of sample data, increase interpretability, and minimize information loss at the same time. In 2008, Sen et al. used PCA to distinguish 10 kinds of gaseous hydrogen sulfide (H₂S) with different concentrations, and the recognition accuracy was 100% ^[2]. In 2022, Khorramifar et al. constructed an experimental electronic nose device and combined it with PCA for the identification of grape varieties ^[3].

PCA is mainly used to reduce the dimension of the data and the computational cost of the algorithm, and it can also remove some noise. However, PCA is unsupervised and linear: it does not use category labels and cannot capture nonlinear structure in electronic nose data. To some extent, this limits the usefulness of PCA for sensor arrays composed of metal oxide sensors, namely electronic noses.
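The dimension reduction described above can be sketched with NumPy. This is a minimal illustration rather than the method of any cited study: the function name `pca_project` and the three-sensor response values are hypothetical, and the components are obtained from a singular value decomposition of the mean-centered data.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each sensor channel
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)        # explained variance ratio
    return Xc @ components.T, components, explained[:n_components]

# Hypothetical responses of a 3-sensor array (rows = measurements).
X = np.array([
    [1.0, 2.0, 3.1],
    [2.0, 4.1, 6.0],
    [3.0, 6.0, 9.2],
    [4.0, 7.9, 12.0],
])
scores, components, ratio = pca_project(X, n_components=1)
```

Because the three hypothetical channels are almost perfectly correlated, a single component retains nearly all of the variance. Note that no class labels enter the computation, which is the unsupervised limitation mentioned above.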

(2): Linear Discriminant Analysis (LDA)

LDA is a supervised learning technique that is used to project data into a low-dimensional space and ensure that the intra-class variance of each category is small while the mean difference between classes is large. In 2017, Choi et al. proposed an electronic nose gas classification data reconstruction method based on subspace analysis, designed an electronic nose system with stronger robustness to data errors, and enhanced the spatial discrimination ability of PCA plus LDA ^[4]. In 2022, Palacin et al. successfully identified complex aromas of caffeinated and decaffeinated espresso package types using LDA ^[5]. In the same year, Palacin et al. applied LDA to the electronic nose to classify two volatile organic compounds, ethanol and acetone ^[6].

In recent years, PCA and LDA, two data dimension reduction methods, have been successfully applied in gas identification to classify and recognize different gases. However, LDA may over-fit the data, which can reduce gas identification accuracy.
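For two classes, the LDA criterion of small intra-class variance and large difference between class means has a closed-form projection direction, w proportional to Sw⁻¹(m1 − m0), where Sw is the within-class scatter matrix. The sketch below assumes that two-class case; the function name `fisher_direction` and the feature values are hypothetical.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Unit vector maximizing between-class over within-class scatter."""
    X0, X1 = np.asarray(X0, dtype=float), np.asarray(X1, dtype=float)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the scatter matrices of both classes.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

# Hypothetical two-feature samples for two gases.
X0 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.5]])
X1 = np.array([[4.0, 1.0], [5.0, 2.0], [6.0, 2.2]])
w = fisher_direction(X0, X1)
proj0, proj1 = X0 @ w, X1 @ w   # one-dimensional projections
```

In this toy case every projected sample of the first class lies below every projected sample of the second, so a single threshold on the projection separates them. With few samples and many sensors, Sw becomes poorly conditioned, which is one way the over-fitting noted above can arise.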

(3): Support Vector Machine (SVM)

SVM is a supervised pattern recognition and machine learning method. It is a linear classifier defined on the feature space with the largest margin, which optimizes generalization ability when only limited training samples are available. Gas recognition based on SVM is a mature approach and has proven successful in many practical applications. In 2010, Pardo et al. applied SVM to the recognition of electronic nose data. Two separating hyperplanes are shown. The main idea of SVM is to use specific hyperplanes to separate different classes and to maximize the classification margin. The margin refers to the distance from the classification hyperplane to the nearest point in the data set.
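The notion of margin can be made concrete with a short calculation. The sketch below does not train an SVM; it only compares two hand-picked separating hyperplanes w·x + b = 0 on hypothetical two-dimensional points and reports, for each, the distance to the nearest point. The function name `geometric_margin` is an assumption made for illustration.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over all points."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical samples with labels -1 and +1.
points = [(1, 1), (2, 0), (4, 4), (5, 3)]
labels = [-1, -1, 1, 1]

margin_a = geometric_margin((1, 1), -5, points, labels)  # x1 + x2 = 5
margin_b = geometric_margin((1, 0), -3, points, labels)  # x1 = 3
```

Both hyperplanes classify every point correctly (both margins are positive), but the first leaves a larger gap to the nearest sample. An SVM selects the hyperplane that maximizes this quantity.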

In 2021, Binston et al. applied the SVM method to the electronic nose system to detect lung cancer, chronic obstructive pulmonary disease (COPD), and other lung diseases through changes in volatile organic compounds (VOC) in exhaled gases ^[7]. In 2015, Smulko et al. successfully predicted gas concentration using a single gas sensor based on the least squares SVM (LSSVM) method. The LSSVM method does not require noise removal, data smoothing, or other tedious data processing, which is an advantage when applying it to gas recognition. Researchers usually combine two or more algorithms for better gas classification and recognition accuracy. For example, in 2018, Chen et al. combined PCA with SVM to monitor the VOC produced during the ripening of bananas and identify different ripening stages, with the highest recognition accuracy of 97.14% ^[8]. In 2019, Shi et al. combined a Convolutional Neural Network (CNN) with SVM to identify beer odor information and achieved a classification accuracy of 96.67% on the test set ^[9]. In recent years, the SVM method has been successfully applied to electronic nose systems. SVM is a small-sample learning method with a solid theoretical basis; therefore, it can be widely used on small-sample electronic nose data. However, the SVM method is sensitive to missing data, which affects the accuracy of gas recognition.

(4): K-Nearest Neighbor (KNN)

KNN is a supervised learning algorithm. It is a widely used non-parametric method for classification and regression owing to its simplicity and good classification performance. In 2019, Schroeder et al. used KNN to classify several complex odors, including cheese, wine, and edible oil samples, with an identification accuracy of 91% ^[10]. In addition, some improved KNN variants, such as Fuzzy K-Nearest Neighbor (F-KNN), have been applied to gas recognition. In 2020, Mirzaee-Ghaleh et al. adopted the F-KNN algorithm to identify fresh and frozen chicken with an accuracy of 95.83% ^[11].

In addition, researchers often combine KNN with other algorithms to achieve better gas classification and recognition accuracy. For example, in 2018, Xu et al. used Kernel Principal Component Analysis (KPCA) to extract the characteristics of nonlinear gas mixtures of different components and combined it with KNN to recognize the target gases, with an accuracy of 98.33%. The gas recognition flow chart based on the KPCA and KNN methods is shown in Figure 1. First, the kernel matrix K is constructed from the training sample set. KPCA is then used to extract the features of all the training samples, which are used to train the KNN classifier. Finally, the KNN algorithm is used to identify the features of the test samples ^[12]. The KNN method can be applied to the classification of nonlinear data, and its principle is simple. However, when the data dimension is very high, the computational workload is large. In addition, samples that are close together may not belong to the same category.

Figure 1. Flow diagram of binary mixed gas recognition method based on KPCA-KNN in 2018 ^[12].
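The final KNN step of such a workflow can be sketched in a few lines. The KPCA feature extraction is deliberately omitted here: the inputs are assumed to be already extracted two-dimensional features, and the values, the gas labels, and the function name `knn_predict` are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training samples."""
    # Euclidean distance to every training sample, nearest first.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# Hypothetical extracted features for two target gases.
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.75)]
train_y = ["ethanol"] * 3 + ["acetone"] * 3

pred_low = knn_predict(train_X, train_y, (0.2, 0.2))
pred_high = knn_predict(train_X, train_y, (0.8, 0.8))
```

Every prediction computes a distance to all stored training samples, which is the source of the computational workload mentioned above for high-dimensional or large data sets.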

(5): Decision Tree (DT)

DT is a non-parametric supervised learning method. It is a decision model based on a tree structure that classifies data sets through a sequence of condition tests and finally obtains the required results ^[13]. The most important feature of a Decision Tree Classifier (DTC) is that it can decompose a complex decision into a series of simpler decisions that are easy to interpret ^[14]. In 2016, He et al. used the Short-time Fourier Transform (STFT) feature extraction method combined with the DT method to classify carbon monoxide, methane, and ethanol gases of different concentrations. Considering that gas data usually contain more low-frequency information than high-frequency information, STFT was used to extract the low-frequency amplitude and was combined with a genetic algorithm to select the best features. The decision tree classifier was then used to classify the gases, and better classification results were obtained ^[15]. The time complexity of the DT algorithm is small, and it can be used for electronic nose data with small samples. However, over-fitting occurs easily. For data with inconsistent sample numbers in different categories, the result of DT is biased toward features that take more values.
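Each condition test in a decision tree is typically chosen by searching for the threshold that leaves the purest child nodes. The sketch below shows that search for a single feature, assuming Gini impurity as the purity measure (the source does not state which criterion the cited work used); the feature values, labels, and function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical low-frequency amplitude feature for two gases.
values = [0.2, 0.3, 0.35, 0.8, 0.9, 1.0]
labels = ["CO", "CO", "CO", "CH4", "CH4", "CH4"]
threshold, impurity = best_split(values, labels)
```

A full tree repeats this search recursively on each child node and over all features. Growing the tree until every leaf is pure is what makes over-fitting likely on noisy sensor data.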

(6): Random Forest (RF)

RF employs multiple decision trees to train on and predict samples. That is to say, the RF algorithm contains multiple decision trees, and its output category is determined by a majority vote over the categories output by the individual trees ^[16]. In 2020, Muhamad et al. adopted RF as a multi-classification technique to identify multiple gas by-products, eventually achieving 96.4% accuracy. As shown in Figure 2, multiple sets of data are obtained from the original training data, multiple classifiers are established, and finally, the group of classifiers is combined to build an effective combination classifier ^[17]. In 2022, Bogdal et al. adopted the random forest method to identify fire debris with or without gasoline, and the algorithm performed well. Compared with a convolutional neural network, the random forest method requires significantly less training data and training time ^[18].

Figure 2. Flow diagram of random forest model. C* represents the symbolic representation of the combinatorial classifier ^[17].

RF can handle high-dimensional and unbalanced data well, in general. However, it may not produce a good classification for small-sample data. It is more complex than the decision tree algorithm, and the calculation cost is higher.
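The two ingredients of the combination classifier, resampling the training data and voting, can be sketched as follows. To keep the example short, each "tree" is a deliberately simplified stand-in: a one-feature rule that assigns the class with the nearest mean on a randomly chosen feature, fitted on a bootstrap sample. The data, labels, and function names are hypothetical, and a real random forest would grow full decision trees instead.

```python
import random
from collections import Counter

def train_stump(X, y, rng):
    """Fit a one-feature rule on a bootstrap sample of (X, y)."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
    feature = rng.randrange(len(X[0]))           # random feature choice
    by_class = {}
    for i in idx:
        by_class.setdefault(y[i], []).append(X[i][feature])
    means = {label: sum(v) / len(v) for label, v in by_class.items()}
    return feature, means

def stump_predict(stump, x):
    """Predict the class whose mean is closest on the stump's feature."""
    feature, means = stump
    return min(means, key=lambda label: abs(x[feature] - means[label]))

def forest_predict(stumps, x):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(stump_predict(s, x) for s in stumps)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature samples for two gas classes.
X = [(0.10, 0.20), (0.20, 0.10), (0.30, 0.30), (0.15, 0.25),
     (0.70, 0.90), (0.80, 0.70), (0.90, 0.80), (0.75, 0.85)]
y = ["gas_A"] * 4 + ["gas_B"] * 4

rng = random.Random(0)
forest = [train_stump(X, y, rng) for _ in range(25)]
pred_a = forest_predict(forest, (0.2, 0.2))
pred_b = forest_predict(forest, (0.8, 0.8))
```

Training 25 models instead of one illustrates why the calculation cost is higher than for a single decision tree, while the vote makes the result less sensitive to any one resampled data set.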

(7): Naive Bayes Model (NBM)

The Naive Bayes Model is a classification method based on Bayes' theorem and the assumption of conditional independence among features. It is a probabilistic model with a Directed Acyclic Graph (DAG) topology that is suitable for expressing and analyzing uncertain and probabilistic events ^[18]. As shown in Figure 3, the traditional NBM was improved using MDF theory and PCA, and a gas leakage identification model for gas extraction boreholes was established. The new classifier eliminated the inability of the NBM to adapt to missing data and non-standard data and greatly improved the classification ability of the model ^[19].

Figure 3. Flow diagram of building the model in 2022 ^[19].

The NBM algorithm is simple and easy to implement, performs well on small-sample data, and can handle multi-class classification tasks. However, the prior probabilities must be known, and in many cases they depend on an assumed model. Because many assumed models are possible, a poorly chosen prior model can lead to poor predictions.
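A plain Naive Bayes classifier (not the improved model of Figure 3) can be sketched as below. Several choices here are assumptions made for illustration: each feature is modeled with a Gaussian distribution per class, the priors are estimated from class frequencies in the training set, a small constant is added to each variance to avoid division by zero, and the data, labels, and function names are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate prior, per-feature mean and variance for each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # 1e-9 is an assumed floor that keeps variances strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Pick the class with the highest log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        # Independence assumption: per-feature log-likelihoods simply add.
        for xi, m, v in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        return total
    return max(model, key=log_posterior)

# Hypothetical two-feature borehole measurements.
X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
     (0.90, 0.80), (0.80, 0.90), (0.85, 0.75)]
y = ["normal"] * 3 + ["leak"] * 3

model = fit_gaussian_nb(X, y)
pred_normal = predict_gaussian_nb(model, (0.12, 0.18))
pred_leak = predict_gaussian_nb(model, (0.88, 0.82))
```

The sketch requires a value for every feature of every sample, which reflects the difficulty with missing data that the improved classifier was designed to address.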

(8): Extreme Learning Machine (ELM)

ELM is a fast-learning algorithm that randomly initializes the input weights and analytically determines the output weights of the network. Its advantages include few training parameters, fast learning speed, and strong generalization ability. In 2017, Jian et al. proposed Weighted Multiple Kernel Extreme Learning Machine (QWMK-ELM) on the basis of ELM and compared it with classical classification methods, such as ELM, KELM, KNN, SVM, and MLP. Experimental results show that the proposed QWMK-ELM is superior to the above methods, not only in terms of accuracy but also in terms of gas classification efficiency ^[20]. In 2017, Zhang et al. combined a Self-Expression Model (SEM) and ELM to identify outliers in the electronic nose response, and a large number of experimental results have proven the effectiveness of the proposed method ^[21].

In 2022, Wang et al. used SVM, ELM, and Back Propagation Neural Network (BPNN) to quantitatively analyze six types of VOC. Among them, the ELM algorithm model showed the best performance; the recognition accuracy was up to 99% in five-fold cross-validation. The integrated model has good compatibility and scalability. Using the pipeline module in sklearn, the series of data operations involved in pattern recognition in the electronic nose system is assembled into a single workflow for gas recognition.
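A workflow of this kind can be sketched with scikit-learn's `Pipeline`. The steps chosen here (standardization, PCA, and an SVM classifier) and the synthetic four-sensor data are assumptions made for illustration; they are not the steps or data of the cited study, and scikit-learn does not ship an ELM estimator.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for sensor array features: two well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 4)),
    rng.normal(1.0, 0.1, size=(20, 4)),
])
y = np.array([0] * 20 + [1] * 20)

# Each step is fitted in order; the last step is the classifier.
workflow = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA(n_components=2)),
    ("classify", SVC(kernel="rbf")),
])

# Five-fold cross-validation refits the whole pipeline on each training fold.
scores = cross_val_score(workflow, X, y, cv=5)
mean_accuracy = scores.mean()
```

Wrapping the preprocessing inside the pipeline means the scaler and PCA are fitted only on the training portion of each fold, so the cross-validation score is not inflated by information from the held-out samples.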

ELM is a kind of feed-forward neural network with a single hidden layer, wherein the parameters of the hidden nodes are randomly assigned without tuning, and the output weights are usually learned in one step, which makes ELM classification more efficient ^[22]. The hidden layer of ELM requires no iterative training, so learning is fast and generalization performance is good. However, ELM only considers empirical risk rather than structural risk, which may lead to over-fitting and reduce the accuracy of gas identification.
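The one-step training described above can be sketched with NumPy. This is a minimal illustration under stated assumptions: a tanh activation, 20 hidden nodes, one-hot targets, output weights obtained with the Moore-Penrose pseudo-inverse, and synthetic four-feature data. The function names `train_elm` and `predict_elm` are hypothetical.

```python
import numpy as np

def train_elm(X, Y, n_hidden, rng):
    """Random hidden layer, then output weights by least squares."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random, never tuned
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                       # hidden layer outputs
    beta = np.linalg.pinv(H) @ Y                 # one-step solution
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Synthetic stand-in for sensor features: two well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 4)),
    rng.normal(1.0, 0.1, size=(20, 4)),
])
y = np.array([0] * 20 + [1] * 20)
Y = np.eye(2)[y]                                 # one-hot targets

model = train_elm(X, Y, n_hidden=20, rng=rng)
predicted = predict_elm(model, X).argmax(axis=1)
train_accuracy = (predicted == y).mean()
```

The pseudo-inverse minimizes only the training error, which corresponds to the empirical-risk limitation noted above. Adding a ridge penalty when solving for the output weights is a common way to regularize, though it is not shown here.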

2. Analysis and Comparison of Classical Gas Recognition Algorithms

Table 1 summarizes and compares the properties of classical gas recognition algorithms for electronic nose systems. As can be seen from Table 1, the classical gas identification algorithms generally have fast training speeds and good interpretability, although some are sensitive to missing data.

Table 1. Comparison of the classical gas identification algorithms ^[1].

	PCA	LDA	SVM	KNN	DT	RF	NBM	ELM
Property	Unsupervised	Supervised	Supervised	Supervised	Supervised	Supervised	Supervised	Supervised
Training speed	Fast	Fast	Moderate	Moderate	Fast	Moderate	Moderate	Fast
Demand for data	Low	Low	Low	High	Low	High	Moderate	Low
Robustness for noise	Moderate	Moderate	Low	High	Moderate	High	Low	Low
Sensitive to missing data	Low	Low	Moderate	Low	Low	Moderate	Low	Moderate
Interpretability	Moderate	Moderate	High	High	Moderate	High	Moderate	Moderate

When carrying out gas recognition experiments, the optimal gas recognition algorithm can be selected according to the characteristics of the sensor signal data. For different gas recognition scenarios, improved classical gas recognition algorithms can achieve better accuracy, and the efficient combination of two or more algorithms can accurately recognize different types of gas. However, many classical gas recognition algorithms, such as KNN and SVM, have relatively fixed frameworks and few parameters; therefore, their generalization ability is not strong. As another example, the PCA method usually requires complex feature engineering and dimensionality reduction of data; therefore, the steps are complicated, and the application is limited. Moreover, in a complex real environment, the air humidity and temperature are often not controlled; therefore, the accuracy of classical gas identification algorithms is greatly affected by the air temperature and humidity.

Since gas sensor data are usually represented as time series signals, features must be designed manually according to the waveform of the signals. Moreover, for classical gas recognition algorithms, the quality of feature extraction directly affects the accuracy of the final classification results, which makes feature extraction more difficult and limits the universality of these algorithms. Research shows that, for the same gas to be identified, the recognition accuracy of traditional gas recognition algorithms (for example, SVM and KNN) is lower than that of gas recognition algorithms based on neural networks (such as CNN and DCNN).

At the same time, because real measurement environments for the electronic nose vary widely, changes in ambient temperature and humidity affect the response of the sensors. The work surveyed here indicates that classical gas recognition algorithms cannot solve the problem of sensor drift very well.

This entry is adapted from the peer-reviewed paper 10.3390/cryst13040615

References

Persaud, K.; Dodd, G. Analysis of discrimination mechanisms in the mammalian olfactory system using a model nose. Nature 1982, 299, 352–355.
Liu, H.; Meng, G.; Deng, Z.; Li, M.; Chang, J.; Dai, T.; Fang, X. Progress in Research on VOC Molecule Recognition by Semiconductor Sensors. Acta Phys. Chim. Sin. 2022, 38, 2008018.
Meng, F.; Li, X.; Yuan, Z.; Lei, Y.; Qi, T.; Li, J. Ppb-Level Xylene Gas Sensors based on Co3O4 Nanoparticles coated Reduced Graphene Oxide (rGO) Nanosheets Operating at Low Temperature. IEEE Trans. Instrum. Meas. 2021, 70, 9511510.
Ji, H.; Qin, W.; Yuan, Z.; Meng, F. Qualitative and quantitative recognition method of drug-producing chemicals based on SnO2 gas Sensor with dynamic measurement and PCA weak separation. Sens. Actuators B Chem. 2021, 348, 130698.
Qin, W.; Yuan, Z.; Gao, H.; Zhang, R.; Meng, F. Perovskite-structured LaCoO3 modified ZnO gas sensor and investigation on its gas sensing mechanism by first principle. Sens. Actuators B Chem. 2021, 341, 130015.
Meng, F.; Shi, X.; Yuan, Z.; Ji, H.; Qin, W.; Shen, Y.; Xing, C. Detection of Four Alcohol Homologue Gases by ZnO Gas Sensor in Dynamic Interval Temperature Modulation Mode. Sens. Actuators B Chem. 2022, 350, 130867.
Meng, F.; Qi, T.; Zhang, J.; Zhu, H.; Yuan, Z.; Liu, C.; Qin, W.; Ding, M. MoS2-templated porous hollow MoO3 microspheres for highly selective ammonia sensing via a Lewis acid-base interaction. IEEE Trans. Ind. Electron. 2022, 69, 960–970.
Jiao, M.; Chen, X.; Hu, K.; Qian, D.; Zhao, X.; Ding, E. Recent developments of nanomaterials-based conductive type methane sensors. Rare Met. 2021, 40, 1515–1527.
Navaneeth, B.; Suchetha, M. PSO optimized 1-D CNN-SVM architecture for real-time detection and classification applications. Comput. Biol. Med. 2019, 108, 85–92.
Guntner, A.T.; Abegg, S.; Konigstein, K.; Gerber, P.A.; Schmidt-Trucksass, A.; Pratsinis, S.E. Breath sensors for health monitoring. ACS Sens. 2019, 4, 268–280.
Das, S.; Pal, M. Non-invasive monitoring of human health by exhaled breath analysis: A comprehensive review. J. Electrochem. Soc. 2020, 167, 037562.
Tai, H.; Wang, S.; Duan, Z.; Jiang, Y. Evolution of breath analysis based on humidity and gas sensors: Potential and challenges. Sens. Actuators B Chem. 2020, 318, 128104.
Paleczek, A.; Rydosz, A. Review of the algorithms used in exhaled breath analysis for the detection of diabetes. J. Breath Res. 2022, 16, 026003.
Paknahad, M.; Ahmadi, A.; Rousseau, J.; Nejad, H.R.; Hoorfar, M. On-chip electronic nose for wine tasting: A digital microfluidic approach. IEEE Sens. J. 2017, 17, 4322–4329.
Hidayat, S.N.; Triyana, K.; Fauzan, I.; Julian, T.; Lelono, D.; Yusuf, Y.; Ngadiman, N.; Vesolo, A.C.A.; Peres, A.M. The electronic nose coupled with chemometric tools for discriminating the quality of black tea samples in situ. Chemosensors 2019, 7, 29.
Pulluri, K.K.; Kumar, V.N. Development of an Integrated Soft E-nose for Food Quality Assessment. IEEE Sens. J. 2022, 22, 15111–15122.
Lamagna, A.; Reich, S.; Rodríguez, D.; Boselli, A.; Cicerone, D. The use of an electronic nose to characterize emissions from a highly polluted river. Sens. Actuators B Chem. 2008, 131, 121–124.
Ma, H.; Wang, T.; Li, B.; Cao, W.; Zeng, M.; Yang, J.; Su, Y.; Hu, N.; Zhou, Z.; Yang, Z. A low-cost and efficient electronic nose system for quantification of multiple indoor air contaminants utilizing HC and PLSR. Sens. Actuators B Chem. 2022, 350, 130768.
Muhamad, N.A.; Musa, I.V.; Malek, Z.A.; Mahdi, A.S. Classification of partial discharge fault sources on SF₆ insulated switchgear based on twelve by-product gases random forest pattern recognition. IEEE Access 2020, 8, 212659–212674.
Liu, M.; Li, Y. Application of electronic nose technology in coal mine risk prediction. Chem. Eng. Trans. 2018, 68, 307–312.
Pan, H.; He, S.; Zhang, T.; Song, S.; Wang, K. Application of an improved naive Bayesian analysis for the identification of air leaks in boreholes in coal mines. Sci. Rep. 2022, 12, 16081.
Comito, C.; Pizzuti, C. Artificial intelligence for forecasting and diagnosing COVID-19 pandemic: A focused review. Artif. Intell. Med. 2022, 128, 102286.
Hidayat, S.N.; Julian, T.; Dharmawan, A.B.; Puspita, M.; Chandra, L.; Rohman, A.; Julia, M.; Rianjanu, A.; Nurputra, D.K.; Triyana, K.; et al. Hybrid learning method based on feature clustering and scoring for enhanced COVID-19 breath analysis by an electronic nose. Artif. Intell. Med. 2022, 129, 102323.
Nurputra, D.K.; Kusumaatmaja, A.; Hakim, M.S.; Hidayat, S.N.; Julian, T.; Sumanto, B.; Mahendradhata, Y.; Saktiawati, A.M.; Wasisto, H.S.; Triyana, K. Fast and noninvasive electronic nose for sniffing out COVID-19 based on exhaled breath-print recognition. NPJ Digit. Med. 2022, 5, 115.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.