Air quality models simulate atmospheric environment systems, increasing domain knowledge and enabling reliable forecasting. They can provide early warnings to the population and reduce the number of measuring stations required. Owing to the complexity and non-linear behavior of air quality data, soft computing models have become popular in air quality modeling (AQM). This study critically investigates, analyzes, and summarizes the existing soft computing modeling approaches. Among the many soft computing techniques in AQM, this article reviews and discusses the artificial neural network (ANN), the support vector machine (SVM), evolutionary ANN and SVM, the fuzzy logic model, neuro-fuzzy systems, the deep learning model, and ensemble and other hybrid models. It also sheds light on the input variables employed, the data processing approaches, and the objective functions targeted during modeling. The discussion in this paper will help to determine the suitability and appropriateness of a particular model for a specific modeling context.
Among many potential techniques, different variations of artificial neural networks, evolutionary fuzzy and neuro-fuzzy models, ensemble and hybrid models, and knowledge-based models should be further explored. In addition, there is a continuing need to develop a universal model, as most of the explored models are either site-dependent or pollutant-dependent. This section discusses future research directions and potential soft computing models that can be investigated in air quality modeling throughout the world.
As can be observed from Section 3, ANN approaches have been widely explored in AQM, and in most cases MLP-NN, BP-NN, RBF-NN, or R-NN were employed. Other available ANN variations (GR-NN, GC-NN, P-NN, W-NN, and others) that have successfully demonstrated their capabilities in modeling complex and non-linear problems in other engineering fields have not been explored significantly. Many of them (extreme learning machine, multitasking, probabilistic, time delay, modular, and other hybrid neural networks) are rarely explored. In addition, deep neural network models have received great attention in modeling PM2.5 concentrations, but they have rarely been applied to other air pollutants. Therefore, such unexplored and rarely explored variations of neural networks can be investigated in future work for modeling all types of air pollutant concentrations.
Fuzzy systems are proven tools for modeling complex and non-linear problems in many applications. However, the lack of learning capabilities in fuzzy systems has encouraged researchers to augment them by hybridizing them with EO techniques. Among the many EO techniques, GA, GWO, CSA, SCA, and PSO are widely used and well-known global search optimization approaches with the ability to explore a large search space for suitable solutions. In addition, the type-2 fuzzy set can handle more uncertainty than the type-1 fuzzy set and has been successfully applied in a wide range of areas. Therefore, considering the potential of fuzzy logic approaches, they can be explored further in the field of AQM.
Long-term research in the field of neural networks and advanced statistical methods has contributed to the evolution of an abductory induction mechanism known as the group method of data handling (GMDH). It automatically synthesizes abductive networks from a database of inputs and outputs with complex and nonlinear relationships. Other extensions of neural network models include functional network models (FNM). An FNM determines the structure of the network using domain knowledge and data, and estimates the unknown neuron functions. Both GMDH and FNM have been explored in many relevant applications. These rarely explored extensions of neural networks can be further investigated in AQM.
Case-based reasoning solves new problems by recalling the experiences and solutions of similar past problems. It deals with a given problem in four steps, namely retrieve, reuse, revise, and retain. Another soft computing technique, the knowledge-based system, attempts to solve problems by giving advice in a domain, utilizing the knowledge provided by a human expert. Researchers have employed both techniques to solve many complex problems. These techniques can be investigated in AQM, as neither has yet been explored.
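The four-step cycle described above can be illustrated with a minimal Python sketch. It is not drawn from any reviewed study: the `CaseBase` class, the nearest-neighbor retrieval by Euclidean distance, and the example feature values are all assumptions made for illustration only.

```python
import math


class CaseBase:
    """Toy case-based reasoner; a case is a (features, solution) pair."""

    def __init__(self):
        self.cases = []  # list of (feature tuple, numeric solution)

    def retrieve(self, query):
        # Retrieve: find the stored case closest to the query
        # (Euclidean distance is an illustrative choice of similarity).
        return min(self.cases, key=lambda case: math.dist(case[0], query))

    def reuse(self, case):
        # Reuse: propose the retrieved case's solution for the new problem.
        return case[1]

    def revise(self, proposed, observed=None):
        # Revise: replace the proposal with the observed outcome, if known.
        return proposed if observed is None else observed

    def retain(self, query, solution):
        # Retain: store the solved problem as a new case.
        self.cases.append((tuple(query), solution))

    def solve(self, query, observed=None):
        proposed = self.reuse(self.retrieve(query))
        final = self.revise(proposed, observed)
        self.retain(query, final)
        return proposed, final


# Hypothetical cases: (temperature in deg C, wind speed in m/s) -> PM2.5 in ug/m3
cb = CaseBase()
cb.retain((30.0, 1.0), 80.0)
cb.retain((15.0, 6.0), 20.0)
proposed, final = cb.solve((28.0, 1.5), observed=75.0)
```

In this toy run the query is closest to the first stored case, so its solution is proposed, then revised to the observed value and retained as a third case. A practical system would need a domain-appropriate similarity measure and a less trivial revision step.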
As discussed earlier, ensemble models employ multiple learning techniques in parallel and combine their outputs to produce better generalization performance. In real-world situations, they aim to balance the strengths and weaknesses of the individual models and arrive at the best possible solution. Recently, such models have gained considerable momentum in AQM, but their use has been limited to a few specific pollutants (mainly PM2.5). Researchers should invest more time in these attractive tools, as they are likely to become some of the most prominent tools for AQM in the future.
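As a minimal sketch of the idea of running several models in parallel and combining their outputs, the following Python example averages the forecasts of three deliberately simple base models. The base models, the function names, and the readings are hypothetical and chosen only for illustration; the ensembles in the reviewed studies typically combine trained learners rather than rules of this kind.

```python
from statistics import mean


def persistence_model(history):
    # Predicts that the next value equals the most recent observation.
    return history[-1]


def moving_average_model(history, window=3):
    # Predicts the mean of the last `window` observations.
    return mean(history[-window:])


def linear_trend_model(history):
    # Extrapolates the last observed step one interval ahead.
    return history[-1] + (history[-1] - history[-2])


def ensemble_forecast(history, models, weights=None):
    # Run every base model on the same input and combine the
    # predictions by a (optionally weighted) average.
    predictions = [model(history) for model in models]
    if weights is None:
        weights = [1.0] * len(predictions)
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total


# Hypothetical hourly PM2.5 readings (ug/m3)
pm25 = [30.0, 34.0, 38.0, 42.0]
models = [persistence_model, moving_average_model, linear_trend_model]
forecast = ensemble_forecast(pm25, models)
```

Equal weights are the simplest combination rule. Weights could instead be set from each model's validation error, which is one way an ensemble can favor the stronger base models.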
Most of the discussed models are either site-dependent or pollutant-dependent. There is no guarantee that a model developed for a specific site will be stable and reliable at another location with different meteorological conditions. Therefore, there remains a need to develop a universal model for AQM. In addition, comparing site-specific models could be an attractive option for future research, as it aids in developing site characterizations. Such research may enable the creation of guidelines for site-specific model development.
As discussed in Section 2, several approaches have been reported to reduce the input space by selecting the most dominant input variables. Most of these approaches selected air pollutant and meteorological data as inputs. A few of them considered other types of data, including temporal, traffic, geographical, and sustainable data. Therefore, the present authors believe that a comparison of such input selection methods that considers all available input data types could be an attractive field of research in AQM. In addition, the selection of proper decomposition components for reducing data dimensionality could be considered another potential research direction, as including many components in the input space may increase model complexity and lead to the accumulation of errors. Moreover, other data pre-processing and feature extraction techniques employed in relevant fields could also be explored.
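One of the simplest ways to select dominant inputs is to rank candidate variables by the strength of their correlation with the target pollutant. The Python sketch below assumes a same-time (lag-zero) Pearson correlation, which is a simplification of cross-correlation analysis. The function names, variable names, and data values are hypothetical.

```python
import math


def pearson(x, y):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / (spread_x * spread_y)


def rank_inputs(candidates, target, top_k):
    # Score each candidate by |r| against the target and keep the top_k names.
    scores = {name: abs(pearson(values, target))
              for name, values in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical candidate inputs and PM2.5 target (ug/m3)
candidates = {
    "wind_speed": [1.0, 2.0, 3.0, 4.0, 5.0],
    "temperature": [20.0, 22.0, 21.0, 23.0, 22.0],
    "hour_of_day": [3.0, 3.0, 3.0, 3.0, 3.0],
}
pm25 = [50.0, 42.0, 33.0, 27.0, 18.0]
selected = rank_inputs(candidates, pm25, top_k=2)
```

A ranking of this kind captures only linear, pairwise relationships. That limitation is one reason a comparison with other selection methods across all available input data types may be worthwhile.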
Soft computing models have become very popular in air quality modeling as they can efficiently model the complexity and non-linearity associated with air quality data. This article critically reviewed and discussed existing soft computing modeling approaches. Among the many available soft computing techniques, artificial neural networks with various structures and hybrid modeling approaches combining several techniques have been widely explored in predicting air pollutant concentrations throughout the world. Other approaches, including support vector machines, evolutionary artificial neural networks and support vector machines, fuzzy logic, and neuro-fuzzy systems, have also been used in air quality modeling for several years. Recently, deep learning and ensemble models have gained considerable momentum in modeling air pollutant concentrations owing to their advantages over other available techniques. Additionally, this article reviewed and listed the input variables employed in air quality modeling. It also discussed several input selection processes, including cross-correlation analysis, principal component analysis, random forest, learning vector quantization, rough set theory, and wavelet decomposition techniques. In addition, this article shed light on several data recovery approaches for missing data, including linear interpolation, multivariate imputation by chained equations, and expectation-maximization imputation methods.
Finally, this article proposed many advanced, reliable, and self-organizing soft computing models that have rarely or never been explored in the field of air quality modeling. For instance, functional network models, variations of neural network models, evolutionary fuzzy and neuro-fuzzy systems, type-2 fuzzy logic models, the group method of data handling, case-based reasoning, ensemble and hybrid models, and knowledge-based systems have immense potential for modeling air pollutant concentrations. Moreover, modelers can compare the effectiveness of several input selection processes to find the most suitable one for air quality modeling. Furthermore, they can attempt to build universal models instead of developing site-specific and pollutant-specific models. The authors believe that the findings of this review article will help researchers and decision-makers determine the suitability and appropriateness of a particular model for a specific modeling context.