Department of System Analysis and Decision Making, Ural Federal University, Ekaterinburg, Russian Federation


For pollution prediction modeling, this study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers: support vector machine, multilayer perceptron, linear regression, and regression tree using the M5 algorithm. The prediction of sulphur dioxide was based on atmospheric pollutants and meteorological parameters. Model performance was assessed using four evaluation measures: correlation coefficient, mean absolute error, root mean squared error, and relative absolute error. The results suggest that 1) the homogeneous ensemble classifier random forest performs better than single base statistical and machine learning algorithms; 2) employing single base classifiers as the base classifier within bagging improves their prediction accuracy; and 3) the heterogeneous ensemble algorithm voting can match or outperform the homogeneous classifiers (random forest and bagging). In general, the study demonstrates that the ensemble classifiers random forest, bagging, and voting can outperform single base traditional statistical and machine learning algorithms, such as linear regression, support vector machine for regression, and multilayer perceptron, in modeling the atmospheric concentration of sulphur dioxide.
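The four evaluation measures named in the abstract can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the abstract does not give the formulas, so the relative absolute error below uses the common definition (total absolute error divided by the total absolute error of always predicting the observed mean, in percent), and the function name and sample values are hypothetical, not taken from the study.

```python
import math

def evaluation_measures(actual, predicted):
    """Return correlation coefficient, MAE, RMSE and RAE (%) for two sequences."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    # Pearson correlation between observed and predicted values
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_a * var_p)
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e ** 2 for e in abs_err) / n)
    # RAE (assumed definition): error relative to always predicting the observed mean
    rae = 100.0 * sum(abs_err) / sum(abs(a - mean_a) for a in actual)
    return {"r": r, "mae": mae, "rmse": rmse, "rae": rae}

# Hypothetical SO2 readings (illustrative values, not data from the study)
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
scores = evaluation_measures(observed, predicted)
```

A higher correlation coefficient and lower error values indicate a better model; an RAE below 100% means the model beats the trivial mean predictor.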

Graphical Abstract

Application of ensemble learning techniques to model the atmospheric concentration of SO2


  • The Random Forest algorithm performs better than the traditional independent learners MLP and SVM;
  • Adopting single base classifiers such as SVM, MLP, etc. within Bagging (a homogeneous ensemble learning classifier) enhances their prediction accuracy;
  • Voting, a heterogeneous ensemble learning classifier, can predict SO2 concentration with high accuracy and low error values.
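To make the distinction between the two ensemble styles concrete, here is a minimal pure-Python sketch for regression. It rests on assumptions: the two toy base learners (a least-squares line and a 1-nearest-neighbour regressor) are hypothetical stand-ins for the SVM and MLP models named above, the data are synthetic, and averaging is only one common way to combine predictions. Bagging trains one learner type on bootstrap resamples and averages the results; voting averages the predictions of different learner types.

```python
import random

def fit_line(xs, ys):
    """Toy base learner: least-squares line, returned as a predictor function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_nearest(xs, ys):
    """Toy base learner: 1-nearest-neighbour regressor."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def bagging(fit, xs, ys, n_models=25, seed=0):
    """Homogeneous ensemble: same learner, different bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

def voting(models):
    """Heterogeneous ensemble: average the predictions of different learners."""
    return lambda x: sum(m(x) for m in models) / len(models)

# Synthetic data (not from the study): y = 2x + 1
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]

bagged_line = bagging(fit_line, xs, ys)
bagged_nn = bagging(fit_nearest, xs, ys)
ensemble = voting([bagged_line, bagged_nn])
prediction = ensemble(4.0)
```

Placing bagged models inside the voting ensemble is a design choice of this sketch; the two techniques can also be used independently.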

