Document Type: REVIEW PAPER


Department of System Analysis and Decision Making, Ural Federal University, Ekaterinburg, Russian Federation


Modern studies in environmental science and engineering show that deterministic models struggle to capture the relationship between the concentrations of atmospheric pollutants and their emission sources. Recent advances in statistical modeling based on machine learning have emerged as a solution to these issues. The type of input variables is known to strongly affect the performance of an algorithm; however, it remains unclear why one algorithm is preferred over another for a given task. This work aims to highlight the underlying principles of machine learning techniques and their role in improving prediction performance. The study reviews the 38 most relevant studies in environmental science and engineering that applied machine learning techniques during the last six years. The review explores several aspects of these studies: 1) the role of input predictors in improving prediction accuracy; 2) where the studies were conducted geographically; 3) the major techniques applied to estimate or forecast pollutant concentrations; and 4) whether these techniques were based on linear regression, neural network, support vector machine or ensemble learning algorithms. The results suggest that machine learning studies of this kind are conducted mainly in Europe and America. Furthermore, a factorial analysis (multi-component analysis) shows that pollution estimation is generally performed with ensemble learning and linear regression approaches, whereas forecasting tasks tend to use neural network and support vector machine algorithms.

Graphical Abstract

Machine learning algorithms in air quality modeling


  • Studies dedicated to estimation modeling are 1.5 times as numerous as those dedicated to forecast modeling.
  • Estimation studies mainly apply ensemble learning and regression algorithms, whereas forecasting studies tend to use neural network (NN) and support vector machine (SVM) approaches.
  • Predictive features such as land use and satellite images are strongly associated with estimation models but only weakly associated with forecast models.
  • Ensemble learning techniques are highly reliable, with an average correlation coefficient of 0.79, but their application in forecast modeling is limited.
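The highlights summarize model reliability as an average correlation coefficient between predicted and observed pollutant concentrations. The minimal sketch below shows how such a figure can be computed. It assumes the coefficient is Pearson's r and that the per-study values are combined with a plain unweighted mean; the review's exact aggregation procedure is not stated here, and the function names `pearson_r` and `average_r` are hypothetical helpers, not the author's code.

```python
import math
from statistics import mean


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted concentrations.

    Assumption: the 'correlation coefficient' reported in the highlights
    is Pearson's r; other studies may report R^2 or Spearman's rho instead.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_obs, mean_pred = mean(observed), mean(predicted)
    # Covariance numerator: sum of products of deviations from the means
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    # Root sums of squared deviations (the n - 1 factors cancel in r)
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed))
    sd_pred = math.sqrt(sum((p - mean_pred) ** 2 for p in predicted))
    if sd_obs == 0 or sd_pred == 0:
        raise ValueError("r is undefined when either series is constant")
    return cov / (sd_obs * sd_pred)


def average_r(per_study_r):
    """Unweighted mean of per-study coefficients (an assumed aggregation rule)."""
    return mean(per_study_r)


if __name__ == "__main__":
    # Hypothetical PM2.5-like values for illustration only, not data from the review
    observed = [12.0, 18.5, 25.1, 30.2, 22.7]
    predicted = [14.1, 17.0, 23.8, 28.9, 24.0]
    print(round(pearson_r(observed, predicted), 3))
```

A plain mean of r values is the simplest choice; a meta-analysis would more often average Fisher z-transformed coefficients (z = atanh(r)) and transform back, which reduces bias when r is close to 1.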


Letters to Editor

GJESM Journal welcomes letters to the editor for post-publication discussion and corrections, allowing debate on published work through the Letters to Editor section of its site. Letters pertaining to manuscripts published in GJESM should be sent to the editorial office of GJESM within three months of online publication or before print publication, except for critiques of original research. The following points should be considered before sending letters (comments) to the editor.

[1] Letters that include statements of statistics, facts, research, or theories should include appropriate references, although more than three are discouraged.
[2] Letters that are personal attacks on an author rather than thoughtful criticism of the author’s ideas will not be considered for publication.
[3] Letters can be no more than 300 words in length.
[4] Letter writers should include a statement at the beginning of the letter indicating whether or not it is being submitted for publication.
[5] Anonymous letters will not be considered.
[6] Letter writers must include their city and state of residence or work.
[7] Letters will be edited for clarity and length.