Document Type : ORIGINAL RESEARCH ARTICLE
Author
Department of System Analysis and Decision Making, Ural Federal University, Ekaterinburg, Russian Federation
Abstract
For pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression, and regression tree using the M5 algorithm. The prediction of sulphur dioxide was based on atmospheric pollutants and meteorological parameters. Model performance was assessed using four evaluation measures: correlation coefficient, mean absolute error, root mean squared error, and relative absolute error. The results suggest that 1) the homogeneous ensemble classifier random forest performs better than single base statistical and machine learning algorithms; 2) employing single base classifiers as the base classifier within bagging improves their prediction accuracy; and 3) the heterogeneous ensemble algorithm voting can match or outperform the homogeneous classifiers (random forest and bagging). In general, the study demonstrates that the ensemble classifiers random forest, bagging, and voting can outperform single base traditional statistical and machine learning algorithms such as linear regression, support vector machine for regression, and multilayer perceptron in modeling the atmospheric concentration of sulphur dioxide.
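The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the common textbook definitions; in particular, relative absolute error is taken here as the total absolute error divided by the total absolute error of a predictor that always outputs the mean of the observed values. The function name `evaluation_measures` and the sample values are hypothetical and are not taken from the study.

```python
import math


def evaluation_measures(actual, predicted):
    """Return correlation coefficient, MAE, RMSE and RAE (in percent).

    Assumed definitions (not confirmed by the article):
      r    = Pearson correlation between actual and predicted values
      MAE  = mean of |actual - predicted|
      RMSE = square root of the mean of (actual - predicted)^2
      RAE  = sum|actual - predicted| / sum|actual - mean(actual)| * 100
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    # Pearson correlation coefficient
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_a * var_p)

    # Absolute and squared error summaries
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    mae = abs_err / n
    rmse = math.sqrt(sq_err / n)

    # Error relative to always predicting the mean of the observations
    rae = 100.0 * abs_err / sum(abs(a - mean_a) for a in actual)

    return {"r": r, "mae": mae, "rmse": rmse, "rae_percent": rae}


# Made-up illustrative values, not measurements from the study.
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [2.0, 3.0, 4.0, 5.0]
scores = evaluation_measures(observed, forecast)
```

In this toy case the forecast is offset from the observations by a constant, so the correlation coefficient is 1 even though the errors are not zero. This is one reason to report error measures alongside correlation.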
Highlights
- Random Forest algorithm performs better than the traditional independent learners MLP and SVM;
- Adopting single base classifiers such as SVM, MLP, etc. within Bagging (a homogeneous ensemble learning classifier) enhances their prediction accuracy;
- Voting, a heterogeneous ensemble learning classifier, predicts SO2 concentration with high accuracy and low error values.
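The difference between the homogeneous and heterogeneous ensembles in the highlights above can be sketched in a few lines. This is an illustration under stated assumptions, not the study's implementation. Bagging is shown as averaging copies of one base learner trained on bootstrap resamples, and voting as averaging the predictions of different learners, a common combination rule for regression. The helpers `fit_linear`, `fit_nearest`, `bagging` and `voting`, the single input feature, and the synthetic data are all hypothetical.

```python
import random


def fit_linear(xs, ys):
    """Least-squares line through one feature; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    # Fall back to a constant prediction if all x values coincide.
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_nearest(xs, ys, k=3):
    """k-nearest-neighbour regressor; returns a predict function."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def bagging(fit, xs, ys, n_models=25, seed=0):
    """Homogeneous ensemble: one learner type, many bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


def voting(models):
    """Heterogeneous ensemble: average the outputs of different learners."""
    return lambda x: sum(m(x) for m in models) / len(models)


# Synthetic data (y = 2x + 1), standing in for real pollutant records.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]

bagged_linear = bagging(fit_linear, xs, ys)
voted = voting([fit_linear(xs, ys), fit_nearest(xs, ys)])
```

Calling `bagged_linear(4.0)` or `voted(4.0)` returns a prediction for a new input. On real data the base learners would take several pollutant and meteorological inputs rather than a single feature.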
Keywords
- Air pollution modeling
- Ensemble learning techniques
- Multilayer Perceptron (MLP)
- Random forest
- Bagging
- Sulphur dioxide (SO2)
- Support Vector Machine (SVM)
- Voting