Discovering behavioral patterns among air pollutants: A data mining approach


  • Diana Arce Universidad del Azuay
  • Fernando Lima Universidad del Azuay
  • Marcos Patricio Orellana Cordero Universidad del Azuay
  • John Ortega Universidad del Azuay
  • Chester Sellers Universidad del Azuay
  • Patricia Ortega Universidad del Azuay



air pollutant; knowledge; data mining; correlation


Air pollutants affect both human health and the environment. For this reason, environmental managers and urban planners focus their efforts on monitoring air pollution. In this context, complete information is required to support decision-making aimed at improving the quality of life in urban zones. Hence, it is important to extract knowledge not only about concentration levels but also about associations between air pollutants. Based on the Cross-Industry Standard Process for Data Mining (CRISP-DM), this paper presents an approach for identifying correlations and incidence among the most harmful pollutants in the Andean region: ozone, carbon monoxide, sulfur dioxide, nitrogen dioxide, and particulate matter. The paper describes an experiment using a real dataset from a monitoring station in Cuenca, Ecuador, which is located in the Andean region. The results show that the proposed approach is effective at extracting knowledge that supports the evaluation of air quality in urban zones. In addition, the approach provides a starting point for future data mining applications for the analysis of air pollution in the context of the Andean region.
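As a rough illustration of what identifying correlations between pollutants can look like, the following sketch computes pairwise Pearson coefficients over a few pollutant series. It is not the authors' pipeline: the abstract does not state which correlation measure or tooling was used, so the choice of Pearson, the function names `pearson` and `pairwise_correlations`, and the input values are all assumptions made for illustration. The numbers are invented and are not measurements from the Cuenca station.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(readings):
    """Map each unordered pollutant pair to its Pearson coefficient."""
    return {
        (a, b): pearson(readings[a], readings[b])
        for a, b in combinations(sorted(readings), 2)
    }


# Hypothetical values for illustration only (assumption, not real data).
readings = {
    "O3": [10.0, 20.0, 30.0, 40.0],
    "NO2": [40.0, 30.0, 20.0, 10.0],
    "CO": [1.0, 2.0, 3.0, 4.0],
}

correlations = pairwise_correlations(readings)
for pair, r in sorted(correlations.items()):
    print(pair, round(r, 3))
```

On real station data, the same idea would typically be applied after the data preparation phase of CRISP-DM (handling missing readings and aligning timestamps), since a coefficient computed over misaligned or gap-filled series can be misleading.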








How to Cite

Arce, D., Lima, F., Orellana Cordero, M. P., Ortega, J., Sellers, C., & Ortega, P. (2018). Discovering behavioral patterns among air pollutants: A data mining approach. Enfoque UTE, 9(4), pp. 168 - 179.



Computer Science, ICTs