Abstract: |
According to the report on global health risks published by the World Health Organization, environmental issues urgently need to be addressed worldwide. In particular, air pollution causes great damage to human health. In this work, we build a framework for finding correlations between air pollution and cancer. The framework consists of a data access flow and a data analytics flow. The data access flow processes raw data and makes it accessible through APIs. Cancer statistics are then mapped to air pollution data using temporal and spatial information. The analytics flow is used to find insights, based on data exploration and data classification methods. The data exploration methods use statistics, clustering, and a series of mining techniques to interpret the data. Data mining methods are then applied to find the relationships between air quality and cancer by treating air pollution indicators as features and cancer statistics as labels. The experimental results show that the air pollutants NO and NO2 have a significant influence on breast cancer, and that lung cancer is significantly influenced by NO2, NO, PM10, and O3; these findings are consistent with those obtained from traditional statistical methods. Moreover, our results also cover findings reported in several other studies. The proposed framework is flexible and can be applied to other applications with spatiotemporal data. |