Paper published in Sustainability Journal about “Air Quality Monitoring Network Design Optimisation for Robust Land Use Regression Models”

The paper “Air Quality Monitoring Network Design Optimisation for Robust Land Use Regression Models” (by Shivam Gupta, Edzer Pebesma, Jorge Mateu, Auriol Degbelo) has been published in the special issue “Spatial and Spatio-Temporal Planning for Urban Health and Sustainability” of the journal Sustainability, 2018, 10(5).

Abstract: A very common curb of epidemiological studies for understanding the impact of air pollution on health is the quality of exposure data available. Many epidemiological studies rely on empirical modelling techniques, such as land use regression (LUR), to evaluate ambient air exposure. Previous studies have located monitoring stations in an ad hoc fashion, favouring their placement in traffic “hot spots”, or in areas deemed subjectively to be of interest to land use and population. However, ad-hoc placement of monitoring stations may lead to uninformed decisions for long-term exposure analysis. This paper introduces a systematic approach for identifying the location of air quality monitoring stations. It combines the flexibility of LUR with the ability to put weights on priority areas such as highly-populated regions, to minimise the spatial mean predictor error. Testing the approach over the study area has shown that it leads to a significant drop of the mean prediction error (99.87% without spatial weights; 99.94% with spatial weights in the study area). The results of this work can guide the selection of sites while expanding or creating air quality monitoring networks for robust LUR estimations with minimal prediction errors.

According to United Nations estimates, 66% of the world population is expected to be living in urban areas by 2050. At the same time, the Organisation for Economic Co-operation and Development (OECD) projects that by 2050 air pollution will be the top environmental cause of mortality worldwide. GIS and spatial analysis have increasingly become essential tools for air pollution monitoring.

Interpolating the pollution data collected by regulatory air quality monitoring stations can help to reveal regional patterns, but these monitoring networks are too sparse to provide informative data at the city level. Land Use Regression (LUR) models help to account for the variability of air pollution within cities. They are a promising alternative to conventional interpolation approaches because they establish the relationship between easily accessible land use characteristics and pollutant measurements. Our knowledge of air pollution monitoring is mostly based on limited data.

The published paper takes a new look at Monitoring Network Design (MND) using a new optimisation method. The proposed method identifies the combination of locations that minimises the spatial mean prediction error over the entire study area in two contexts: (1) without any weighting function; and (2) with a spatial population weighting function that gives priority to areas of high population density. The optimisation method does not rely on monitoring station data for site placement, so the optimal air quality MND can be planned and readjusted independently, even for cities with little or no air quality data. Hence, the proposed method can be a helpful tool in air quality MND, enabling LUR estimations with fewer errors for preventing air pollution exposure and advancing urban health sustainability.
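To illustrate the kind of criterion described above, here is a minimal sketch and not the implementation used in the paper. It assumes a LUR model with a single predictor (for example road density), for which the textbook prediction variance of simple linear regression at a location x0 is proportional to 1 + 1/n + (x0 - mean)^2 / Sxx. The function name `mean_prediction_error`, the toy grid and the population weights are all hypothetical. Note that the criterion only needs predictor values, not pollutant measurements.

```python
from statistics import fmean


def mean_prediction_error(design, grid, weights=None):
    """Mean prediction-error variance over the study area, in units of the
    residual variance, for a one-predictor linear model (illustrative only).

    design  -- predictor values at the chosen monitoring sites
    grid    -- predictor values at every prediction location
    weights -- optional non-negative weights (e.g. population) per location
    """
    n = len(design)
    x_bar = fmean(design)
    sxx = sum((x - x_bar) ** 2 for x in design)
    if sxx == 0:
        raise ValueError("sites must differ in their predictor values")
    if weights is None:
        weights = [1.0] * len(grid)  # context (1): no weighting
    total = sum(weights)
    # Variance of a new prediction at x0: 1 + 1/n + (x0 - x_bar)^2 / Sxx
    errors = (1 + 1 / n + (x0 - x_bar) ** 2 / sxx for x0 in grid)
    return sum(w * e for w, e in zip(weights, errors)) / total


# Hypothetical study area: 11 cells with predictor values 0..10.
grid = [float(v) for v in range(11)]
population = [1, 1, 1, 1, 1, 1, 1, 5, 10, 20, 40]  # made-up weights

for name, design in [("clustered", [4.0, 5.0, 6.0]), ("spread", [0.0, 5.0, 10.0])]:
    print(name,
          round(mean_prediction_error(design, grid), 3),
          round(mean_prediction_error(design, grid, population), 3))
```

In this toy setting, sites that span the range of the predictor give a lower mean error than sites clustered around one value, and the weighted variant shifts the emphasis towards the densely populated cells.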

For more detailed information, please access the article here.

The article is Open Access and is funded by the European Commission within the Marie Skłodowska-Curie Actions, Innovative Training Networks (ITN), European Joint Doctorates (EJD). The funding period runs from January 1, 2015 to December 31, 2018, under Grant Agreement number 642332 (GEO-C, H2020-MSCA-ITN-2014).

Paper published in Statistics & Probability Letters Journal concerning “Quality of life, big data and the power of statistics”

The paper “Quality of life, big data and the power of statistics” (by Shivam Gupta, Jorge Mateu, Auriol Degbelo, Edzer Pebesma) has been published in the special issue dedicated to Statistics and Big Data of the journal Statistics & Probability Letters, Volume 136 (May 2018).

Abstract: The digital era has opened up new possibilities for data-driven research. This paper discusses big data challenges in environmental monitoring and reflects on the use of statistical methods in tackling these challenges for improving the quality of life in cities.

With an increasing number of people moving within (and to) urban areas, there is an urgent need to examine what this growth means for the environment and the quality of life (QoL) in cities. Air quality, which is also the major environmental risk factor for health, affects the population’s QoL (Darçın, 2014). Data for environmental and meteorological analysis are not only of significant volume but are also complex in space and time. The formats and types of data are very diverse (e.g., netCDF, GDB, CSV, GeoTIFF, shapefile, JSON), and the many interconnections within the data make them difficult to handle with traditional data analysis procedures. As Scott (2017) said, statistics remains highly relevant irrespective of the ‘bigness’ of data. It provides the basis to make data speak while taking into account the inherent uncertainties. Statistical analysis involves developing data collection procedures to handle different data sources and proposing formal models for analysis and prediction.

In the published paper we focused on the role of statistics in handling the five Vs (Volume, Velocity, Variety, Veracity and Value) of big data, and the challenges they pose. We proposed combining two well-established statistical methods to optimise the selection of variables and locations for the spatial and temporal analysis of environmental data sources (with a focus on air quality monitoring). The combined use of the two methods, Land Use Regression (LUR) and Spatial Simulated Annealing (SSA), will help in designing data acquisition processes so that the maximum information can be extracted from a given number of possible measurement sites. Limiting the data sources can also increase the speed of the analysis, making big data analysis more effective regardless of the “bigness”.
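The idea of searching for informative measurement sites with simulated annealing can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure from the paper: it picks sites from a discrete list of candidate locations instead of moving them in continuous space, it uses a one-predictor regression criterion as a stand-in for a full LUR model, and the names `criterion` and `anneal` as well as all parameter values are hypothetical.

```python
import math
import random


def criterion(design, grid):
    """Mean prediction variance (relative to the residual variance) of a
    one-predictor linear model fitted at `design`, averaged over `grid`."""
    n = len(design)
    x_bar = sum(design) / n
    sxx = sum((x - x_bar) ** 2 for x in design)
    if sxx == 0:
        return math.inf  # degenerate design: the slope cannot be estimated
    return sum(1 + 1 / n + (x0 - x_bar) ** 2 / sxx for x0 in grid) / len(grid)


def anneal(candidates, n_sites, objective, steps=2000, t0=1.0,
           cooling=0.995, seed=0):
    """Simplified annealing over a discrete candidate set.
    Requires n_sites < len(candidates)."""
    rng = random.Random(seed)
    current = rng.sample(range(len(candidates)), n_sites)
    cur_val = objective([candidates[i] for i in current])
    best, best_val = list(current), cur_val
    temp = t0
    for _ in range(steps):
        # Propose moving one randomly chosen site to an unused candidate.
        proposal = list(current)
        free = [i for i in range(len(candidates)) if i not in current]
        proposal[rng.randrange(n_sites)] = rng.choice(free)
        val = objective([candidates[i] for i in proposal])
        delta = val - cur_val
        # Always accept improvements; accept worse designs with a
        # probability that shrinks as the temperature cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_val = proposal, val
            if cur_val < best_val:
                best, best_val = list(current), cur_val
        temp *= cooling
    return sorted(candidates[i] for i in best), best_val


# Hypothetical study area: 11 candidate cells with predictor values 0..10.
grid = [float(v) for v in range(11)]
sites, value = anneal(grid, n_sites=3, objective=lambda d: criterion(d, grid))
print(sites, round(value, 4))
```

Because the objective is passed in as a function, a population-weighted criterion could be swapped in without changing the search loop.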

For more detailed information, please access the article at: https://www.sciencedirect.com/science/article/pii/S0167715218300750

The article is Open Access and is funded by the European Commission within the Marie Skłodowska-Curie Actions, Innovative Training Networks (ITN), European Joint Doctorates (EJD). The funding period runs from January 1, 2015 to December 31, 2018, under Grant Agreement number 642332 (GEO-C, H2020-MSCA-ITN-2014).