Climate data validation and reconstruction

Created by: Matteo Gentilucci
Revised by: Rui Liu

Validation and reconstruction of climate data are essential for reliable climate analyses. Climate time series often contain several types of errors as well as missing data, so procedures are needed to overcome these difficulties. Validation is based on five data quality checks: gross error checking, internal consistency check, tolerance test, temporal consistency and spatial consistency. Reconstruction, in contrast, was carried out using geostatistical methods, without a reference time series.

Gross error checking: each daily or monthly value outside the established thresholds was deleted. The thresholds depend on the range of values that is physically plausible for the climate of the area considered.
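
A minimal sketch of such a range check is given here for illustration. The function name and the limits of -40 and 50 °C are hypothetical placeholders, not the thresholds used in the study; deleted values are represented as None.

```python
def gross_error_check(values, lower, upper):
    """Replace values outside [lower, upper] with None (treated as deleted)."""
    return [
        v if v is not None and lower <= v <= upper else None
        for v in values
    ]


# Hypothetical daily maximum temperatures in degrees Celsius.
daily_tmax = [12.4, 13.1, 99.9, 11.8, -75.0]

# The limits below are illustrative only; real limits depend on the local climate.
cleaned = gross_error_check(daily_tmax, lower=-40.0, upper=50.0)
print(cleaned)
```

In practice the limits would be chosen separately for each variable and, possibly, for each season.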

Internal consistency check: identifies values that a climatic parameter cannot physically assume, for example a maximum temperature lower than the minimum temperature of the same day, or negative precipitation.
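
The two examples above can be sketched as a simple record-by-record test. The function and variable names are assumptions made for this illustration; missing values (None) are simply skipped.

```python
def internal_consistency_flags(tmax, tmin, precip):
    """Return one boolean per day: True when the record is physically impossible."""
    flags = []
    for tx, tn, p in zip(tmax, tmin, precip):
        # A maximum temperature below the minimum of the same day is impossible.
        bad_temperature = tx is not None and tn is not None and tx < tn
        # Precipitation cannot be negative.
        bad_precipitation = p is not None and p < 0
        flags.append(bad_temperature or bad_precipitation)
    return flags


# Hypothetical three-day record.
tmax = [15.0, 9.0, 18.2]
tmin = [7.5, 11.0, 10.1]
precip = [0.0, 2.4, -1.0]

flags = internal_consistency_flags(tmax, tmin, precip)
print(flags)
```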

Tolerance test: each weather station is examined on the basis of its own historical time series. The test defines an upper and a lower threshold by adding to, or subtracting from, the mean of the historical time series three times the standard deviation of the same series. Values outside these thresholds are flagged for further investigation.
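
As an illustration, the thresholds can be computed as in the following sketch. It assumes the sample standard deviation (statistics.stdev), which the text does not specify, and the series is invented for the example.

```python
from statistics import mean, stdev


def tolerance_limits(history, k=3.0):
    """Return (lower, upper) = mean -/+ k standard deviations of the series."""
    m = mean(history)
    s = stdev(history)  # assumption: sample standard deviation
    return m - k * s, m + k * s


def tolerance_suspects(history, k=3.0):
    """Return the indices of values outside the tolerance limits."""
    lower, upper = tolerance_limits(history, k)
    return [i for i, v in enumerate(history) if v < lower or v > upper]


# Hypothetical series: twenty ordinary values followed by one anomalous value.
history = [10.0, 11.0, 9.0, 10.5, 9.5] * 4 + [40.0]

lower, upper = tolerance_limits(history)
suspects = tolerance_suspects(history)
print(suspects)
```

The flagged indices mark values to be investigated further, not values to be deleted automatically.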

Temporal consistency: analyzes contiguous days in order to detect persistences (the same value recorded for several days) or incorrect data. Persistences must be deleted from the analysis when the same value is repeated for at least two days.
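
A possible way to locate persistences is sketched below. It reads "repeated for at least two days" as a run of two or more identical consecutive values, which is an interpretation; the run length is left as a parameter, and the function name is hypothetical.

```python
from itertools import groupby


def find_persistences(values, min_run=2):
    """Return (start, end) index pairs of runs of identical consecutive values."""
    runs = []
    start = 0
    for value, group in groupby(values):
        length = len(list(group))
        # Runs of missing values (None) are not treated as persistences.
        if value is not None and length >= min_run:
            runs.append((start, start + length - 1))
        start += length
    return runs


# Hypothetical daily temperatures.
series = [12.1, 12.1, 13.0, 14.2, 14.2, 14.2, 15.0]
runs = find_persistences(series)
print(runs)
```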

Spatial consistency: detects errors on the basis of the values recorded at neighbouring weather stations. First, the coefficient of determination between each neighbouring weather station and the candidate weather station is calculated; if it is above 0.7, the neighbouring station can be used to assess the candidate one. If a value of the candidate weather station exceeds the threshold given by the mean plus three times the standard deviation, the value must be deleted.
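
The two steps can be sketched as follows. The text does not state which series the mean and standard deviation are taken from; this sketch assumes the differences between candidate and neighbour, and applies the limit on both sides. Both choices, like the function names, are assumptions of the example.

```python
from statistics import mean, stdev


def r_squared(x, y):
    """Coefficient of determination (squared Pearson correlation) of two series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def spatial_suspects(candidate, neighbour, r2_min=0.7, k=3.0):
    """Return indices of suspect candidate values, or None if the neighbour is unusable."""
    if r_squared(candidate, neighbour) <= r2_min:
        return None  # the neighbour is too weakly related to the candidate
    # Assumption: the 3-sigma limit is applied to candidate-minus-neighbour differences.
    diffs = [c - n for c, n in zip(candidate, neighbour)]
    m, s = mean(diffs), stdev(diffs)
    return [i for i, d in enumerate(diffs) if abs(d - m) > k * s]


# Hypothetical data: the candidate tracks the neighbour except on day 10.
neighbour = [float(i) for i in range(20)]
candidate = [n + (1.2 if i % 2 == 0 else 0.8) for i, n in enumerate(neighbour)]
candidate[10] = neighbour[10] + 8.0

suspects = spatial_suspects(candidate, neighbour)
print(suspects)
```

With several usable neighbours, the same test could be repeated for each of them before a value is deleted.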

The checks are governed by flags that define the order in which the controls are performed (see image).

The pilot project was developed in a small area of central Italy (Macerata province) on about 118 rain gauges and 40 weather stations for temperature, covering the period from 1931 to 2014. The time interval was divided into three standard periods (1931-1960, 1961-1990 and 1991-2014) in accordance with WMO requirements. The analysis led to the removal of 0.02% of all temperature records and 1.67% of precipitation records.

Reconstruction was performed day by day with empirical Bayesian kriging (EBK), after a comparison with inverse distance weighting (IDW) and co-kriging.

Inverse distance weighting proved too imprecise, while co-kriging is more precise than EBK but much slower to execute.
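
For reference, the simplest of the three methods, inverse distance weighting, can be sketched in a few lines. The power of 2 and the planar coordinates are assumptions of this example, not settings reported for the study; EBK and co-kriging require a geostatistical package and are not reproduced here.

```python
import math


def idw(target, stations, power=2.0):
    """Estimate the value at target (x, y) from (x, y, value) station tuples."""
    numerator = 0.0
    denominator = 0.0
    for x, y, value in stations:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            return value  # the target coincides with a station
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical stations: (x in km, y in km, temperature in degrees Celsius).
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]

estimate = idw((1.0, 0.0), stations)
print(estimate)
```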

Therefore, since the reconstruction is daily for temperatures and monthly for precipitation, EBK was considered the optimal method. The table below gives an example of the comparison between the three interpolation methods for the reconstruction of daily temperatures in Macerata province, central Italy.

                                 IDW               EBK               Co-kriging
Regression function              0.6221x+6.8366    0.6813x+5.7113    0.9400x+1.2166
Mean                             0.0119            0.0311            0.0566
Root mean square                 1.6870            1.6429            1.2465
Mean standardized                -                 -0.0002           0.0237
Root mean square standardized    -                 0.9514            0.9890
Average standard error           -                 1.7366            1.5278
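
The statistics in the table are of the kind produced by cross-validation, where each measured value is compared with the value predicted from the other stations. The sketch below assumes that "Mean" and "Root mean square" refer to the prediction errors (predicted minus measured), and that the standardized rows divide each error by its prediction standard error. These definitions follow common geostatistical practice and are not stated in the text.

```python
import math


def cross_validation_stats(measured, predicted, standard_errors=None):
    """Return a dict of cross-validation statistics for paired values."""
    errors = [p - m for m, p in zip(measured, predicted)]
    n = len(errors)
    stats = {
        "mean": sum(errors) / n,
        "root_mean_square": math.sqrt(sum(e * e for e in errors) / n),
    }
    if standard_errors is not None:
        # Standardized errors: each error divided by its prediction standard error.
        z = [e / s for e, s in zip(errors, standard_errors)]
        stats["mean_standardized"] = sum(z) / n
        stats["root_mean_square_standardized"] = math.sqrt(sum(v * v for v in z) / n)
        stats["average_standard_error"] = math.sqrt(
            sum(s * s for s in standard_errors) / n
        )
    return stats


# Hypothetical measured and predicted temperatures with prediction standard errors.
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
standard_errors = [2.0, 2.0, 2.0, 2.0]

stats = cross_validation_stats(measured, predicted, standard_errors)
print(stats)
```

Under these definitions, a mean close to zero indicates unbiased predictions, a smaller root mean square indicates predictions closer to the measurements, and a standardized root mean square close to one indicates that the prediction standard errors are realistic.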