Knowing the actual precipitation in space and time is critical in
hydrological modelling applications, yet the spatial coverage with rain gauge
stations is limited due to economic constraints. Gridded satellite
precipitation datasets offer an alternative option for estimating the actual
precipitation by covering large areas uniformly, although the related estimates
are not accurate. To improve precipitation estimates, machine learning is applied
to merge rain gauge-based measurements and gridded satellite precipitation
products. In this context, observed precipitation plays the role of the
dependent variable, while satellite data play the role of predictor variables.
Random forests is the dominant machine learning algorithm in relevant
applications. In those spatial prediction settings, point predictions (mostly
the mean or the median of the conditional distribution) of the dependent
variable are issued. The aim of the manuscript is to solve the problem of
probabilistic prediction of precipitation with an emphasis on extreme quantiles
in spatial interpolation settings. Here we propose issuing probabilistic
spatial predictions of precipitation using Light Gradient Boosting Machine
(LightGBM). LightGBM is a boosting algorithm, highlighted by prize-winning
entries in prediction and forecasting competitions. To assess LightGBM, we
contribute a large-scale application that includes merging daily precipitation
measurements in the contiguous US with PERSIANN and GPM-IMERG satellite
precipitation data. We focus on extreme quantiles of the probability
distribution of the dependent variable, where LightGBM outperforms quantile
regression forests (QRF, a variant of random forests) in terms of quantile
score at extreme quantiles. Our study offers an understanding of probabilistic
predictions in spatial settings using machine learning.
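
The quantile score used for the comparison above is commonly defined as the average pinball loss at a quantile level tau. The following is a minimal sketch of that standard definition, not the evaluation code of the study; the function name `quantile_score` and the toy precipitation values are assumptions made purely for illustration.

```python
def quantile_score(y_true, y_pred, tau):
    """Average quantile (pinball) loss at level tau; lower is better.

    y_true: observed values (e.g. gauge precipitation).
    y_pred: predicted tau-quantiles for the same points.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (y above q) is weighted by tau,
        # over-prediction (y below q) by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


# Hypothetical toy data, not taken from the study.
observed = [0.0, 2.0, 10.0, 40.0]
low_pred = [1.0, 1.0, 1.0, 1.0]     # ignores heavy precipitation
high_pred = [1.0, 5.0, 20.0, 45.0]  # tracks the upper tail

score_low = quantile_score(observed, low_pred, 0.95)
score_high = quantile_score(observed, high_pred, 0.95)
```

At a high level such as tau = 0.95, predictions that fall below the observations are penalized far more heavily than predictions that exceed them, which is why the score is suited to comparing methods at extreme quantiles.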