Team logo (source1)

At the 2019 Deep Learning Hackathon in Dresden, a team led by Lennart Schmidt (Helmholtz-Zentrum für Umweltforschung Leipzig) proposed to use Deep Learning to predict streamflow of the Elbe river in Germany, focusing on the prediction of flood events. Lennart’s team was voted one of the most accurate at the hackathon. This blog post gives you an overview of what the team achieved.

The team consisted of:

The idea

Our goal was to predict the streamflow of the Elbe river close to its mouth into the North Sea from raster datasets of interpolated observed precipitation and air temperature, using a ConvLSTM architecture.

Catchment Map (source2)

The project is significant from three perspectives. First, the Elbe has been subject to hydrological extremes in the recent past, such as the severe floods of 2002 and 2013. New and promising forecasting methods for streamflow are therefore of high importance to all societal and economic activities related to the Elbe, e.g. insurance plans, inland water navigation and agriculture. Secondly, the application is relevant from a modelling perspective: for streamflow forecasting, LSTMs have been applied successfully to 1D input data, i.e. timeseries of catchment averages of precipitation and temperature. One aim of our project was to evaluate to what extent prediction accuracy improves when the spatial structure of the inputs is included in the model architecture. The results were then compared to a renowned spatially-explicit physical model. Lastly, our application can advance hydrological system understanding: researchers have long worked on identifying flood-inducing patterns of precipitation, as different areas of a river catchment react differently to precipitation due to differences in land cover, soil types, geology etc. Using approaches like “GradCAM” and “DeepSHAP”, we aim to derive saliency maps that identify the spatial patterns of precipitation and temperature that produce flood events.

The data

Data

Our input data consisted of interpolated rasters of daily mean precipitation and daily mean temperature for the period 1950-2016. The data was derived from the publicly available E-OBS dataset3 and masked to the Elbe catchment. The spatial resolution was 0.1x0.1°, i.e. about 6x10 km, so each raster contained 56 x 66 cells for each of the 24472 timesteps. The target data, daily mean streamflow at the gauge “Neu-Darchau” close to Hamburg, was obtained from the Global Runoff Database4. For comparison, we had corresponding predicted time series from a renowned spatially-distributed physical model.
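To make the tensor shapes concrete, here is a minimal NumPy sketch (with random toy data instead of the real E-OBS rasters) of how the daily rasters and the gauge timeseries can be sliced into input/target pairs for a sequence model. The sequence length of 30 days is an illustrative assumption, not necessarily the value the team settled on:

```python
import numpy as np

def make_sequences(rasters, streamflow, seq_len):
    """Slice daily rasters into overlapping input sequences.

    rasters:    array of shape (timesteps, height, width, channels)
    streamflow: array of shape (timesteps,), daily mean discharge at the gauge
    seq_len:    number of past days fed to the model per prediction
    """
    X, y = [], []
    for t in range(seq_len, len(rasters)):
        X.append(rasters[t - seq_len:t])  # the seq_len days before day t
        y.append(streamflow[t])           # streamflow on day t
    return np.stack(X), np.array(y)

# Toy example with the catchment's raster dimensions (56 x 66 cells,
# 2 channels: precipitation and temperature); 100 days instead of 24472.
rasters = np.random.rand(100, 56, 66, 2).astype("float32")
flow = np.random.rand(100).astype("float32")
X, y = make_sequences(rasters, flow, seq_len=30)
print(X.shape)  # (70, 30, 56, 66, 2)
```

For the 1D baseline described below, the same rasters would simply be averaged over the spatial axes first, yielding one precipitation and one temperature value per day.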

Hackathon Experiences

We were very lucky to be assigned to our mentor Walter de Back5, who prepared quite a few Jupyter notebooks in advance; this helped the rest of the team a lot in quickly comprehending the necessary code. Also, our data was clean right from the start, so we were ready to go from day one (we strongly suggest that future teams prepare similarly, as 5 days go by very quickly!). During the week, our main focus was to identify the best model set-up, so we explored the ideal length of the input sequences, tried different loss functions and varied the model architecture. While working on the 2D-ConvLSTM approach, we also implemented a 1D-LSTM, i.e. one using timeseries of catchment averages of precipitation and temperature, to serve as a baseline.
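To illustrate the two model variants, here is a hedged Keras sketch of what a 2D-ConvLSTM and a 1D-LSTM baseline could look like; all layer sizes, the sequence length and the loss are illustrative assumptions, not the team's actual configuration:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

SEQ_LEN, H, W, C = 30, 56, 66, 2  # days, raster height/width, channels

# 2D variant: a ConvLSTM over the raster sequence, pooled to a scalar head.
conv_model = tf.keras.Sequential([
    layers.Input((SEQ_LEN, H, W, C)),
    layers.ConvLSTM2D(16, kernel_size=3, padding="same"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1),  # predicted daily mean streamflow
])

# 1D baseline: a plain LSTM over catchment-averaged timeseries.
lstm_model = tf.keras.Sequential([
    layers.Input((SEQ_LEN, C)),
    layers.LSTM(16),
    layers.Dense(1),
])

conv_model.compile(optimizer="adam", loss="mse")
lstm_model.compile(optimizer="adam", loss="mse")
```

The only structural difference is whether the spatial grid is kept (and processed by `ConvLSTM2D`) or collapsed to catchment averages before the recurrent layer.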

In general, our team was fairly heterogeneous in terms of the members’ backgrounds: Walter has a strong background in (medical) Deep Learning, Elona is very proficient in data analysis, Lennart provides the hydrological knowledge and Kira is very experienced in the field of Interpretable Deep Learning. This way, each of us looked at methodological decisions from a different perspective, which turned out to be both very educational and effective: it happened multiple times that seemingly naive questions resulted in methodological improvements that might otherwise have been overlooked.

The hackathon was well-organized, which allowed us to focus entirely on our project - most importantly, there was always coffee :) In addition to this, there was a lively atmosphere so that we got in touch with other teams/mentors throughout the whole week. The daily scrum sessions helped in defining and achieving daily goals.

For us, the only thing that did not work out so well was collaboration at the code level: instead of using git branches, all of us worked on the same files, which produced quite a bit of unnecessary overhead. Future teams, take this as advice!

Results

Our ConvLSTM model delivered high prediction accuracy with a concordance correlation coefficient of 0.88. More importantly, it reproduced the dynamics of streamflow very well. When comparing our predictions to those of the physical model, we saw that our Deep Learning approach is in fact slightly superior. Considering that we only included precipitation and temperature, without any other relevant information such as topography, land use or soil characteristics, these results are very impressive. Towards the end of the week, we even managed to implement an add-on that is vital when predicting timeseries: using a Bayesian Deep Learning approach6, we were able to produce not only predictions but also estimates of the aleatoric and epistemic uncertainties that come with them (see image).
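For readers unfamiliar with the metric: the concordance correlation coefficient (Lin's CCC) measures agreement between observed and predicted series, penalising not only decorrelation but also shifts in mean or variance. A small self-contained implementation (not the team's code) looks like this:

```python
import numpy as np

def ccc(obs, pred):
    """Lin's concordance correlation coefficient.

    1.0 means perfect agreement in correlation, location and scale;
    values drop when the prediction is shifted, rescaled or decorrelated.
    """
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    mo, mp = obs.mean(), pred.mean()
    vo, vp = obs.var(), pred.var()  # population variances
    cov = ((obs - mo) * (pred - mp)).mean()
    return 2 * cov / (vo + vp + (mo - mp) ** 2)

obs = np.array([1.0, 2.0, 3.0, 4.0])
print(ccc(obs, obs))        # 1.0 for perfect agreement
print(ccc(obs, obs + 1.0))  # < 1.0: same correlation, but shifted mean
```

Unlike Pearson correlation, which is 1.0 in both cases above, the CCC rewards predictions that also hit the right absolute streamflow levels.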

Data

One thing that was very surprising to us is the fact that our baseline model, the 1D-LSTM, performed almost as well as the 2D-ConvLSTM approach. This is in line with similar reports from physical hydrological modelling, but it is still interesting, as it suggests that the model only uses a fraction of the spatial information at hand. We will now investigate whether this is the result of uncertainty in our inputs or whether there is still more potential in tuning the algorithm. Also, we did not succeed in producing saliency maps for flood events during the hackathon, so that is something we will look into in the near future.

Overall, it is safe to say that, besides accuracies, our personal goals were fully achieved: On our way home from the hackathon, all of us carried a big bundle of new knowledge, friends and motivation for future projects. So we would like to express big thanks to all organizers, mentors, sponsors and supporters for making this memorable experience possible!

This article was written by Lennart Schmidt (UFZ) and edited by Peter Steinbach (HZDR).

  1. “ElbeRiver” by www.rubenholthuijsen.nl is licensed under CC BY 2.0, modified 

  2. “Elbe Einzugsgebiet” by NordNordWest is licensed under CC BY-SA 3.0 

  3. https://www.ecad.eu/download/ensembles/download.php 

  4. https://www.bafg.de/GRDC/EN/01_GRDC/13_dtbse/database_node.html 

  5. https://twitter.com/wdeback?lang=en 

  6. https://arxiv.org/abs/1703.04977 
