Overview

DART-Pipeline brings together climate and sociodemographic data as netCDF files suitable for further processing or as input to machine learning models used for disease incidence forecasting.

Fetching sources: The first stage fetches sources from online providers, with little or no processing at this step. We recommend archiving the raw sources so that pipeline runs are reproducible.
Bias correction: Weather data, particularly from whole-Earth reanalysis, can under- or over-estimate variables such as temperature and precipitation. We include a bias correction workflow for precipitation (historical data) and for precipitation, relative humidity and temperature (forecast data).
Processing sources: Data is processed into netCDF files suitable for ingestion into a machine learning model or for visualisation.
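
To make the idea behind the bias correction stage concrete, here is a minimal sketch of linear scaling, one simple and widely used correction technique for precipitation. It is an illustration only: the method actually used by dart-bias-correct may differ, and the function names and sample values below are hypothetical.

```python
from statistics import mean


def scaling_factor(observed, modelled):
    """Ratio of mean observed to mean modelled precipitation
    over a shared reference period (linear scaling)."""
    model_mean = mean(modelled)
    if model_mean == 0:
        raise ValueError("modelled mean is zero; scaling factor is undefined")
    return mean(observed) / model_mean


def apply_scaling(values, factor):
    """Multiply each modelled value by the scaling factor."""
    return [v * factor for v in values]


# Hypothetical daily precipitation (mm) for the same days and location.
observed = [0.0, 4.0, 10.0, 2.0]
reanalysis = [0.0, 2.0, 5.0, 1.0]  # systematically under-estimates rainfall

factor = scaling_factor(observed, reanalysis)
corrected = apply_scaling(reanalysis, factor)
print(factor, corrected)
```

Multiplicative scaling keeps dry days dry and never produces negative precipitation, which is why it is a common first choice for rainfall; temperature is more often corrected with an additive offset.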

Each of these steps can be performed separately through the dart-pipeline command line interface and an associated utility, dart-bias-correct, which is available separately.

Components

DART-Pipeline has been developed to be modular and extensible. Common code, such as interfaces to APIs like ECMWF’s cdsapi and zonal statistics functions (using the exactextract library), lives in a reusable utility library called geoglue. The bias correction module (dart-bias-correct) is kept in a separate repository because its dependency is GPL-3.0 licensed. To orchestrate forecast data fetching and processing, along with running containerised models, there is a dart-runner tool. The figure below shows the dependency relationships between these components.

graph TD
  cdsapi --> geoglue
  xarray --> geoglue
  xarray --> DART-Pipeline
  xarray --> dart-runner
  rasterio --> geoglue
  geoglue --> DART-Pipeline
  geoglue --> dart-bias-correct
  dart-bias-correct --> dart-runner
  DART-Pipeline --> dart-runner
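
The dependency graph above can also be read as an ordering: a component can only be built or installed once everything it depends on is available. The sketch below encodes the same edges in a Python mapping (the name `DEPENDS_ON` is hypothetical) and uses the standard library's `graphlib` module, available from Python 3.9, to derive one valid order.

```python
from graphlib import TopologicalSorter

# Each key depends on the packages in its set; edges are copied from the figure.
DEPENDS_ON = {
    "geoglue": {"cdsapi", "xarray", "rasterio"},
    "DART-Pipeline": {"xarray", "geoglue"},
    "dart-bias-correct": {"geoglue"},
    "dart-runner": {"xarray", "dart-bias-correct", "DART-Pipeline"},
}

# static_order() yields each node only after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

The exact order among independent packages (for example cdsapi and rasterio) is not fixed, but geoglue always precedes DART-Pipeline and dart-bias-correct, and dart-runner always comes last because it depends, directly or indirectly, on every other component.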
