Xarray with Dask

Xarray is numpy array with metadata and accessing the array with metadata.

Xarray does the computations in memory so has the limitation on data size.

Dask enables xarray to compute with big data with in disk computation like spark.

Dask makes the computation parallel using chunking method.

Xarray fits well with numpy as data array and dask array. Underling array can be a numpy or a dask.

d = xr.DataArray(np.arange(10)) # numpy array
 
b = xr.open_dataset("file.nc", chunks={"dim_0": 1}) # dask array

open_dataset method reads netCDF files.

References

  1. https://stephanhoyer.com/2015/06/11/xray-dask-out-of-core-labeled-arrays/