Dask So Far

  1. Makes it possible to process big data in a distributed and parallel manner.
  2. Dask arrays use NumPy arrays underneath: a Dask array is split into chunks, and each chunk is a NumPy array.

Note:

For DataFrames and other collections, the underlying data structure is different. A Dask DataFrame, for example, is made up of pandas DataFrames.

  3. All computations are lazy. Calling a method such as mean or max does not execute anything right away; the work runs only when computation is requested explicitly with compute().
d_array.max().compute()
  4. Builds a task graph for the computation, much like Spark does. The graph can be rendered with visualize().
d_array.visualize()
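
The lazy behaviour and the task graph can be seen together in a short sketch like the one below. It assumes a tiny 4x4 input so the result is easy to verify by hand; the names data, lazy_max, result and graph are made up for this example. Rendering with visualize() needs the optional graphviz dependency, so that call is left as a comment.

```python
import numpy as np
import dask.array as da

# Small in-memory input, split into four 2x2 chunks.
data = np.arange(16).reshape(4, 4)
d_array = da.from_array(data, chunks=(2, 2))

# Calling max() only records the work to do; nothing is computed yet.
lazy_max = d_array.max()
print(type(lazy_max))  # still a Dask array, not a number

# compute() runs the recorded tasks and returns the concrete value.
result = lazy_max.compute()
print(result)

# The task graph behind the lazy result can be inspected as a mapping
# from task keys to tasks.
graph = dict(lazy_max.__dask_graph__())
print(len(graph), "tasks in the graph")

# To draw the graph instead (requires graphviz to be installed):
# lazy_max.visualize(filename="max_graph.png")
```

Inspecting the graph as a plain mapping is a convenient fallback when graphviz is not available, since it shows how many tasks the computation was broken into without producing an image.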