Dask So Far
- Makes dealing with big data in distributed and parallel manner.
- Uses underlying numpy arrays for dask arrays.
Note:
For dataframes and other data structures, some different underlying data structure can be there.
- All the computations are lazy. Applying computation methods such as mean, max etc won’t get executed immediately until explicitly computation is called. (using
compute
)
d_array.max().compute()
- Creates a task graph for the computation operations like spark.
d_array.visualize()