High Level Graphs

Dask collections operations are structurally defined as a high level graph of tasks and sub graphs. They can be used for visualizations and high level optimizations.

a = da.arange(10).sum()
 
graph = a.dask
print(graph)
 
HighLevelGraph with 3 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x10a9fbe50>
 0. arange-c08e9f8bb90c94d3a220f742c88f3f62
 1. sum-2f28e54f9d1e0db193618f80e27e339c
 2. sum-aggregate-6d9a43e9bf7912f58e31f11e385bdde2

High level graph is made up of python dictionary.

print(a.dask.layers)
 
{'arange-c08e9f8bb90c94d3a220f742c88f3f62': <dask.highlevelgraph.MaterializedLayer object at 0x106e1aef0>,
 'sum-2f28e54f9d1e0db193618f80e27e339c': Blockwise<(('arange-c08e9f8bb90c94d3a220f742c88f3f62', ('.0',)),) -> sum-2f28e54f9d1e0db193618f80e27e339c>, 
 'sum-aggregate-6d9a43e9bf7912f58e31f11e385bdde2': <dask.highlevelgraph.MaterializedLayer object at 0x10a9fbbb0>}
 
 
print(a.dask.dependencies)
 
{'arange-c08e9f8bb90c94d3a220f742c88f3f62': set(),
 'sum-2f28e54f9d1e0db193618f80e27e339c': {
     'arange-c08e9f8bb90c94d3a220f742c88f3f62'
     }, 
 'sum-aggregate-6d9a43e9bf7912f58e31f11e385bdde2': {
     'sum-2f28e54f9d1e0db193618f80e27e339c'
     }
 }

A high level graph is composed of layers of tasks and dependencies among the layers. In the above example, there are three layers each of having some dependencies.

Dependencies is a set of keys from the graph. These dependencies create edges between the nodes in the graph.

To get the whole graph, we can use to_dict method on high level graph.

print(a.dask.to_dict())
 
{
 ('arange-c08e9f8bb90c94d3a220f742c88f3f62', 0): (functools.partial(<function arange at 0x1086c9c60>, like=None), 0, 10, 1, 10, dtype('int64')), 
 
 ('sum-2f28e54f9d1e0db193618f80e27e339c', 0): (subgraph_callable-cc3a9dd45bb3138b6928a8ae6eafc62f, ('arange-c08e9f8bb90c94d3a220f742c88f3f62', 0)), 
 
 ('sum-aggregate-6d9a43e9bf7912f58e31f11e385bdde2',): (Compose(functools.partial(<function sum at 0x107b1b130>, dtype=dtype('int64'), axis=(0,), keepdims=False), functools.partial(<function _concatenate2 at 0x109d05a20>, axes=[0])), [('sum-2f28e54f9d1e0db193618f80e27e339c', 0)])}

It is to be noted that the sub graphs have been opened up here.

HighLevelGraph

It is an Mapping object consists of layers of sub mappings and dependencies among the layers.

class HighLevelGraph(Mapping):
    layers: Dict[str, Mapping]
    dependencies: Dict[str, Set[str]]

We can create a HighLevelGraph ourselves.

def add(x, y):
  return x + y
 
def square(x):
  return x ** 2
 
 
layers = {
  'add': {
    ('add', 0): (add, 1, 2),
    ('add', 1): (add, 3, 4)
  },
  'square': {
    ('square', 0): (square, ('add', 0)),
    ('square', 1): (square, ('add', 1))
  },
  'agg': {
    'agg': (add, ('square', 0), ('square', 1))}
}
 
dependencies = {
  'add': set(),
  'square': {'add'},
  'agg': {'square'}
}
 
g = HighLevelGraph(layers, dependencies)
 
get(g, 'agg')
 
# 58
 

References

  1. https://docs.dask.org/en/stable/high-level-graphs.html