High Level Graphs
Dask collections operations are structurally defined as a high level graph of tasks and sub graphs. They can be used for visualizations and high level optimizations.
a = da.arange(10).sum()
graph = a.dask
print(graph)
HighLevelGraph with 3 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x10a9fbe50>
0. arange-c08e9f8bb90c94d3a220f742c88f3f62
1. sum-2f28e54f9d1e0db193618f80e27e339c
2. sum-aggregate-6d9a43e9bf7912f58e31f11e385bdde2
High level graph is made up of python dictionary.
print(a.dask.layers)
{'arange-c08e9f8bb90c94d3a220f742c88f3f62': <dask.highlevelgraph.MaterializedLayer object at 0x106e1aef0>,
'sum-2f28e54f9d1e0db193618f80e27e339c': Blockwise<(('arange-c08e9f8bb90c94d3a220f742c88f3f62', ('.0',)),) -> sum-2f28e54f9d1e0db193618f80e27e339c>,
'sum-aggregate-6d9a43e9bf7912f58e31f11e385bdde2': <dask.highlevelgraph.MaterializedLayer object at 0x10a9fbbb0>}
print(a.dask.dependencies)
{'arange-c08e9f8bb90c94d3a220f742c88f3f62': set(),
'sum-2f28e54f9d1e0db193618f80e27e339c': {
'arange-c08e9f8bb90c94d3a220f742c88f3f62'
},
'sum-aggregate-6d9a43e9bf7912f58e31f11e385bdde2': {
'sum-2f28e54f9d1e0db193618f80e27e339c'
}
}
A high level graph is composed of layers of tasks and dependencies among the layers. In the above example, there are three layers each of having some dependencies.
Dependencies is a set
of keys from the graph. These dependencies create edges between the nodes in the graph.
To get the whole graph, we can use to_dict
method on high level graph.
print(a.dask.to_dict())
{
('arange-c08e9f8bb90c94d3a220f742c88f3f62', 0): (functools.partial(<function arange at 0x1086c9c60>, like=None), 0, 10, 1, 10, dtype('int64')),
('sum-2f28e54f9d1e0db193618f80e27e339c', 0): (subgraph_callable-cc3a9dd45bb3138b6928a8ae6eafc62f, ('arange-c08e9f8bb90c94d3a220f742c88f3f62', 0)),
('sum-aggregate-6d9a43e9bf7912f58e31f11e385bdde2',): (Compose(functools.partial(<function sum at 0x107b1b130>, dtype=dtype('int64'), axis=(0,), keepdims=False), functools.partial(<function _concatenate2 at 0x109d05a20>, axes=[0])), [('sum-2f28e54f9d1e0db193618f80e27e339c', 0)])}
It is to be noted that the sub graphs have been opened up here.
HighLevelGraph
It is an Mapping object consists of layers of sub mappings and dependencies among the layers.
class HighLevelGraph(Mapping):
layers: Dict[str, Mapping]
dependencies: Dict[str, Set[str]]
We can create a HighLevelGraph
ourselves.
def add(x, y):
return x + y
def square(x):
return x ** 2
layers = {
'add': {
('add', 0): (add, 1, 2),
('add', 1): (add, 3, 4)
},
'square': {
('square', 0): (square, ('add', 0)),
('square', 1): (square, ('add', 1))
},
'agg': {
'agg': (add, ('square', 0), ('square', 1))}
}
dependencies = {
'add': set(),
'square': {'add'},
'agg': {'square'}
}
g = HighLevelGraph(layers, dependencies)
get(g, 'agg')
# 58