Provenance
Every Distribution or Record returned by a workflow function carries
a Provenance record linking it to its inputs and the op that produced
it. The result is a directed acyclic graph: each node is a value, each
edge points from a value to one of its inputs.
provenance_ancestors(value) returns a flat list of the unique values
that went, directly or transitively, into producing value.
provenance_dag(value) builds a graphviz.Digraph of the same lineage,
which is useful for debugging or for displaying the DAG in a notebook.
Provenance(operation, parents=(), metadata=dict())
dataclass
Tracks how a distribution was created.
to_dict(*, recurse=True)
Serialize to a JSON-compatible dict.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| recurse | bool | If True, recursively serialize parent provenance chains. If False, only include parent type/name references. | True |
Source code in probpipe/core/provenance.py
from_dict(d)
classmethod
Reconstruct from a dict produced by to_dict.
Parent distributions are not available at deserialization time, so
parents will be an empty tuple. The parent information is
preserved in the dict under "parents" for inspection.
Source code in probpipe/core/provenance.py
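The round trip described above can be sketched with a small stand-in class. This is an illustration under stated assumptions, not the probpipe implementation: SketchProvenance and SketchValue are hypothetical names, and the "operation", "metadata", "type", "name" and "provenance" keys are guesses at a plausible layout. Only the "parents" key and the empty parents tuple after deserialization come from the description above.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchProvenance:
    """Hypothetical stand-in for Provenance; not the probpipe class."""
    operation: str
    parents: tuple = ()
    metadata: dict = field(default_factory=dict)

    def to_dict(self, *, recurse: bool = True) -> dict:
        # Assumed layout: every parent is a value with .name and .source.
        parents = []
        for p in self.parents:
            ref = {"type": type(p).__name__, "name": p.name}
            if recurse and p.source is not None:
                # Follow the parent's own provenance chain.
                ref["provenance"] = p.source.to_dict(recurse=True)
            parents.append(ref)
        return {
            "operation": self.operation,
            "parents": parents,
            "metadata": dict(self.metadata),
        }

    @classmethod
    def from_dict(cls, d: dict) -> "SketchProvenance":
        # Parent objects cannot be rebuilt from references alone,
        # so parents is left as an empty tuple.
        return cls(
            operation=d["operation"],
            parents=(),
            metadata=dict(d.get("metadata", {})),
        )


@dataclass
class SketchValue:
    """Hypothetical value carrying a provenance record in .source."""
    name: str
    source: Any = None


prior = SketchValue("prior")
posterior = SketchValue(
    "posterior",
    SketchProvenance("condition", parents=(prior,), metadata={"n_obs": 10}),
)

d = posterior.source.to_dict(recurse=False)
restored = SketchProvenance.from_dict(json.loads(json.dumps(d)))
```

In this sketch the parent references survive in d["parents"] for inspection, while restored.parents is empty.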
provenance_ancestors(node)
Return all ancestor nodes reachable via provenance chains.
Traverses node.source.parents recursively (breadth-first) and
returns a flat list of unique ancestors, ordered by discovery.
The input node is not included in the result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node | Distribution \| Record \| RecordArray | Any object exposing a | required |
Source code in probpipe/core/provenance.py
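The breadth-first traversal described for provenance_ancestors can be sketched as follows. This is a minimal illustration, not the library's code: sketch_ancestors and make are hypothetical helpers, the values are plain SimpleNamespace objects, and deduplicating by object identity is an assumption.

```python
from collections import deque
from types import SimpleNamespace


def sketch_ancestors(node):
    """Breadth-first walk over node.source.parents (illustrative only)."""
    seen = {id(node)}  # identity-based, so values need not be hashable
    ancestors = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        source = getattr(current, "source", None)
        for parent in (source.parents if source is not None else ()):
            if id(parent) not in seen:
                seen.add(id(parent))
                ancestors.append(parent)  # recorded in discovery order
                queue.append(parent)
    return ancestors


def make(name, op=None, parents=()):
    """Build a hypothetical value; root values have no provenance."""
    source = SimpleNamespace(operation=op, parents=tuple(parents)) if op else None
    return SimpleNamespace(name=name, source=source)


a = make("a")
b = make("b")
c = make("c", "add", [a, b])
d = make("d", "scale", [c, a])  # "a" is reachable along two paths

names = [n.name for n in sketch_ancestors(d)]
```

Here names lists each ancestor once even though "a" feeds into "d" both directly and through "c", and "d" itself is not included.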
provenance_dag(dist)
Build a Graphviz Digraph of the provenance chain rooted at dist.
Each node is a distribution (labelled with type and name). Edges point from parent to child and are labelled with the operation that produced the child.
Requires the graphviz package. Returns a graphviz.Digraph
instance that can be rendered or displayed in a notebook.
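To show the edge convention without depending on the graphviz package, the sketch below emits plain DOT text instead of a graphviz.Digraph. It is a swapped-in simplification, not what provenance_dag does internally: sketch_dot is a hypothetical helper, nodes are labelled by name only, and names are assumed to be unique.

```python
from types import SimpleNamespace


def sketch_dot(root):
    """Emit DOT text with parent -> child edges labelled by the operation."""
    lines = ["digraph provenance {"]
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node.name in seen:  # assumes names are unique
            continue
        seen.add(node.name)
        lines.append(f'  "{node.name}";')
        source = getattr(node, "source", None)
        if source is None:
            continue  # a root value with no recorded inputs
        for parent in source.parents:
            # The edge runs from the input to the value it produced.
            lines.append(
                f'  "{parent.name}" -> "{node.name}" [label="{source.operation}"];'
            )
            stack.append(parent)
    lines.append("}")
    return "\n".join(lines)


prior = SimpleNamespace(name="prior", source=None)
data = SimpleNamespace(name="data", source=None)
posterior = SimpleNamespace(
    name="posterior",
    source=SimpleNamespace(operation="condition", parents=(prior, data)),
)

dot_text = sketch_dot(posterior)
```

The resulting text can be pasted into any DOT renderer; the real function returns a Digraph object that handles rendering itself.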