Empirical and bootstrap
Finite-sample distributions backed by stored draws or by an underlying
data source. .n reports the count; .samples, .draws(), and
.components access the stored items.
EmpiricalDistribution(samples, weights=None, *, log_weights=None, name=None)
Bases: Distribution[T], SupportsSampling, SupportsExpectation
Weighted empirical distribution over a finite set of samples.
This is the generic base. Concrete sample types T (objects,
callables, opaque user values, ...) are stored in a numpy object
array.
Automatic Record dispatch: EmpiricalDistribution(samples,
...) returns a RecordEmpiricalDistribution when
- samples is a Record (each field stacked along axis 0), or
- samples is a numeric JAX/numpy array and name=... is passed (the array auto-wraps as a single-field Record({name: arr})).
Otherwise, the generic base is returned and stores samples as a
numpy object array.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | Record \| sequence of T \| array-like | The support points. Numeric-array inputs require | required |
| weights | array-like, probpipe.Weights, or None | Non-negative weights (normalised internally). Mutually exclusive with log_weights. Uniform when neither is given. | None |
| log_weights | array-like, probpipe.Weights, or None | Log-unnormalised weights. Mutually exclusive with weights. | None |
| name | str | Distribution name. Mandatory when samples is a bare numeric array. | None |
Source code in probpipe/core/_empirical.py
n
property
Number of samples.
samples
property
Stored samples.
is_uniform
property
True when all samples have equal weight.
weights
property
Normalised weights, shape (n,).
log_weights
property
Normalised log-weights, shape (n,).
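The relationship between unnormalised log_weights and the normalised weights / log_weights properties can be sketched with plain NumPy. This is an illustration of the standard log-sum-exp normalisation, not the library's code; normalise_log_weights is a hypothetical helper name.

```python
import numpy as np

def normalise_log_weights(log_w):
    """Return log-weights shifted so that their exponentials sum to 1.

    Hypothetical helper for illustration only (not part of probpipe).
    """
    log_w = np.asarray(log_w, dtype=float)
    # Subtract the maximum before exponentiating to avoid underflow.
    shift = log_w.max()
    log_norm = shift + np.log(np.sum(np.exp(log_w - shift)))
    return log_w - log_norm

# These values would all underflow to 0 if exponentiated directly.
log_w = np.array([-1000.0, -1001.0, -1002.0])
norm_log_w = normalise_log_weights(log_w)
weights = np.exp(norm_log_w)  # normalised weights, shape (n,)
```

Working in log space keeps the result well defined even when every raw weight is far below floating-point range.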
effective_sample_size
property
Kish's effective sample size (ESS).
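Kish's ESS is defined as (sum of weights)^2 divided by the sum of squared weights. The sketch below computes it with NumPy; kish_ess is a hypothetical helper, and the library may compute the same quantity from log-weights instead.

```python
import numpy as np

def kish_ess(weights):
    """Kish's effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

uniform = np.full(100, 1 / 100)              # equal weights: ESS equals n
degenerate = np.array([1.0, 0.0, 0.0, 0.0])  # one dominant sample: ESS equals 1

ess_uniform = kish_ess(uniform)
ess_degenerate = kish_ess(degenerate)
```

The value ranges from 1 (all mass on one sample) to n (uniform weights), which makes it a quick diagnostic for weight degeneracy.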
RecordEmpiricalDistribution(samples, weights=None, *, log_weights=None, sample_shape=None, name=None)
Bases: EmpiricalDistribution[Record], NumericRecordDistribution, SupportsMean, SupportsVariance, SupportsCovariance
Empirical distribution over Record-structured numeric samples.
Each sample is a row of the stored Record: if the data has fields
X of shape (n, p) and y of shape (n,), then a single
draw is Record(X=array(p,), y=scalar). Joint row indexing
preserves per-observation correlation across fields during sampling
and resampling.
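Joint row indexing can be illustrated with two plain NumPy arrays standing in for the X and y fields. This is a sketch of the idea only, with uniform weights assumed; it does not use the library's Record type.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
coef = np.array([1.0, -2.0, 0.5])

X = rng.normal(size=(n, p))  # field X, shape (n, p)
y = X @ coef                 # field y, shape (n,), tied to X row by row

# A single draw is one row taken from every field at the same index.
i = rng.integers(0, n)
draw_X, draw_y = X[i], y[i]  # shapes (p,) and ()

# Resampling uses ONE index vector for all fields, so rows stay aligned.
idx = rng.integers(0, n, size=n)
X_resampled, y_resampled = X[idx], y[idx]
```

Had each field been resampled with its own index vector, the row-wise relationship between X and y would be destroyed.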
A bare numeric array auto-wraps as a single-field Record keyed
by name — that is the migration path for the previous
NumericEmpiricalDistribution(arr) form. The auto-wrap requires
name= so the field's identity is unambiguous downstream.
Inherits NumericRecordDistribution shape semantics
(record_template, event_shapes, event_size,
batch_shape) plus exact weighted moments
(mean, variance, cov) over each field.
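For a single field, weighted moments of this kind can be sketched as follows. The helper name weighted_moments is hypothetical, and the sketch assumes plain weighted moments with no small-sample bias correction; the source does not state which convention the library uses.

```python
import numpy as np

def weighted_moments(x, weights):
    """Weighted mean, elementwise variance, and covariance of one field.

    Hypothetical helper; assumes no bias correction.
    """
    x = np.asarray(x, dtype=float)        # shape (n, *event_shape)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalise the weights
    mean = np.tensordot(w, x, axes=1)     # shape event_shape
    centred = x - mean
    variance = np.tensordot(w, centred ** 2, axes=1)
    flat = centred.reshape(x.shape[0], -1)
    cov = (flat * w[:, None]).T @ flat    # shape (dim, dim)
    return mean, variance, cov

x = np.array([[0.0, 0.0], [2.0, 4.0]])
mean, variance, cov = weighted_moments(x, [1.0, 1.0])
```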
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | Record \| array-like | Sample data. A Record's fields each stack along axis 0; a numeric array auto-wraps as | required |
| weights | array-like, probpipe.Weights, or None | Non-negative weights (normalised internally). Mutually exclusive with log_weights. | None |
| log_weights | array-like, probpipe.Weights, or None | Log-unnormalised weights. Mutually exclusive with weights. | None |
| sample_shape | tuple of int | Only valid for numeric-array auto-wrap: leading-axis sample shape; trailing axes form the field's event shape. | None |
| name | str | Distribution name. Required when samples is a numeric array (used as the auto-wrapped field name). | None |
Notes
Construction calls Distribution.__init__ directly rather than
chaining through super().__init__(). The reason: the generic
EmpiricalDistribution[T] base stores samples as a flat numpy
object array (self._samples), which is incompatible with the
Record-structured layout this subclass uses (self._record_data).
Subclasses that further specialise this class (e.g.
ApproximateDistribution) should likewise
call RecordEmpiricalDistribution.__init__ rather than
super().__init__ if they need to skip the generic-base storage
path.
Source code in probpipe/core/_empirical.py
samples
property
Stored stacked-sample data as a structured NumericRecord.
Use self.samples[field_name] for per-field array access. For
a flat (n, dim) matrix view across all fields, use
flat_samples instead.
flat_samples
property
Flat (n, dim) view across all fields, in insertion order.
dim = sum_over_fields(prod(event_shape_f)). Multi-dim event
shapes are flattened row-major; field order matches
fields. Use samples for the structured per-field
view.
Examples:
Single-field auto-wrap with a 1-D event::

    EmpiricalDistribution(jnp.zeros((100, 5)), name="theta").flat_samples.shape
    # (100, 5)

Multi-field posterior::

    posterior = ApproximateDistribution(...)  # mu, log_sigma fields
    posterior.flat_samples.shape              # (n, 2)
    posterior.flat_samples.mean(axis=0)       # per-parameter posterior mean
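The flattening rule for flat_samples (row-major per field, fields concatenated in insertion order) can be mimicked with a dict of NumPy arrays. This sketch assumes a plain dict as a stand-in for the record; flatten_fields is a hypothetical helper, not the library's implementation.

```python
import numpy as np

def flatten_fields(fields):
    """Concatenate per-field samples into one (n, dim) matrix.

    Hypothetical helper: each field is flattened row-major (C order),
    and fields are joined in dict insertion order.
    """
    cols = [np.asarray(v).reshape(np.shape(v)[0], -1) for v in fields.values()]
    return np.concatenate(cols, axis=1)

n = 100
fields = {
    "mu": np.zeros(n),                         # event shape (),     1 column
    "L": np.arange(n * 6).reshape(n, 2, 3),    # event shape (2, 3), 6 columns
}
flat = flatten_fields(fields)  # dim = 1 + 2 * 3 = 7
```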
event_shape
property
Per-sample event shape, single-field only.
For a single-field record (the auto-wrap case from
EmpiricalDistribution(arr, name=...)), returns the field's
event shape — i.e. arr.shape[1:].
For multi-field records, raises AttributeError rather
than returning (): a silent scalar fallback would let
callers that aren't multi-field-aware mis-classify a
structured posterior as a scalar event. Use
event_shapes (plural, dict-valued) for the multi-field
case.
See Also
event_shapes — the per-field dict, always available.
RecordBootstrapReplicateDistribution.obs_shape — the
symmetric single-field-only / multi-field-raises accessor
for bootstrap replicates' per-observation event shape.
Raises:
| Type | Description |
|---|---|
| AttributeError | If |
event_shapes
property
Per-field event shapes (sample axis stripped).
Always available, including for single-field records. Compare
event_shape (singular), which is single-field-only and
raises on multi-field.
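The singular/plural contract can be sketched with two small functions over a dict of arrays. Both functions are hypothetical stand-ins that share the property names for readability; only the behaviour described above (dict always available, singular raises AttributeError on multi-field) is taken from the source.

```python
import numpy as np

def event_shapes(fields):
    """Per-field event shapes: each array's shape with the sample axis stripped."""
    return {name: tuple(arr.shape[1:]) for name, arr in fields.items()}

def event_shape(fields):
    """Single-field event shape; refuses to guess for multi-field data."""
    shapes = event_shapes(fields)
    if len(shapes) != 1:
        raise AttributeError(
            "event_shape is only defined for single-field records; "
            "use event_shapes instead"
        )
    return next(iter(shapes.values()))

single = {"theta": np.zeros((100, 5))}
multi = {"X": np.zeros((100, 3)), "y": np.zeros(100)}

single_shape = event_shape(single)   # (5,)
multi_shapes = event_shapes(multi)   # {"X": (3,), "y": ()}
```

Raising instead of returning () means a caller that is not multi-field-aware fails loudly rather than treating a structured posterior as a scalar.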
dim
property
Flat dimensionality of a single Record draw.
KDEDistribution(samples, weights=None, *, log_weights=None, bandwidth=None, name=None)
Bases: TFPDistribution
Gaussian kernel density estimate as a ProbPipe distribution.
Wraps a TFP MixtureSameFamily(Categorical, MultivariateNormalDiag)
to provide a smooth density approximation from a set of weighted
samples. Inherits all protocol implementations from
TFPDistribution.
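The density such a mixture represents can be written out directly: a weighted sum of diagonal Gaussians, one centred on each sample. The NumPy sketch below evaluates that log-density. It is an illustration of the mathematics, not the TFP-backed implementation, and kde_log_prob is a hypothetical name.

```python
import numpy as np

def kde_log_prob(x, samples, bandwidth, weights=None):
    """Log-density of a Gaussian KDE with diagonal kernels at a single point x.

    Hypothetical helper; samples has shape (n, d), bandwidth is a scalar
    or a length-d vector of kernel standard deviations.
    """
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    bw = np.broadcast_to(np.asarray(bandwidth, dtype=float), (d,))
    if weights is None:
        w = np.full(n, 1.0 / n)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
    z = (np.asarray(x, dtype=float) - samples) / bw   # (n, d)
    log_kernel = (
        -0.5 * np.sum(z ** 2, axis=1)
        - np.sum(np.log(bw))
        - 0.5 * d * np.log(2 * np.pi)
    )
    a = np.log(w) + log_kernel
    m = a.max()                                       # log-sum-exp for stability
    return m + np.log(np.sum(np.exp(a - m)))

centres = np.array([[-1.0], [1.0]])
log_p_at_zero = kde_log_prob(np.array([0.0]), centres, bandwidth=0.5)
```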
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| samples | array-like | Sample matrix of shape | required |
| weights | array-like, probpipe.Weights, or None | Non-negative weights. A pre-built | None |
| log_weights | array-like, probpipe.Weights, or None | Log-unnormalized weights. A pre-built | None |
| bandwidth | array-like or None | Per-dimension bandwidth (standard deviation of each Gaussian kernel), shape | None |
| name | str or None | Distribution name for provenance. | None |
Source code in probpipe/distributions/kde.py
n
property
Number of kernel centres (samples).
BootstrapDistribution(evaluations, *, weights=None, log_weights=None, name=None)
Bases: NumericRecordDistribution, SupportsSampling, SupportsMean, SupportsVariance
Distribution over bootstrap-resampled means of a statistic.
Given n evaluations f(x_1), ..., f(x_n) where x_i ~ P,
this represents the sampling distribution of the sample mean
(1/n) sum f(x_i), capturing Monte Carlo error.
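The sampling distribution described here can be approximated by resampling the evaluations with replacement and averaging each resample. The sketch below does this with NumPy under uniform weights; whether the library draws its samples the same way is an assumption, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for f(x_1), ..., f(x_n): n evaluations of some statistic.
evaluations = rng.exponential(size=200)
n = evaluations.shape[0]

# Each row of idx selects one bootstrap resample of size n.
num_replicates = 2000
idx = rng.integers(0, n, size=(num_replicates, n))
bootstrap_means = evaluations[idx].mean(axis=1)

# The spread of the resampled means estimates the Monte Carlo error
# of the sample mean, which is close to std / sqrt(n).
std_error = evaluations.std() / np.sqrt(n)
```

The spread of bootstrap_means should be close to std_error, and its centre close to the plain sample mean.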
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| evaluations | array-like, shape ``(n, *stat_shape)`` | The individual | required |
| weights | array-like, probpipe.Weights, or None | Non-negative weights (normalized internally). A pre-built | None |
| log_weights | array-like, probpipe.Weights, or None | Log-unnormalized weights. A pre-built | None |
| name | str | Distribution name. | None |
Source code in probpipe/core/_numeric_record_distribution.py
BootstrapReplicateDistribution(source, *, n=None, name=None)
Bases: Distribution[T], SupportsSampling, SupportsExpectation
N-fold product of an empirical distribution (bootstrap resampling).
Each draw from this distribution is a bootstrapped dataset — n
observations drawn i.i.d. (with replacement) from the source.
Source dispatch:
- Record / RecordEmpiricalDistribution / numeric array / numeric-array-backed EmpiricalDistribution → returns a RecordBootstrapReplicateDistribution. The numeric array path requires name= (single-field auto-wrap).
- Any SupportsSampling source (e.g. Normal, a custom Distribution) → stays in the generic base. n is mandatory because no canonical observation count exists; each replicate is n i.i.d. draws from source._sample.
- Any other sequence → generic base, equally weighted, with object-array storage.
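The difference between a data source (where n can default to the observation count) and a sampleable source (where n must be given) can be sketched as below. bootstrap_replicate and its sampler callback are hypothetical, uniform weights are assumed, and the ValueError is an illustrative choice; the source does not say which exception the library raises.

```python
import numpy as np

def bootstrap_replicate(rng, data=None, sampler=None, n=None):
    """Draw one bootstrapped dataset (hypothetical helper, not probpipe API)."""
    if data is not None:
        data = np.asarray(data)
        # A data source has a canonical size, so n can default to it.
        n = data.shape[0] if n is None else n
        idx = rng.integers(0, data.shape[0], size=n)  # with replacement
        return data[idx]
    if n is None:
        # A sampleable source has no canonical observation count.
        raise ValueError("n is required for a sampleable source")
    return sampler(rng, n)

rng = np.random.default_rng(0)
data = np.arange(10.0)

from_data = bootstrap_replicate(rng, data=data)
from_sampler = bootstrap_replicate(
    rng, sampler=lambda r, k: r.normal(size=k), n=25
)
```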
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | Record \| EmpiricalDistribution \| SupportsSampling \| sequence | Data to bootstrap from. | required |
| n | int or None | Number of observations per bootstrap dataset. Required when | None |
| name | str or None | Distribution name. Mandatory when | None |
Source code in probpipe/core/_empirical.py
n
property
Observations per bootstrap dataset.
source_n
property
Number of source observations, or None for a sampleable source.
data
property
Source data (None for a sampleable source).
weights
property
Source weights (None for a sampleable source).
is_uniform
property
True when source observations are equally weighted.
RecordBootstrapReplicateDistribution(source, *, n=None, name=None)
Bases: BootstrapReplicateDistribution[Record], NumericRecordDistribution
Bootstrap replicate distribution over Record-structured data.
Each sample is a full bootstrapped dataset: n rows drawn i.i.d.
with replacement from the source data, with the same row indices
applied jointly across fields.
Inherits NumericRecordDistribution shape semantics
(record_template, event_shapes, ...). A bare numeric array
source auto-wraps as a single-field Record keyed by name —
matching the migration path for the previous
ArrayBootstrapReplicateDistribution(arr) form.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | Record \| RecordEmpiricalDistribution \| array-like | Data to bootstrap from. A bare numeric array auto-wraps as a single-field | required |
| n | int or None | Observations per bootstrap dataset. Defaults to the source's observation count. | None |
| name | str or None | Distribution name. Mandatory when source is a bare numeric array (used as the single-field auto-wrap field name). | None |
Raises:
| Type | Description |
|---|---|
| TypeError | If source is a generic |
Source code in probpipe/core/_empirical.py
event_shapes
property
Per-field replicate event shapes (n, *obs_event_shape).
event_shape
property
Replicate event shape, single-field only.
For a single-field replicate, returns
(n, *per_observation_event_shape) — i.e. the shape of one
bootstrap dataset. Multi-field replicates raise
AttributeError rather than returning () so silent
scalar-fallback bugs don't slip through. Use
event_shapes (plural, per-field) for the multi-field
case, or obs_shape for the per-observation shape on
single-field replicates.
See Also
event_shapes — the per-field dict, always available.
obs_shape — the per-observation event shape (replicate
axis stripped) for single-field replicates.
RecordEmpiricalDistribution.event_shape — the
symmetric single-field-only / multi-field-raises accessor
on the empirical-distribution side.
Raises:
| Type | Description |
|---|---|
| AttributeError | If |
obs_shape
property
Per-observation event shape, single-field only.
For a single-field replicate, returns the per-observation
event shape (the field's shape with the sample axis stripped).
Multi-field replicates raise AttributeError rather
than returning (); use obs_shapes (plural,
per-field) for the multi-field case.
See Also
obs_shapes — the per-field dict, always available.
event_shape — the full replicate event shape
(n, *obs_shape) for single-field replicates.
Raises:
| Type | Description |
|---|---|
| AttributeError | If |
obs_shapes
property
Per-field observation event shapes (replicate axis stripped).
dim
property
Flat dimensionality of a single bootstrap dataset.
Sum across fields of n * max(1, prod(obs_event_shape)).
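The formula above can be checked by hand with a few lines of Python. replicate_dim is a hypothetical helper that takes the per-field observation shapes as a dict of tuples.

```python
import math

def replicate_dim(n, obs_shapes):
    """Flat size of one bootstrap dataset: sum over fields of
    n * max(1, prod(obs_event_shape)).

    Hypothetical helper; math.prod(()) is 1, so scalar observations
    contribute n entries each.
    """
    return sum(n * max(1, math.prod(shape)) for shape in obs_shapes.values())

# 10 observations per replicate, X rows of length 3 and scalar y:
# 10 * 3 + 10 * 1 = 40
dim = replicate_dim(10, {"X": (3,), "y": ()})
```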
JointEmpirical(*, weights=None, log_weights=None, name=None, **samples)
Bases: RecordDistribution, SupportsSampling, SupportsConditioning
Joint distribution from weighted joint samples.
Stores per-component sample arrays (all with the same number of rows) and optional weights. Sampling resamples rows jointly, preserving correlation between components.
Dynamic dispatch via __new__: when every field is a numeric
array (numpy, JAX, or numeric scalar), constructing JointEmpirical
returns a NumericJointEmpirical instance, which additionally
supports mean and variance. Mixed or opaque data (e.g. object-dtype
arrays of labels) falls through to this base class.
When used in broadcasting enumeration, the joint is treated as a single
unit with n samples (no cartesian decomposition).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| weights | array-like, probpipe.Weights, or None | Non-negative sample weights (normalized internally). A pre-built | None |
| log_weights | array-like, probpipe.Weights, or None | Log-unnormalized sample weights. A pre-built | None |
| name | str | Distribution name. | None |
| `**samples` | array-like | Named component sample arrays. Each must have the same number of rows (first dimension = | {} |
Source code in probpipe/distributions/_joint_empirical.py
NumericJointEmpirical(*, weights=None, log_weights=None, name=None, **samples)
Bases: JointEmpirical, SupportsMean, SupportsVariance
Joint empirical where every field is a numeric array.
Subclass of JointEmpirical that additionally implements
SupportsMean and
SupportsVariance. For a density
on top of empirical samples use the converter registry
(from_distribution(emp, KDEDistribution, ...)) or fit a
parametric distribution.
Construction coerces every field to a floating-point JAX array
(preserving float64 when JAX's x64 mode is enabled, otherwise
promoting integer inputs to float32); fields that aren't
numeric arrays raise TypeError. Typically constructed via
JointEmpirical, which dispatches here automatically when all
fields are numeric.
Source code in probpipe/distributions/_joint_empirical.py
event_shapes
property
Per-component event shapes.