Baseline Matching and Diagnostics Assessment
Purpose
The Baseline Matching and Diagnostics Assessment tool applies the Spatially Explicit Matched Dynamic Baseline (SEMDB) approach to prepare a defensible baseline package ahead of a Carbon Verification Audit. It screens donor candidates, validates matching quality, and emits baseline trajectory and uncertainty artifacts for audit preparation.
Typical Questions This Tool Helps Answer
- Is the donor pool sufficiently comparable to project units before verification packaging?
- Do matching diagnostics pass configured SMD and trend-consistency thresholds?
- What baseline trajectory and uncertainty outputs should be carried into downstream carbon audit reporting?
Background
The Baseline Matching and Diagnostics Assessment tool is an upstream baseline-defensibility workflow, not a replacement for downstream verification packaging. It helps teams reduce audit risk by enforcing donor eligibility and comparability diagnostics before claim interpretation.
This workflow is most useful when regulator or verifier scrutiny requires explicit evidence that baseline assumptions are stable and reproducible.
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| project_boundary | Vector | Yes | Project boundary polygons for treated units. |
| ecoregion_layer | Vector | Yes | Ecoregion scope for donor eligibility filtering. |
| year0_landcover | Raster | Yes | Baseline land-cover eligibility layer. This is the spatial reference grid — all raster inputs are harmonized to its CRS, cell size, and extent; road and tenure vectors are rasterised onto the same grid. |
| pre_period_ndvi_stack | Raster (multiband) | Yes | Multi-year pre-period NDVI stack. Bands must be in chronological order, oldest first (e.g., band 1 = year −5, band 2 = year −4, …, band 5 = year −1). The per-cell mean across all bands is used as a matching covariate. |
| elevation | Raster | Yes | Elevation covariate used in matching (metres). Slope is derived automatically via Horn's 8-neighbour algorithm. |
| precipitation | Raster | Yes | Precipitation covariate used in matching. |
| soil_organic_carbon | Raster | Yes | SOC covariate used in matching. |
| soil_texture_class | Raster | Yes | Soil texture class constraint covariate. |
| roads | Vector | Yes | Road/accessibility vector. All geometry types accepted (lines, polygons, points). The tool rasterises the layer onto the year0_landcover grid and computes per-cell distance to nearest road in metres via Euclidean distance transform. |
| tenure_status | Vector | No | Optional tenure/legal-status polygon layer. Requires tenure_field when provided. |
| tenure_field | String | No (required with tenure_status) | Attribute field name in tenure_status carrying the legal-status class (e.g., "legal_status", "TENURE_TYPE"). The tool enforces exact equality of this value between treated and donor pixels. |
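The elevation row says slope is derived with Horn's 8-neighbour algorithm. The following is a minimal sketch of that standard method in plain Python, useful for spot-checking a few cells by hand. The function name `horn_slope_deg` is hypothetical, and leaving edge cells as `None` is an assumption, since the tool's edge and nodata handling is not described here.

```python
import math

def horn_slope_deg(dem, cell_size_m):
    """Slope in degrees for interior cells of a 2-D elevation grid (list of rows).

    Edge cells are left as None (an assumption of this sketch).
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # 3x3 window, row above (nw, n, ne), same row (w, e), row below (sw, s, se)
            nw, n, ne = dem[r - 1][c - 1], dem[r - 1][c], dem[r - 1][c + 1]
            w, e = dem[r][c - 1], dem[r][c + 1]
            sw, s, se = dem[r + 1][c - 1], dem[r + 1][c], dem[r + 1][c + 1]
            # Horn's weighted finite differences (weights 1-2-1)
            dz_dx = ((ne + 2 * e + se) - (nw + 2 * w + sw)) / (8 * cell_size_m)
            dz_dy = ((sw + 2 * s + se) - (nw + 2 * n + ne)) / (8 * cell_size_m)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out

# A plane rising 10 m per 10 m cell toward the east.
dem = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
slope = horn_slope_deg(dem, cell_size_m=10.0)
```

For this test plane the centre cell has a gradient of 1, so its slope is approximately 45°.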
Parameters
- donor_pool_exclusion_distance_km (optional): leakage-control exclusion distance from project boundary in km; default 5.0.
- matching_mode (optional): mahalanobis or propensity_score; default mahalanobis.
- neighbors (optional): nearest-neighbor count per treated cell; default 1.
- with_replacement (optional): donor reuse toggle; default true.
- tenure_field (required when tenure_status is provided): attribute field name in the tenure layer carrying the legal-status class value.
- calliper_elevation_m (optional): hard calliper for elevation in absolute metres (SEMDB spec default: 200 m).
- calliper_slope_deg (optional): hard calliper for slope in absolute degrees (SEMDB spec default: 10°).
- calliper_soc_pct (optional): hard calliper for SOC as a percentage of the project-area mean (SEMDB spec default: 10%).
- calliper_ndvi_pct (optional): hard calliper for mean pre-period NDVI as a percentage of the project-area mean (SEMDB spec default: 10%).
- calliper_road_distance_km (optional): hard calliper for road distance in absolute kilometres (SEMDB spec default: 1 km).
- exact_match_soil_texture_class (optional): enforce exact class matching; default true.
- random_seed (optional): deterministic seed for reproducibility; default 0.
- smd_threshold (optional): maximum accepted SMD across all covariates; default 0.1.
- parallel_trend_tolerance (optional): trend-slope tolerance; default 0.01.
- pre_period_min_years (optional): minimum pre-period depth; default 5.
- output_prefix (required): output basename for all artifacts.
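The calliper parameters mix absolute limits (elevation, slope, road distance) with limits expressed as a percentage of the project-area mean (SOC, NDVI). The sketch below shows one plausible reading of how such hard callipers could be evaluated for a single treated/donor pair. The function `within_callipers`, the dictionary keys, and the sample values are hypothetical, and the tool's internal rule may differ in detail.

```python
def within_callipers(treated, donor, project_means,
                     calliper_elevation_m=200.0, calliper_slope_deg=10.0,
                     calliper_soc_pct=10.0, calliper_ndvi_pct=10.0,
                     calliper_road_distance_km=1.0):
    """Return a per-covariate pass/fail dict for one treated/donor pair."""
    def diff(key):
        return abs(treated[key] - donor[key])

    return {
        # Absolute callipers
        "elevation": diff("elevation_m") <= calliper_elevation_m,
        "slope": diff("slope_deg") <= calliper_slope_deg,
        # Road distance is held in metres, the calliper is given in km
        "road_distance": diff("road_distance_m") <= calliper_road_distance_km * 1000.0,
        # Percentage callipers, relative to the project-area mean (assumed reading)
        "soc": diff("soc") <= calliper_soc_pct / 100.0 * project_means["soc"],
        "ndvi": diff("ndvi") <= calliper_ndvi_pct / 100.0 * project_means["ndvi"],
    }

treated = {"elevation_m": 500.0, "slope_deg": 5.0, "road_distance_m": 300.0,
           "soc": 2.0, "ndvi": 0.60}
donor = {"elevation_m": 650.0, "slope_deg": 12.0, "road_distance_m": 1500.0,
         "soc": 2.3, "ndvi": 0.63}
project_means = {"soc": 2.0, "ndvi": 0.60}

checks = within_callipers(treated, donor, project_means)
eligible = all(checks.values())
```

In this example the donor passes the elevation, slope, and NDVI callipers but fails on SOC and road distance, so it would be rejected as a match for this treated cell.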
Matching Covariates
Six covariates are used in Mahalanobis distance matching. A separate SMD is reported for each in smd_diagnostics:
| Covariate | Source | SEMDB Calliper |
|---|---|---|
| Elevation (m) | elevation raster | calliper_elevation_m (200 m) |
| Precipitation | precipitation raster | Scoped by ecoregion |
| Soil Organic Carbon | soil_organic_carbon raster | calliper_soc_pct (10% of project mean) |
| Distance to Roads (m) | roads vector → rasterised + Euclidean DT | calliper_road_distance_km (1 km) |
| Slope (°) | Derived from elevation | calliper_slope_deg (10°) |
| Mean Pre-Period NDVI | Mean across pre_period_ndvi_stack bands | calliper_ndvi_pct (10% of project mean) |
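For readers checking smd_diagnostics by hand, the sketch below uses the common definition of the standardized mean difference: the difference in group means divided by the pooled standard deviation. This is an assumption. The exact formula the tool uses is not stated in this document, and `standardized_mean_difference` is a hypothetical helper name.

```python
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2), using sample variances."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    if pooled_sd == 0:
        # Degenerate case: no spread in either group
        return 0.0 if mean(treated) == mean(control) else float("inf")
    return (mean(treated) - mean(control)) / pooled_sd

# Illustrative elevation samples in metres (made-up values)
treated_elev = [410, 420, 430]
control_elev = [400, 410, 420]

smd = standardized_mean_difference(treated_elev, control_elev)
passes = abs(smd) <= 0.1  # compare against smd_threshold (default 0.1)
```

Here both groups have a sample standard deviation of 10 m and means 10 m apart, so the SMD is 1.0 and the covariate would fail the default threshold.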
NDVI Stack Band Ordering
Bands in pre_period_ndvi_stack must be in chronological order, oldest first (band 1 = year −5, band 5 = year −1). There is no embedded year labelling — the tool uses band position. To create the stack:
gdal_merge.py -separate -o ndvi_stack.tif ndvi_y-5.tif ndvi_y-4.tif ndvi_y-3.tif ndvi_y-2.tif ndvi_y-1.tif
In QGIS: Raster → Miscellaneous → Merge, enable "Place each input file into a separate band", add files chronologically.
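Because band position is the only time index, any trend statistic has to map band number to year offset. The sketch below fits an ordinary least-squares slope to a per-band mean NDVI series and compares treated and donor slopes against parallel_trend_tolerance. The helper `ols_slope`, the sample series, and the slope-difference rule are illustrative assumptions, since the tool's actual trend test is not specified here.

```python
def ols_slope(values):
    """Least-squares slope per year for a series ordered oldest first."""
    n = len(values)
    # Band 1 is year -n, the last band is year -1
    years = [k - n for k in range(n)]
    x_bar = sum(years) / n
    y_bar = sum(values) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    sxx = sum((x - x_bar) ** 2 for x in years)
    return sxy / sxx

# Mean NDVI per band, oldest first (made-up values)
treated_ndvi = [0.50, 0.52, 0.54, 0.56, 0.58]
donor_ndvi = [0.48, 0.505, 0.53, 0.555, 0.58]

pre_period_min_years = 5
parallel_trend_tolerance = 0.01

enough_depth = len(treated_ndvi) >= pre_period_min_years
slope_gap = abs(ols_slope(treated_ndvi) - ols_slope(donor_ndvi))
trend_ok = enough_depth and slope_gap <= parallel_trend_tolerance
```

Reversing the band order would flip the sign of both slopes, which is why the chronological ordering matters even when only the per-cell mean is used as a matching covariate.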
Spatial Alignment
All inputs are aligned to the year0_landcover grid before any computation:
- Raster inputs are reprojected and resampled to match the year0_landcover CRS, cell size, and extent.
- Vector inputs (roads, tenure_status, project_boundary, ecoregion_layer) are reprojected to the year0_landcover CRS if needed.
- Road and tenure vectors are rasterised directly onto the year0_landcover grid, so you do not need to pre-rasterise them.
Outputs
| Artifact | Runtime Output Key | Type | Description |
|---|---|---|---|
| Donor candidate mask | donor_candidate_mask | Raster | Donor eligibility output after filtering. |
| Matching assignments preview | matching_assignments_preview | CSV | Treated-control assignment preview. |
| Matching preview summary | matching_preview | JSON | Matching configuration and preview diagnostics. |
| SMD diagnostics | smd_diagnostics | CSV | Covariate balance diagnostics and threshold checks. |
| Parallel trends diagnostics | parallel_trends | CSV | Pre-period trend consistency diagnostics. |
| Baseline trajectory | baseline_trajectory | CSV | Baseline trajectory output for downstream workflows. |
| Baseline uncertainty | baseline_uncertainty | CSV | Baseline uncertainty envelope output. |
| Baseline trajectory summary | baseline_trajectory_summary | JSON | Machine-readable trajectory summary. |
| Baseline spatial preview | baseline_spatial_preview | Raster | Spatial preview of baseline signal behavior. |
| Preflight report | preflight | JSON | Input and rule-check readiness output. |
| Workflow summary contract | summary | JSON | Main SEMDB output contract for handoff. |
QA and Acceptance Criteria
Minimum acceptance before downstream Carbon Verification Audit:
- Required inputs validated and aligned.
- Donor pool remains operationally adequate after filters.
- SMD diagnostics pass agreed threshold policy.
- Parallel trend diagnostics pass agreed tolerance policy.
- Baseline trajectory/uncertainty artifacts are complete.
Troubleshooting
- Sparse donor pool: review exclusion distance and eligibility constraints.
- SMD failures: tighten callipers and inspect covariate outliers.
- Trend failures: increase pre-period depth and review NDVI stack quality.
- Reproducibility mismatch: lock random_seed and verify unchanged inputs.
For additional operator support, share these artifacts with Whitebox support:
- summary
- preflight
- smd_diagnostics
- parallel_trends
Example
import whitebox_workflows as wbw

env = wbw.WbEnvironment(include_pro=True, tier="pro")

result = env.baseline_matching_and_diagnostics_assessment(
    project_boundary="data/project_boundary.gpkg",
    ecoregion_layer="data/ecoregions.gpkg",
    year0_landcover="data/landcover_y0.tif",
    pre_period_ndvi_stack="data/ndvi_pre_stack.tif",  # bands in order: yr-5, yr-4, yr-3, yr-2, yr-1
    elevation="data/elevation.tif",
    precipitation="data/precip.tif",
    soil_organic_carbon="data/soc.tif",
    soil_texture_class="data/soil_texture.tif",
    roads="data/roads.gpkg",
    tenure_status="data/tenure.gpkg",  # optional
    tenure_field="legal_status",  # required when tenure_status is provided
    donor_pool_exclusion_distance_km=5.0,
    matching_mode="mahalanobis",
    neighbors=1,
    calliper_elevation_m=200.0,
    calliper_slope_deg=10.0,
    calliper_soc_pct=10.0,
    calliper_ndvi_pct=10.0,
    calliper_road_distance_km=1.0,
    smd_threshold=0.1,
    parallel_trend_tolerance=0.01,
    output_prefix="output/semdb_baseline",
)
print(result)
References
- Tool implementation: wbtools_pro/src/tools/workflow_products/baseline_matching_and_diagnostics_assessment.rs
Advanced Operational Guidance
- Use fixed-seed reruns for verifier-facing reproducibility.
- Archive summary, preflight, smd_diagnostics, and parallel_trends together.
- Keep threshold profiles stable within a reporting cycle.
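One way to keep the four review artifacts together is to zip them as a single bundle. The sketch below is a generic helper built on the Python standard library; `archive_artifacts` is a hypothetical name, and the caller supplies the artifact paths explicitly because the on-disk file names the tool derives from output_prefix are not documented here.

```python
import zipfile
from pathlib import Path

def archive_artifacts(paths, archive_path):
    """Zip the given artifact files into one archive and return its path.

    Raises FileNotFoundError if any artifact is missing, so an incomplete
    bundle is never written silently.
    """
    paths = [Path(p) for p in paths]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing artifacts: {missing}")
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            # Store by file name only so the bundle unpacks flat
            zf.write(p, arcname=p.name)
    return archive_path
```

Call it with the paths reported for the summary, preflight, smd_diagnostics, and parallel_trends outputs of a run, and store the resulting archive alongside the random_seed and threshold values used.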
Positioning vs Carbon Verification Audit
- Baseline Matching and Diagnostics Assessment: baseline construction and comparability diagnostics.
- Carbon Verification Audit: downstream verification packaging and audit-ready reporting.
Use SEMDB first, then Carbon Verification Audit once baseline diagnostics are acceptable.
When To Use This Workflow
Use Baseline Matching and Diagnostics Assessment when baseline defensibility is under scrutiny and you need explicit comparability diagnostics before audit-stage reporting.
Results Delivery Checklist
- Input provenance and temporal scope documented.
- Threshold policy values recorded (smd_threshold, parallel_trend_tolerance).
- SMD and trend outputs reviewed and accepted.
- Baseline trajectory plus uncertainty outputs attached to handoff package.
Common Questions
Q: Can we skip SEMDB and go directly to Carbon Verification Audit?
A: You can, but you lose explicit baseline-defensibility diagnostics that many reviewers request.
Q: What is the most common SEMDB review blocker?
A: Imbalance or trend diagnostics that fail policy thresholds.
Q: Does SEMDB issue certified credits?
A: No. It is an upstream baseline diagnostics workflow that supports defensible verification preparation.