Baseline Matching and Diagnostics Assessment

Purpose

The Baseline Matching and Diagnostics Assessment tool applies the Spatially Explicit Matched Dynamic Baseline (SEMDB) approach to prepare a defensible baseline package ahead of a Carbon Verification Audit. It screens donor candidates, validates matching quality, and emits baseline trajectory and uncertainty artifacts for audit preparation.

Typical Questions This Tool Helps Answer

  • Is the donor pool sufficiently comparable to project units before verification packaging?
  • Do matching diagnostics pass the configured standardized mean difference (SMD) and trend-consistency thresholds?
  • What baseline trajectory and uncertainty outputs should be carried into downstream carbon audit reporting?

Background

The Baseline Matching and Diagnostics Assessment tool is an upstream baseline-defensibility workflow, not a replacement for downstream verification packaging. It helps teams reduce audit risk by enforcing donor eligibility and comparability diagnostics before claim interpretation.

This workflow is most useful when regulator or verifier scrutiny requires explicit evidence that baseline assumptions are stable and reproducible.

Inputs

| Parameter | Type | Required | Description |
|---|---|---|---|
| project_boundary | Vector | Yes | Project boundary polygons for treated units. |
| ecoregion_layer | Vector | Yes | Ecoregion scope for donor eligibility filtering. |
| year0_landcover | Raster | Yes | Baseline land-cover eligibility layer. This is the spatial reference grid: all raster inputs are harmonized to its CRS, cell size, and extent; road and tenure vectors are rasterised onto the same grid. |
| pre_period_ndvi_stack | Raster (multiband) | Yes | Multi-year pre-period NDVI stack. Bands must be in chronological order, oldest first (e.g., band 1 = year −5, band 2 = year −4, …, band 5 = year −1). The per-cell mean across all bands is used as a matching covariate. |
| elevation | Raster | Yes | Elevation covariate used in matching (metres). Slope is derived automatically via Horn's 8-neighbour algorithm. |
| precipitation | Raster | Yes | Precipitation covariate used in matching. |
| soil_organic_carbon | Raster | Yes | SOC covariate used in matching. |
| soil_texture_class | Raster | Yes | Soil texture class constraint covariate. |
| roads | Vector | Yes | Road/accessibility vector. All geometry types accepted (lines, polygons, points). The tool rasterises the layer onto the year0_landcover grid and computes per-cell distance to nearest road in metres via Euclidean distance transform. |
| tenure_status | Vector | No | Optional tenure/legal-status polygon layer. Requires tenure_field when provided. |
| tenure_field | String | No (required with tenure_status) | Attribute field name in tenure_status carrying the legal-status class (e.g., "legal_status", "TENURE_TYPE"). The tool enforces exact equality of this value between treated and donor pixels. |
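
The elevation input above mentions slope derived with Horn's 8-neighbour algorithm. As a minimal sketch of that standard formula for a single 3x3 window (the function name and the square-cell assumption are illustrative, and the tool's own edge and nodata handling is not shown here):

```python
import math

def horn_slope_deg(window, cell_size):
    """Slope in degrees at the centre of a 3x3 elevation window (Horn's method).

    window: three rows of three elevations in metres, north row first.
    cell_size: grid spacing in metres (square cells are assumed).
    """
    (a, b, c), (d, _centre, f), (g, h, i) = window
    # Weighted finite differences: the edge-adjacent neighbours count double.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))

# A plane rising 10 m per 10 m cell toward the east has a 45 degree slope.
plane = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
print(round(horn_slope_deg(plane, 10.0), 6))  # prints 45.0
```

Note that the centre cell does not enter the calculation; only its eight neighbours do.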

Parameters

  • donor_pool_exclusion_distance_km (optional): leakage-control exclusion distance from project boundary in km; default 5.0.
  • matching_mode (optional): mahalanobis or propensity_score; default mahalanobis.
  • neighbors (optional): nearest-neighbor count per treated cell; default 1.
  • with_replacement (optional): donor reuse toggle; default true.
  • tenure_field (required when tenure_status is provided): attribute field name in the tenure layer carrying the legal-status class value.
  • calliper_elevation_m (optional): hard calliper for elevation in absolute metres (SEMDB spec default: 200 m).
  • calliper_slope_deg (optional): hard calliper for slope in absolute degrees (SEMDB spec default: 10°).
  • calliper_soc_pct (optional): hard calliper for SOC as a percentage of the project-area mean (SEMDB spec default: 10%).
  • calliper_ndvi_pct (optional): hard calliper for mean pre-period NDVI as a percentage of the project-area mean (SEMDB spec default: 10%).
  • calliper_road_distance_km (optional): hard calliper for road distance in absolute kilometres (SEMDB spec default: 1 km).
  • exact_match_soil_texture_class (optional): enforce exact class matching; default true.
  • random_seed (optional): deterministic seed for reproducibility; default 0.
  • smd_threshold (optional): maximum accepted SMD, applied to each covariate; default 0.1.
  • parallel_trend_tolerance (optional): trend-slope tolerance; default 0.01.
  • pre_period_min_years (optional): minimum pre-period depth; default 5.
  • output_prefix (required): output basename for all artifacts.
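
One way to read the hard callipers above is as a per-pair filter applied before distance matching. The sketch below assumes that interpretation: absolute callipers bound the treated-donor difference directly, and percentage callipers bound it by a fraction of the project-area mean. The dictionary keys for covariate values (elevation_m, slope_deg, road_distance_m, soc, ndvi_mean) and the function name are hypothetical; only the calliper parameter names come from this page.

```python
def within_callipers(treated, donor, project_mean, callipers):
    """Return True if a donor cell passes every hard calliper for a treated cell.

    treated, donor: dicts of covariate values for one cell each (hypothetical keys).
    project_mean: dict of project-area means, used by the percentage callipers.
    callipers: dict keyed by the tool's calliper parameter names.
    """
    absolute = {
        "elevation_m": callipers["calliper_elevation_m"],
        "slope_deg": callipers["calliper_slope_deg"],
        # Road distance is stored in metres; the calliper is given in kilometres.
        "road_distance_m": callipers["calliper_road_distance_km"] * 1000.0,
    }
    relative = {
        "soc": callipers["calliper_soc_pct"],
        "ndvi_mean": callipers["calliper_ndvi_pct"],
    }
    for key, limit in absolute.items():
        if abs(treated[key] - donor[key]) > limit:
            return False
    for key, pct in relative.items():
        # Assumed reading: the limit is pct percent of the project-area mean.
        if abs(treated[key] - donor[key]) > abs(project_mean[key]) * pct / 100.0:
            return False
    return True

semdb_defaults = {
    "calliper_elevation_m": 200.0,
    "calliper_slope_deg": 10.0,
    "calliper_soc_pct": 10.0,
    "calliper_ndvi_pct": 10.0,
    "calliper_road_distance_km": 1.0,
}
treated = {"elevation_m": 500, "slope_deg": 5, "road_distance_m": 800, "soc": 2.0, "ndvi_mean": 0.60}
donor = {"elevation_m": 650, "slope_deg": 9, "road_distance_m": 1500, "soc": 2.1, "ndvi_mean": 0.63}
means = {"soc": 2.0, "ndvi_mean": 0.60}
print(within_callipers(treated, donor, means, semdb_defaults))
```

Tightening any single calliper can only shrink the set of admissible donors, which is why calliper changes trade donor-pool size against covariate balance.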

Matching Covariates

Six covariates are used in Mahalanobis distance matching. A separate SMD is reported for each in smd_diagnostics:

| Covariate | Source | SEMDB Calliper |
|---|---|---|
| Elevation (m) | elevation raster | calliper_elevation_m (200 m) |
| Precipitation | precipitation raster | Scoped by ecoregion |
| Soil Organic Carbon | soil_organic_carbon raster | calliper_soc_pct (10% of project mean) |
| Distance to Roads (m) | roads vector → rasterised + Euclidean DT | calliper_road_distance_km (1 km) |
| Slope (°) | Derived from elevation | calliper_slope_deg (10°) |
| Mean Pre-Period NDVI | Mean across pre_period_ndvi_stack bands | calliper_ndvi_pct (10% of project mean) |
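
For readers unfamiliar with the per-covariate SMD, the following is a minimal sketch using a common textbook definition: the difference in group means divided by the square root of the average of the two sample variances. Whether the tool uses exactly this pooling is an assumption, and the function name and sample values are illustrative only.

```python
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2), using sample variances."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    if pooled_sd == 0:
        # Both groups are constant: balanced only if the constants agree.
        return 0.0 if mean(treated) == mean(control) else float("inf")
    return (mean(treated) - mean(control)) / pooled_sd

# Made-up elevation values (metres) for treated cells and their matched donors.
treated_elev = [410.0, 420.0, 430.0, 440.0]
control_elev = [412.0, 418.0, 432.0, 441.0]

smd = standardized_mean_difference(treated_elev, control_elev)
smd_threshold = 0.1  # documented default
print(f"SMD = {smd:.4f}, passes = {abs(smd) <= smd_threshold}")
```

Comparing the absolute value against smd_threshold treats imbalance symmetrically, regardless of which group has the larger mean.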

NDVI Stack Band Ordering

Bands in pre_period_ndvi_stack must be in chronological order, oldest first (band 1 = year −5, band 5 = year −1). There is no embedded year labelling — the tool uses band position. To create the stack:

gdal_merge.py -separate -o ndvi_stack.tif ndvi_y-5.tif ndvi_y-4.tif ndvi_y-3.tif ndvi_y-2.tif ndvi_y-1.tif

In QGIS: Raster → Miscellaneous → Merge, enable "Place each input file into a separate band", add files chronologically.
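
Because band position is the only time index, it can help to sanity-check a stack before running the tool. The sketch below works on plain nested lists rather than a real raster, and assumes one band per pre-period year; the function names are hypothetical. It checks the stack depth against pre_period_min_years and computes the per-cell mean that serves as the NDVI matching covariate.

```python
def check_pre_period_depth(stack, pre_period_min_years=5):
    """Raise if the stack has fewer bands than the required pre-period depth."""
    if len(stack) < pre_period_min_years:
        raise ValueError(
            f"stack has {len(stack)} bands; at least {pre_period_min_years} required"
        )

def per_cell_mean(stack):
    """Mean across bands for every cell.

    stack: list of bands, oldest first; each band is a list of rows of NDVI values.
    """
    n_bands = len(stack)
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [sum(band[r][c] for band in stack) / n_bands for c in range(cols)]
        for r in range(rows)
    ]

# Five made-up 1x2 bands: band 1 = year -5, band 5 = year -1.
stack = [
    [[0.5, 0.2]],
    [[0.5, 0.3]],
    [[0.5, 0.4]],
    [[0.5, 0.5]],
    [[0.5, 0.6]],
]
check_pre_period_depth(stack)
print(per_cell_mean(stack))
```

The mean itself does not depend on band order, but any trend diagnostic computed from the same stack does, so the ordering requirement still matters.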

Spatial Alignment

All inputs are aligned to the year0_landcover grid before any computation:

  • Raster inputs are reprojected and resampled to match year0_landcover CRS, cell size, and extent.
  • Vector inputs (roads, tenure_status, project_boundary, ecoregion_layer) are reprojected to the year0_landcover CRS if needed.
  • Road and tenure vectors are rasterised directly onto the year0_landcover grid — you do not need to pre-rasterise them.
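
Once roads are rasterised onto the reference grid, the distance-to-road covariate is the Euclidean distance from each cell to the nearest road cell. The brute-force sketch below illustrates that definition on a small boolean mask; it is not the tool's implementation, which would use a proper distance transform for speed, and the function name is hypothetical.

```python
import math

def distance_to_nearest_road(road_mask, cell_size_m):
    """Per-cell Euclidean distance (metres) to the nearest road cell.

    road_mask: list of rows; truthy values mark cells touched by a road.
    cell_size_m: grid spacing in metres (square cells are assumed).
    """
    road_cells = [
        (r, c)
        for r, row in enumerate(road_mask)
        for c, is_road in enumerate(row)
        if is_road
    ]
    if not road_cells:
        raise ValueError("road mask contains no road cells")
    return [
        [
            min(math.hypot(r - rr, c - rc) for rr, rc in road_cells) * cell_size_m
            for c in range(len(row))
        ]
        for r, row in enumerate(road_mask)
    ]

# A road running down the first column of a 3x3 grid with 30 m cells.
mask = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
for row in distance_to_nearest_road(mask, 30.0):
    print(row)
```

The brute-force scan is quadratic in the number of cells, so it is suitable only for illustrating the definition on small grids.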

Outputs

| Artifact | Runtime Output Key | Type | Description |
|---|---|---|---|
| Donor candidate mask | donor_candidate_mask | Raster | Donor eligibility output after filtering. |
| Matching assignments preview | matching_assignments_preview | CSV | Treated-control assignment preview. |
| Matching preview summary | matching_preview | JSON | Matching configuration and preview diagnostics. |
| SMD diagnostics | smd_diagnostics | CSV | Covariate balance diagnostics and threshold checks. |
| Parallel trends diagnostics | parallel_trends | CSV | Pre-period trend consistency diagnostics. |
| Baseline trajectory | baseline_trajectory | CSV | Baseline trajectory output for downstream workflows. |
| Baseline uncertainty | baseline_uncertainty | CSV | Baseline uncertainty envelope output. |
| Baseline trajectory summary | baseline_trajectory_summary | JSON | Machine-readable trajectory summary. |
| Baseline spatial preview | baseline_spatial_preview | Raster | Spatial preview of baseline signal behavior. |
| Preflight report | preflight | JSON | Input and rule-check readiness output. |
| Workflow summary contract | summary | JSON | Main SEMDB output contract for handoff. |

QA and Acceptance Criteria

Minimum acceptance before downstream Carbon Verification Audit:

  1. Required inputs validated and aligned.
  2. Donor pool remains operationally adequate after filters.
  3. SMD diagnostics pass agreed threshold policy.
  4. Parallel trend diagnostics pass agreed tolerance policy.
  5. Baseline trajectory/uncertainty artifacts are complete.

Troubleshooting

  • Sparse donor pool: review exclusion distance and eligibility constraints.
  • SMD failures: tighten callipers and inspect covariate outliers.
  • Trend failures: increase pre-period depth and review NDVI stack quality.
  • Reproducibility mismatch: lock random_seed and verify unchanged inputs.
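
For the reproducibility item above, one simple way to verify that inputs are unchanged between runs is to record a content hash per input file and compare the records. This is a generic sketch using the Python standard library, not a feature of the tool; the function names are hypothetical.

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def input_fingerprint(paths):
    """Map each input path to its digest, in sorted path order."""
    return {path: file_sha256(path) for path in sorted(str(p) for p in paths)}

def changed_inputs(before, after):
    """Paths whose digest differs, or that appear in only one fingerprint."""
    return sorted(
        path
        for path in set(before) | set(after)
        if before.get(path) != after.get(path)
    )
```

Storing the fingerprint next to the random_seed used for a run gives a reviewer both pieces needed to confirm that a rerun is comparable.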

For additional operator support, share these artifacts with Whitebox support:

  • summary
  • preflight
  • smd_diagnostics
  • parallel_trends

Example

import whitebox_workflows as wbw

env = wbw.WbEnvironment(include_pro=True, tier="pro")

result = env.baseline_matching_and_diagnostics_assessment(
    project_boundary="data/project_boundary.gpkg",
    ecoregion_layer="data/ecoregions.gpkg",
    year0_landcover="data/landcover_y0.tif",
    pre_period_ndvi_stack="data/ndvi_pre_stack.tif",  # bands in order: yr-5, yr-4, yr-3, yr-2, yr-1
    elevation="data/elevation.tif",
    precipitation="data/precip.tif",
    soil_organic_carbon="data/soc.tif",
    soil_texture_class="data/soil_texture.tif",
    roads="data/roads.gpkg",
    tenure_status="data/tenure.gpkg",     # optional
    tenure_field="legal_status",           # required when tenure_status is provided
    donor_pool_exclusion_distance_km=5.0,
    matching_mode="mahalanobis",
    neighbors=1,
    calliper_elevation_m=200.0,
    calliper_slope_deg=10.0,
    calliper_soc_pct=10.0,
    calliper_ndvi_pct=10.0,
    calliper_road_distance_km=1.0,
    smd_threshold=0.1,
    parallel_trend_tolerance=0.01,
    output_prefix="output/semdb_baseline",
)

print(result)

References

  • Tool implementation: wbtools_pro/src/tools/workflow_products/baseline_matching_and_diagnostics_assessment.rs

Advanced Operational Guidance

  • Use fixed-seed reruns for verifier-facing reproducibility.
  • Archive summary, preflight, smd_diagnostics, and parallel_trends together.
  • Keep threshold profiles stable within a reporting cycle.

Positioning vs Carbon Verification Audit

  • Baseline Matching and Diagnostics Assessment: baseline construction and comparability diagnostics.
  • Carbon Verification Audit: downstream verification packaging and audit-ready reporting.

Use SEMDB first, then Carbon Verification Audit once baseline diagnostics are acceptable.

When To Use This Workflow

Use Baseline Matching and Diagnostics Assessment when baseline defensibility is under scrutiny and you need explicit comparability diagnostics before audit-stage reporting.

Results Delivery Checklist

  1. Input provenance and temporal scope documented.
  2. Threshold policy values recorded (smd_threshold, parallel_trend_tolerance).
  3. SMD and trend outputs reviewed and accepted.
  4. Baseline trajectory plus uncertainty outputs attached to handoff package.

Common Questions

Q: Can we skip SEMDB and go directly to Carbon Verification Audit?
A: You can, but you lose explicit baseline-defensibility diagnostics that many reviewers request.

Q: What is the most common SEMDB review blocker?
A: Imbalance or trend diagnostics that fail policy thresholds.

Q: Does SEMDB issue certified credits?
A: No. It is an upstream baseline diagnostics workflow that supports defensible verification preparation.