Overview

This manual is the long-form user guide for the WbW-R API.

WbW-R (Whitebox Workflows for R) is the R frontend to Whitebox Next Gen, providing access to modern geospatial processing for raster, vector, lidar, and sensor-bundle data through an R-friendly API and workflow model.

Whitebox is a long-running geospatial project that originated in academia and has grown into a broad analysis platform with recognized strengths in geomorphometry, hydrology, lidar processing, and remote sensing. Whitebox Next Gen is the current Rust-based major iteration focused on performance, cross-platform reliability, and modern data formats.

Whitebox Next Gen is intentionally full-stack: core geospatial capabilities that are often delegated to external C/C++ dependencies in other GIS platforms (for example raster I/O, projections, geometry/topology operations, and lidar handling) are implemented in the Whitebox codebase itself. This architecture is unusual in GIS and provides practical benefits for users: consistent behavior across platforms, tighter control over correctness and performance, fewer system-level dependency issues during installation, and faster iteration when fixing bugs or introducing new capabilities.

In this architecture, WbW-R is the orchestration layer for R users who need both practical scripting ergonomics and backend-scale performance.

This manual is written to be both:

  • beginner friendly: clear progression and runnable examples,
  • canonical reference: explicit operational patterns, constraints, and validation guidance aligned with backend behavior.

How to Use This Manual

This guide is intended for analysts who want to move from one-off exploratory commands to stable, scriptable geospatial workflows. The examples are practical and executable, but each chapter also explains the operational intent behind the code: when to keep work in memory, when to persist outputs, and how to validate results between stages.

A good first pass is chapter-order reading. After that, use the script index as a task-oriented entry point for adapting workflows to your own projects.

When adapting examples, keep a consistent script shape:

  1. session setup,
  2. input loading,
  3. transformation chain,
  4. validation and output persistence.

This structure keeps scripts easier to review, test, and maintain.

Goals:

  • Comprehensive API coverage.
  • Script-first documentation style.
  • Chapter layout aligned with the project README.

Documentation style rules:

  • Every major concept includes runnable code snippets.
  • Each chapter includes at least one end-to-end workflow script.
  • Tool parameter specifics are linked to shared tool reference docs as needed.

Common Pitfalls and Validation

  • Confirm inputs exist and have the expected CRS/schema/metadata before running long workflows.
  • Prefer explicit output names and formats for reproducibility, especially in batch scripts.
  • Re-open and inspect outputs after major steps to validate assumptions before chaining more tools.
  • For performance-sensitive runs, start with a small representative subset, then scale to full data.

Write Option References

For quick access to output option tables:

Installation and Setup

Treat setup as an explicit validation stage, not a one-time administrative task. The smoke test in this chapter confirms that the package and runtime can create a session and enumerate tools, which catches many environment issues before they surface in long-running processing scripts.

For collaborative projects, this chapter doubles as a baseline checklist for new machines and CI environments.

Install

Use this to install the current local package build for development and testing.

R CMD INSTALL crates/wbw_r/r-package/whiteboxworkflows

Smoke Test

This verifies that startup succeeds and tool enumeration is available in your current runtime.

Rscript -e 'library(whiteboxworkflows); s <- wbw_session(); cat(length(wbw_tool_ids(session = s)), "\n")'

Build and Preview

This manual is structured as an mdBook project.

Documentation build steps are part of quality control. Running these commands early and often helps catch broken links, malformed markdown, and chapter order drift before those issues accumulate. Live preview is especially useful when iterating on explanatory text and section hierarchy.

Install mdBook

cargo install mdbook

Build the Manual

Run this when changing chapter content, links, or section ordering.

From repository root:

cd crates/wbw_r/manual
mdbook build

Generated HTML is written to book/.

Live Preview

Use live preview while refining long explanatory sections and chapter flow.

cd crates/wbw_r/manual
mdbook serve --open

Authoring Notes

  • Keep chapter order aligned with src/SUMMARY.md.
  • Prefer runnable snippets over pseudo-code.
  • Link chapter recipes to concrete scripts listed in the script index.

Quick Start

This chapter is a minimal "first success" path. It verifies that your R installation, package setup, session creation, tool execution, and output write path all work together before you take on larger workflows.

Once this slice runs end-to-end, the remaining chapters explain how to make the same pattern more explicit, reproducible, and resilient.

The example below demonstrates the core session-first model in WbW-R: create a session, load data, run a tool, and inspect the result object.

library(whiteboxworkflows)

s <- wbw_session()
dem <- wbw_read_raster("dem.tif")

result <- wbw_slope(dem = dem$file_path(), output = "slope.tif")
print(result)

These chapters extend the same lifecycle with discovery, progress handling, and larger workflow templates.

Quick-Start Conventions

  • Prefer explicit wbw_session(...) construction in scripts.
  • Validate tool visibility before long-running pipelines.
  • Persist outputs deliberately and re-open typed objects for verification.

Session and Discovery

This chapter covers session lifecycle and tool discovery patterns.

Session and discovery APIs establish runtime certainty before execution. By creating a session explicitly and checking tool visibility up front, you avoid late failures in long pipelines and make script behavior easier to reason about across different environments and deployment targets.

Create Session

This is the standard session bootstrap for scripts. Keep it explicit so runtime state is obvious to future readers.

library(whiteboxworkflows)

s <- wbw_session()
print(s)

Visibility Checks

Use hard visibility checks before long jobs so missing-tool failures happen immediately.

library(whiteboxworkflows)

s <- wbw_session()
ids <- wbw_tool_ids(session = s)
cat('Visible tools:', length(ids), '\n')

if (!wbw_has_tool('slope', session = s)) {
	stop('slope is not available in this runtime session')
}

Search and Describe

Use this when you know the task objective but need to confirm exact tool IDs and parameter contracts.

library(whiteboxworkflows)

matches <- wbw_search_tools('lidar')
print(matches[1:min(5, length(matches))])

desc <- wbw_describe_tool('slope')
print(desc)

Category-Based Discovery

Category discovery is useful for UI generation, guided workflows, and policy checks in larger automation systems.

library(whiteboxworkflows)

summary <- wbw_category_summary()
print(summary)

cats <- wbw_get_all_categories()
print(cats)

raster_tools <- wbw_tools_in_category('Raster')
print(head(raster_tools, 20))

proj_tools <- wbw_tools_in_category('projection_georeferencing')
print(head(proj_tools, 20))

Session API Reference

WbW-R is function-first, but the wbw_session() object still exposes a useful set of callable methods for execution, projection utilities, topology checks, and typed I/O helpers.

Session Construction and Execution

MethodDescription
run_toolExecute a tool with a named argument list.
run_tool_with_progressExecute a tool and return structured progress/result output.
list_toolsReturn visible tool IDs for the session/license context.

Typed I/O Helpers

MethodDescription
read_vectorRead vector data with optional read options.
write_rasterWrite one raster with optional format options.
write_rastersWrite multiple rasters in one call.
write_vectorWrite one vector with optional format options.

Projection Utility Methods

MethodDescription
projection_to_ogc_wktConvert EPSG code to OGC WKT text.
projection_identify_epsgIdentify EPSG from CRS text where possible.
projection_reproject_pointsReproject point collections between EPSG systems.
projection_reproject_pointReproject a single point between EPSG systems.

Topology Utility Methods

MethodDescription
topology_intersects_wkt, topology_contains_wkt, topology_within_wktCore spatial predicate checks on WKT geometries.
topology_touches_wkt, topology_disjoint_wkt, topology_crosses_wkt, topology_overlaps_wktAdditional topological predicates for rule-based QA.
topology_covers_wkt, topology_covered_by_wktBoundary-aware containment checks.
topology_relate_wktReturn DE-9IM relation text for exact topology logic.
topology_distance_wktReturn shortest distance between WKT geometries.
topology_vector_feature_relationEvaluate topology relation between indexed features in two vector objects.
topology_is_valid_polygon_wkt, topology_make_valid_polygon_wktValidate and repair polygon WKT geometries.
topology_buffer_wktBuild buffered geometry from WKT and distance.
  1. Build session explicitly.
  2. Check required tools with wbw_has_tool(...).
  3. Use wbw_describe_tool(...) for parameter verification.
  4. Use category functions to drive guided UX or script validation.

Tool Execution and Progress

This chapter documents execution styles and progress handling in WbW-R.

Execution structure affects observability and reproducibility. Explicit session execution makes dependencies and runtime state clear, while progress-aware patterns help with monitoring, troubleshooting, and user feedback during long operations. Think of callbacks as operational instrumentation, not just console output.

Explicit Session Execution

Use explicit session execution as the default in production scripts so runtime dependencies and state are clear.

library(whiteboxworkflows)

s <- wbw_session()

result <- wbw_slope(dem = 'dem.tif', output = 'slope.tif')
print(result)

Progress-Aware Execution

Use progress-aware execution for long operations, remote runs, and logs that require periodic status updates.

library(whiteboxworkflows)

s <- wbw_session()

result <- wbw_run_tool_with_progress(
	'slope',
	args = list(dem = 'dem.tif', output = 'slope.tif'),
	session = s,
	on_progress = wbw_print_progress
)

str(result$progress)

Custom Progress Callback

This callback pattern is useful when progress events need to feed custom log formats or monitoring pipelines.

progress_cb <- local({
	last <- -1L
	function(pct = NA_real_, message = '') {
		msg <- if (is.null(message)) '' else as.character(message[[1]])

		if (!is.numeric(pct) || length(pct) == 0L || is.na(pct[[1]])) {
			m <- regexpr('(-?[0-9]+(\\.[0-9]+)?)\\s*%', msg, perl = TRUE)
			if (m[[1]] >= 0L) {
				token <- regmatches(msg, m)
				pct <- as.numeric(sub('%.*$', '', token))
			} else {
				pct <- NA_real_
			}
		}

		if (is.numeric(pct) && length(pct) > 0L && !is.na(pct[[1]])) {
			value <- as.numeric(pct[[1]])
			if (value <= 1.0) value <- value * 100.0
			pct_int <- as.integer(max(0, min(100, floor(value))))
			if (pct_int > last) {
				cat(sprintf('[%3d%%] %s\\n', pct_int, msg))
				last <<- pct_int
			}
		}
	}
})

# Use: on_progress = progress_cb
  1. Validate tool visibility first.
  2. Run with explicit session.
  3. Capture progress for long operations.
  4. Persist outputs and re-open typed objects for post-processing.

Working with Rasters

This chapter documents practical raster workflows in WbW-R with emphasis on inspection, iteration, modification, and persistence.

Raster processing usually alternates between backend tools for heavy transformations and array-level edits for custom logic. The key design choice is to use tool operations for scale and consistency, then reserve manual array passes for domain-specific adjustments that are hard to express with existing tools.

Raster Lifecycle

This lifecycle makes assumptions explicit before computation and keeps your workflow reviewable.

Typical lifecycle:

  1. Read raster.
  2. Inspect metadata.
  3. Transform values (tool-driven or array-driven).
  4. Persist outputs with explicit options when needed.
library(whiteboxworkflows)

r <- wbw_read_raster('dem.tif')
meta <- r$metadata()

print(meta$rows)
print(meta$columns)
print(meta$nodata)

Memory-Backed Rasters for Pipeline Efficiency

For workflows that chain multiple tool operations, memory-backed rasters eliminate disk I/O between steps. This is especially valuable when processing large rasters in complex pipelines. Rasters remain in process memory, accessible to subsequent tools without writing to disk.

Load a raster into memory with file_mode = "m":

library(whiteboxworkflows)

# Read directly into memory; no disk path required for subsequent ops
r1 <- wbw_read_raster('dem.tif', file_mode = "m")
r2 <- wbw_read_raster('slope.tif', file_mode = "m")

# Both rasters are now memory-backed. Chain operations without disk:
result <- r1 + r2
print(result$file_path())  # prints: memory://raster/...

Memory-backed paths are compatible with all downstream raster operations:

library(whiteboxworkflows)

r <- wbw_read_raster('dem.tif', file_mode = "m")

# Inspect and transform
meta <- r$metadata()
scaled <- wbw_multiply(input = r$file_path(), multiplier = 1.5)

# Export to disk when ready
wbw_write_raster(scaled, 'dem_scaled_1p5x.tif')

Memory Lifecycle and Cleanup

Memory-backed rasters persist in the process store until explicitly removed or cleared. For long-running jobs, manage memory explicitly to avoid accumulation:

library(whiteboxworkflows)

# Check current memory usage
count_before <- wbw_raster_memory_count()
bytes_before <- wbw_raster_memory_bytes()
cat('Rasters in memory:', count_before, '\n')
cat('Bytes used:', bytes_before, '\n')

# Read two rasters
r1 <- wbw_read_raster('large1.tif', file_mode = "m")
r2 <- wbw_read_raster('large2.tif', file_mode = "m")

cat('After reads:', wbw_raster_memory_count(), '\n')

# Remove one raster when done
wbw_remove_raster_from_memory(r1)
cat('After remove:', wbw_raster_memory_count(), '\n')

# Or clear all rasters at once
wbw_clear_raster_memory()
cat('After clear:', wbw_raster_memory_count(), '\n')

Best practices:

  • Use file_mode = "m" for intermediate results in tool chains.
  • Export memory-backed rasters to disk with write() when persisting results.
  • Call remove_raster_from_memory() after a raster is no longer needed.
  • Use clear_raster_memory() between independent job phases.
  • Use wbw_clear_memory() when resetting all in-process raster/vector/lidar stores together.
  • Monitor raster_memory_count() and raster_memory_bytes() in large pipelines.

Iterating Through Grid Cells

Use this only for logic that cannot be expressed through existing tools or vectorized operations.

Use to_array() for cell-level logic.

library(whiteboxworkflows)

r <- wbw_read_raster('dem.tif')
a <- r$to_array()
meta <- r$metadata()

nr <- dim(a)[1]
nc <- dim(a)[2]

for (row in seq_len(nr)) {
  for (col in seq_len(nc)) {
    z <- a[row, col]
    if (!is.na(z) && z != meta$nodata) {
      if (z < 0) {
        a[row, col] <- 0
      }
    }
  }
}

Fast Random Cell Access with Pinning

For neighbourhood logic or flow-path traversal that performs frequent random cell reads/writes, use pinned raster views. Pinning loads the raster values once and avoids repeated conversion overhead inside tight loops.

Single-raster pinning:

library(whiteboxworkflows)

r <- wbw_read_raster("dem.tif")
p <- wbw_pinned_raster(r)

v <- p$get_value(100, 200)
p$set_value(100, 200, v + 1)
p$close()

Two-raster read/write scan loop:

library(whiteboxworkflows)

src <- wbw_read_raster("D8Pointer.tif")
dst <- wbw_read_raster("output_template.tif")

wbw_with_pinned_rasters(src, dst, .f = function(srcp, dstp) {
  meta <- src$metadata()
  for (row in seq_len(meta$rows)) {
    for (col in seq_len(meta$columns)) {
      value <- srcp$get_value(row, col)
      dstp$set_value(row, col, value)
    }
  }
})

Notes:

  • wbw_with_pinned_rasters() closes all pins automatically.
  • wbw_pin_rasters(...) is available when manual close control is preferred.
  • Pinned write-back currently targets file-backed rasters in wbw_r.

Writing Modified Data Back

This demonstrates creating a derived raster while preserving base geospatial context.

library(whiteboxworkflows)

base <- wbw_read_raster('dem.tif')
a <- base$to_array()
a <- a * 1.05

out <- wbw_array_to_raster(a, base, output_path = 'dem_scaled.tif')
print(out)

Supported raster write keys and valid values are documented in Output Controls.

Multi-Band Iteration

Use this when transform rules depend on band identity or per-band thresholds.

library(whiteboxworkflows)

r <- wbw_read_raster('multiband.tif')
a <- r$to_array()

# If array is [rows, cols, bands], loop accordingly.
d <- dim(a)
if (length(d) == 3) {
  nr <- d[1]
  nc <- d[2]
  nb <- d[3]

  for (b in seq_len(nb)) {
    for (row in seq_len(nr)) {
      for (col in seq_len(nc)) {
        v <- a[row, col, b]
        if (!is.na(v) && v < 0) {
          a[row, col, b] <- 0
        }
      }
    }
  }
}

wbw_array_to_raster(a, r, output_path = 'multiband_clamped.tif')

Tool-First Raster Processing

Prefer this pattern for heavy processing: let optimized tools do most work, then apply targeted custom edits.

library(whiteboxworkflows)

s <- wbw_session()
dem <- wbw_read_raster('dem.tif')

wbw_fill_depressions(dem = dem$file_path(), output = 'filled.tif')
filled <- wbw_read_raster('filled.tif')

slope <- filled$square()  # placeholder unary op example
slope$write('slope_out.tif', overwrite = TRUE)

NoData and Performance Guidance

  • Always account for metadata()$nodata in per-cell loops.
  • Prefer tool operations for heavy transforms.
  • Use array loops only for custom logic not available as tools.

Raster Object Method Reference

Metadata and Conversion

MethodDescription
metadataReturn raster metadata (dimensions, extent, nodata, CRS fields).
file_pathReturn the backing raster path.
band_countReturn raster band count.
active_bandReturn active band index.
get_value, set_valueRead/write single cell values using 1-based row/column/band indices.
crs_epsg, crs_wktInspect CRS metadata.
to_arrayConvert raster to in-memory array for custom processing.
to_starsConvert raster to a stars object (requires terra support path).
pinCreate a pinned view for low-overhead random cell access in tight loops.

Arithmetic and Unary Transform Methods

MethodDescription
add, subtract, multiply, divideCellwise binary arithmetic with another raster/path operand.
abs, ceil, floor, round, square, sqrtCommon unary numeric transforms.
log10, log2, exp, exp2Log and exponential transforms.
sin, cos, tan, sinh, cosh, tanhTrigonometric and hyperbolic transforms.

Persistence

MethodDescription
deep_copyCopy raster to a new output path and return a raster object.
writeWrite raster to disk with optional output options.

Working with Vectors

This chapter covers schema inspection, feature iteration, attribute reads/writes, and persistence workflows.

Reliable vector workflows depend on stable schema and attribute contracts as much as geometry itself. The patterns in this chapter emphasize inspecting structure early, applying deterministic edits, and validating outputs after persistence so downstream analysis remains predictable.

See Also: Online Sources

If your workflow starts by downloading vectors from web providers (starting with OSM Overpass), use the dedicated chapter:

Read and Inspect

Begin with schema and metadata inspection so edits are grounded in the actual field model.

library(whiteboxworkflows)

v <- wbw_read_vector('roads.gpkg')
schema <- v$schema()
print(schema)
print(v$metadata())

Memory-Backed Vectors for Pipeline Efficiency

For workflows that chain multiple vector operations, memory-backed vectors eliminate disk I/O between steps. This is valuable for complex pipelines where intermediate results are passed between spatial operations.

Load a vector into memory with file_mode = "m":

library(whiteboxworkflows)

# Read directly into memory
roads <- wbw_read_vector('roads.gpkg', file_mode = "m")
rivers <- wbw_read_vector('rivers.gpkg', file_mode = "m")

print(roads$path)  # prints: memory://vector/...

Memory-backed vectors are compatible with all downstream operations:

library(whiteboxworkflows)

v <- wbw_read_vector('polygons.gpkg', file_mode = "m")

# Inspect schema and metadata
schema <- v$schema()
meta <- v$metadata()

# Pass to spatial tools
centroid_path <- wbw_centroid_vector(input = v$path, output = 'centroids')

# Export to disk when ready
result <- wbw_read_vector(centroid_path)
result$write('centroids_final.gpkg')

Vector Memory Lifecycle

Memory-backed vectors persist until explicitly removed or cleared. For long-running vector pipelines, manage memory explicitly:

library(whiteboxworkflows)

# Check current memory
cat('Vectors in memory:', wbw_vector_memory_count(), '\n')

# Read vectors
v1 <- wbw_read_vector('large1.gpkg', file_mode = "m")
v2 <- wbw_read_vector('large2.gpkg', file_mode = "m")

cat('After reads:', wbw_vector_memory_count(), '\n')

# Remove when done
wbw_remove_vector_from_memory(v1)
cat('After remove:', wbw_vector_memory_count(), '\n')

# Or clear all
wbw_clear_vector_memory()
cat('After clear:', wbw_vector_memory_count(), '\n')

Implicit Memory Output from Tools

All vector-output tools store their result in memory automatically when the output argument is omitted (NULL). You do not need to pass file_mode = "m" or choose a temporary path — simply leave output out and the returned wbw_vector object is already memory-backed:

library(whiteboxworkflows)

wbe <- wbw_make_session()
roads <- wbw_read_vector('roads.gpkg')

# No output path — result is stored in memory automatically
centroids <- wbe$centroid_vector(input = roads$path)
cat(centroids$path, '\n')  # prints: memory://vector/...

# Chain operations without any intermediate files
clipped <- wbe$clip(input = centroids$path, clip = 'boundary.gpkg')
cat(clipped$path, '\n')  # also memory://vector/...

# Persist the final result only
wbw_write_vector(clipped, 'result.gpkg')

This applies to all tool categories — GIS, hydrology, geomorphometry, and stream network tools all follow the same rule. Providing an explicit output path writes to disk as before.

Best practices:

  • Use file_mode = "m" for intermediate spatial analysis results.
  • Export memory-backed vectors to disk with write() when persisting final outputs.
  • Call remove_vector_from_memory() after a vector is no longer needed.
  • Use clear_vector_memory() between independent analysis phases.
  • Use wbw_clear_memory() when resetting all in-process raster/vector/lidar stores together.

Iterate Through Features

Use this for quality checks, custom filters, and record-level diagnostics.

library(whiteboxworkflows)

v <- wbw_read_vector('roads.gpkg')
count <- v$metadata()$feature_count

for (i in seq_len(count)) {
  attrs <- v$attributes(i)
  # attrs is a named list
  print(attrs)
}

Read and Update Attribute Table

This example combines common edit actions: single-value updates, grouped updates, and field creation.

library(whiteboxworkflows)

v <- wbw_read_vector('roads.gpkg')

name1 <- v$attribute(1, 'name')
print(name1)

v$update_attribute(1, 'name', 'Main Street')
v$update_attributes(2, list(speed = 50, class = 'collector'))
v$add_field('reviewed', field_type = 'integer', default_value = 0)

Persist Vector Outputs

This pattern demonstrates tool-driven persistence and post-write verification.

For complete write-option keys and allowed values, see Output Controls.

library(whiteboxworkflows)

s <- wbw_session()
roads <- wbw_read_vector('roads.gpkg')

wbw_centroid_vector(input = roads$file_path(), output = 'roads_centroids.gpkg')

centroids <- wbw_read_vector('roads_centroids.gpkg')
print(centroids$metadata())

Practical Notes

  • Call schema() first to confirm field names and expected types.
  • Use update_attributes() for grouped feature edits.
  • Re-read output files to validate schema and values after writes.

Vector Object Method Reference

Metadata and Structure

MethodDescription
metadataReturn vector metadata (geometry type, feature count, CRS, fields).
schemaReturn field names and types as a data frame.
pathReturn backing vector path.
to_terra, to_sfConvert to terra/sf objects for ecosystem workflows.

Attribute Access and Edits

MethodDescription
attributesReturn all attribute values for one feature index.
attributeReturn one field value for one feature index.
update_attributesUpdate multiple fields for one feature index.
update_attributeUpdate one field for one feature index.
add_fieldAdd a new field with declared type and default value.

Persistence

MethodDescription
deep_copyCopy vector to a new path with optional write options.
writeWrite vector to a new output path.

Working with Lidar

This chapter documents lidar workflows in WbW-R, including file-backed processing, metadata checks, and output control patterns.

Lidar workflows are strongly shaped by dataset size and I/O cost. The guiding pattern here is to use file-backed objects and vectorized matrix operations for normal workloads, then switch to chunked processing when point counts exceed comfortable memory limits.

Baseline Workflow

Use this as a first-run validation before introducing matrix edits or chunked processing.

library(whiteboxworkflows)

l <- wbw_read_lidar('survey.las')
print(l$metadata())

copy <- l$deep_copy('survey_copy.las', overwrite = TRUE)
print(copy$metadata())

Memory-Backed Lidar for Pipeline Efficiency

For workflows that chain multiple lidar operations, memory-backed lidar objects eliminate disk I/O between steps. This is valuable when processing large point clouds through sequential filtering or classification steps.

Load a point cloud into memory with file_mode = "m":

library(whiteboxworkflows)

# Read directly into memory
survey <- wbw_read_lidar('survey.las', file_mode = "m")
print(survey$file_path())  # prints: memory://lidar/...

Memory-backed lidar objects support the full matrix/data-frame API and all downstream operations:

library(whiteboxworkflows)

# Read into memory
survey <- wbw_read_lidar('survey.las', file_mode = "m")

# Inspect and process
meta <- survey$metadata()
cat('Points:', meta$point_count, '\n')

# Extract and edit points
pts <- survey$to_matrix(fields = c('x', 'y', 'z', 'classification'))
high <- pts[, 3] > 250
pts[high, 4] <- 6

# Write edits back to disk
edited <- survey$from_matrix(
  pts,
  output_path = 'survey_filtered.laz',
  overwrite = TRUE,
  fields = c('x', 'y', 'z', 'classification')
)

Lidar Memory Lifecycle

Memory-backed lidar objects persist until explicitly removed or cleared. For long-running lidar pipelines, manage memory explicitly:

library(whiteboxworkflows)

# Check current memory
cat('Lidar objects in memory:', wbw_lidar_memory_count(), '\n')

# Read point clouds
survey1 <- wbw_read_lidar('large1.las', file_mode = "m")
survey2 <- wbw_read_lidar('large2.las', file_mode = "m")

cat('After reads:', wbw_lidar_memory_count(), '\n')

# Remove when done
wbw_remove_lidar_from_memory(survey1)
cat('After remove:', wbw_lidar_memory_count(), '\n')

# Or clear all
wbw_clear_lidar_memory()
cat('After clear:', wbw_lidar_memory_count(), '\n')

Implicit Memory Output from Tools

All lidar-output tools store their result in memory automatically when the output argument is omitted (NULL). You do not need to pass file_mode = "m" or choose a temporary path — simply leave output out and the returned wbw_lidar object is already memory-backed:

library(whiteboxworkflows)

wbe <- wbw_make_session()
survey <- wbw_read_lidar('survey.laz')

# No output path — result is stored in memory automatically
filtered <- wbe$filter_lidar_classes(input = survey$path, excluded_classes = list(7L))
cat(filtered$path, '\n')  # prints: memory://lidar/...

# Chain operations without any intermediate files
thinned <- wbe$lidar_thin(input = filtered$path, resolution = 0.5)
cat(thinned$path, '\n')  # also memory://lidar/...

# Persist the final result only
wbw_write_lidar(thinned, 'survey_clean.copc.laz')

This applies to all lidar tools. Providing an explicit output argument writes to disk as before.

Best practices:

  • Use file_mode = "m" for intermediate point cloud processing.
  • Export memory-backed lidar to disk with write() when persisting final outputs.
  • Call remove_lidar_from_memory() after a point cloud is no longer needed.
  • Use clear_lidar_memory() between independent analysis phases.
  • Use wbw_clear_memory() when resetting all in-process raster/vector/lidar stores together.
  • Monitor lidar_memory_count() for large processing jobs.

Iterating Through Lidar Points

Stable WbW-R lidar objects are file-backed and tool-oriented, with explicit columnar point access through matrix/data-frame APIs. Direct point-by-point iteration is still not the primary API path.

Recommended point-level workflow:

  1. Use to_matrix() or to_data_frame() with selected point fields.
  2. Apply vectorized edits in base R/tidy workflows.
  3. Write updates with from_matrix(...) or from_data_frame(...).

The example below uses a simple reclassification rule to illustrate the matrix roundtrip pattern.

library(whiteboxworkflows)

l <- wbw_read_lidar('survey.las')
pts <- l$to_matrix(fields = c('x', 'y', 'z', 'classification'))

ground <- pts[, 4] == 2
pts[ground, 4] <- 6

edited <- l$from_matrix(
  pts,
  output_path = 'survey_reclassified.laz',
  overwrite = TRUE,
  fields = c('x', 'y', 'z', 'classification')
)

print(edited$point_count())

Tool-Driven Lidar Processing

Use tool-driven processing when extracting QA outputs and diagnostics from lidar datasets.

For complete lidar write-option keys and allowed values, see Output Controls.

library(whiteboxworkflows)

s <- wbw_session()
wbw_lidar_info(input = 'survey.las', output = 'survey_info.html')

Chunked Matrix Streaming

For very large point clouds, use chunked matrix streaming to avoid keeping the full point matrix in memory.

Recommended chunked workflow:

  1. Read chunks with to_matrix_chunks(...).
  2. Apply vectorized edits in each chunk.
  3. Write edited chunks with from_matrix_chunks(...).

Use this when full-matrix operations would exceed available memory.

library(whiteboxworkflows)

l <- wbw_read_lidar('survey.las')
fields <- c('x', 'y', 'z', 'classification')

chunks <- l$to_matrix_chunks(chunk_size = 200000, fields = fields)
for (i in seq_along(chunks)) {
  high <- chunks[[i]][, 3] > 250
  chunks[[i]][high, 4] <- 6
}

edited <- l$from_matrix_chunks(
  chunks,
  output_path = 'survey_chunked_reclassified.laz',
  overwrite = TRUE,
  fields = fields
)

print(edited$point_count())

Notes:

  • LAS/LAZ chunk outputs use shared core streaming rewrite.
  • Other output formats currently fall back to non-streaming assembly in this API path.

Best Practices

  • Validate CRS before and after reprojection.
  • Keep source lidar immutable; write derived products to new files.
  • Prefer COPC/LAZ outputs for large cloud workflows.

Lidar Object Method Reference

Metadata and Access

MethodDescription
metadataReturn lidar metadata (bounds, point format, CRS, counts).
point_countReturn total number of points.
file_path, pathReturn backing lidar path.
get_short_filenameReturn basename of lidar file.

Matrix and Data-Frame Conversion

MethodDescription
to_matrixRead selected lidar fields as numeric matrix.
to_data_frameRead selected lidar fields as data frame.
to_matrix_chunksRead selected lidar fields in chunked matrix blocks.

Writing Edited Point Data

MethodDescription
from_matrixWrite edited matrix data to lidar output.
from_data_frameWrite edited data frame fields to lidar output.
from_matrix_chunksWrite chunked matrix edits to lidar output.

Persistence

MethodDescription
deep_copyCopy lidar to a new path with optional write options.
writeWrite lidar to a new output path.

Working with Sensor Bundles

This chapter documents supported sensor bundle families and common workflows.

Bundle APIs provide a common operational interface across heterogeneous sensor families. The conceptual goal is to standardize ingestion and inspection steps even when archive layout, naming conventions, or QA assets differ between providers.

Supported Families

  • Generic auto-detect: wbw_read_bundle(...)
  • Landsat: wbw_read_landsat(...)
  • Sentinel-1 SAFE: wbw_read_sentinel1(...)
  • Sentinel-2 SAFE: wbw_read_sentinel2(...)
  • PlanetScope: wbw_read_planetscope(...)
  • ICEYE: wbw_read_iceye(...)
  • DIMAP: wbw_read_dimap(...)
  • Maxar/WorldView: wbw_read_maxar_worldview(...)
  • RADARSAT-2: wbw_read_radarsat2(...)
  • RCM: wbw_read_rcm(...)

Common Inspection Pattern

Start here to establish what each bundle contains before choosing analysis channels.

library(whiteboxworkflows)

b <- wbw_read_bundle('BUNDLE_ROOT')
print(b$family)
print(b$key_summary())
print(b$list_band_keys())
print(b$list_measurement_keys())
print(b$list_qa_keys())
print(b$list_asset_keys())

Family-Specific Examples

These examples follow one shared shape across providers: load, inspect, sample key channels, then generate QA-ready outputs.

Sentinel-2

Use this for optical band workflows where band identity is explicit.

s2 <- wbw_read_sentinel2('S2_SCENE.SAFE')
red <- s2$read_band('B04')
nir <- s2$read_band('B08')
rgb <- s2$write_true_colour('s2_true_colour.tif')

Landsat

Use this when metadata such as cloud cover and processing level drive filtering.

ls <- wbw_read_landsat('LC09_SCENE')
print(ls$processing_level())
print(ls$cloud_cover_percent())
preview <- ls$read_band(ls$list_band_keys()[[1]])

Sentinel-1 / SAR Families

Use this pattern for SAR measurement access and quick false-colour QA.

s1 <- wbw_read_sentinel1('S1_SCENE.SAFE')
print(s1$polarizations())
meas <- s1$read_measurement(s1$list_measurement_keys()[[1]])
fc <- s1$write_false_colour('s1_false_colour.tif')

PlanetScope / ICEYE / DIMAP / Maxar-WorldView / RADARSAT-2 / RCM

Use this loop for multi-provider intake checks with a consistent script shape.

loaders <- list(
  wbw_read_planetscope,
  wbw_read_iceye,
  wbw_read_dimap,
  wbw_read_maxar_worldview,
  wbw_read_radarsat2,
  wbw_read_rcm
)

paths <- list('PLANETSCOPE_SCENE', 'ICEYE_SCENE', 'DIMAP_SCENE', 'MAXAR_SCENE', 'RADARSAT2_SCENE', 'RCM_SCENE')

for (i in seq_along(loaders)) {
  b <- loaders[[i]](paths[[i]])
  print(c(b$family, b$bundle_root))
}
  1. Open with family-specific reader when known.
  2. Inspect key groups with key_summary() and list_*_keys().
  3. Read representative channels via read_any() or family-specific methods.
  4. Generate preview/composite rasters for QA.
  5. Persist derived outputs and record source family + key choices.

Calibration Contract Notes (Landsat/Sentinel-2)

Remote-sensing radiometric and thermal tools in WbW-R now rely on a standardized sensor-bundle calibration contract in wbraster.

  • Sentinel-2 bundle metadata provides quantification values used for DN to TOA reflectance scaling.
  • Landsat bundles provide typed per-band reflectance and thermal constants used in TOA reflectance and LST workflows.

For best results, keep original SAFE/MTL metadata with scene rasters when moving or staging bundle data for analysis.

Sensor Bundle Object Method Reference

Metadata and Key Discovery

MethodDescription
metadataReturn bundle metadata (family, product descriptors, CRS hints).
family, bundle_root, pathReturn family ID and resolved bundle root/path.
key_summarySummarize available key groups and counts.
list_band_keys, list_measurement_keysList band/measurement keys.
list_qa_keys, list_aux_keys, list_asset_keysList QA, auxiliary, and asset keys.
resolve_key, has_keyResolve and verify key existence across key types.

Raster Access by Key

MethodDescription
read_anyRead key by searching configured key types.
read_bandRead band key as raster.
read_measurementRead measurement key as raster.
read_qa_layer, read_aux_layer, read_assetRead QA/auxiliary/asset keys as rasters.
read_preview_rasterReturn selected preview raster descriptor.

Composite Generation

MethodDescription
write_true_colourWrite true-colour composite raster.
write_false_colourWrite false-colour composite raster.

Common Metadata Accessors

MethodDescription
acquisition_datetime_utcReturn acquisition timestamp when available.
processing_level, product_typeReturn processing/product descriptors.
tile_id, mission, acquisition_modeReturn common platform and acquisition descriptors.
cloud_cover_percent, polarizationsReturn cloud or polarization metadata when available.

Output Controls

This chapter documents practical output controls for raster, vector, and lidar workflows in WbW-R.

Output configuration is where reproducibility is made explicit. Defaults are useful during exploration, but production scripts should pin extensions and format options so artifacts remain comparable across runs and environments. Treat this chapter as the policy layer for how outputs are persisted.

General Principles

  • Start with defaults until explicit format constraints are required.
  • Use explicit output extensions for reproducibility.
  • Re-open outputs and validate metadata after writes.

Raster Output Controls

Raster objects expose write(...) and deep_copy(...) with optional options.

Use explicit raster options when output layout and compression behavior must be stable across environments.

library(whiteboxworkflows)

r <- wbw_read_raster('dem.tif')

# Default write
r$write('dem_default.tif', overwrite = TRUE)

# Write with options list
r$write(
  'dem_cog.tif',
  overwrite = TRUE,
  options = list(
    compress = TRUE,
    strict_format_options = TRUE,
    geotiff = list(
      layout = 'cog',
      tile_size = 512,
      compression = 'deflate',
      bigtiff = FALSE
    )
  )
)

Raster Write Option Reference

Raster write options are supplied as an R list. The current top-level keys are:

  • compress: logical convenience switch for GeoTIFF-family outputs.
  • strict_format_options: logical validation switch.
  • geotiff: nested list for GeoTIFF / BigTIFF / COG-specific controls.
  • jpeg2000: nested list for JPEG2000 (.jp2) controls.

compress

compress is a convenience flag used for GeoTIFF-family outputs.

  • TRUE: maps to GeoTIFF deflate compression.
  • FALSE: maps to uncompressed GeoTIFF output.

Default behavior:

  • compress is unset by default (not TRUE and not FALSE).
  • For GeoTIFF-family outputs, an unset compress falls through to the GeoTIFF writer default compression, which is deflate.
  • For non-GeoTIFF outputs, compress has no effect.

If geotiff$compression is also supplied, the explicit geotiff$compression value takes precedence.

strict_format_options

strict_format_options must be TRUE or FALSE.

  • FALSE (default): GeoTIFF-specific options are ignored when writing a non-GeoTIFF output; JPEG2000-specific options are ignored when writing a non-.jp2 output.
  • TRUE: GeoTIFF-specific options on a non-GeoTIFF output path, or JPEG2000-specific options on a non-.jp2 output path, raise an error.

jpeg2000

The nested jpeg2000 list supports these keys:

  • compression: character scalar compression mode.
  • quality_db: numeric quality target in dB used when compression = 'lossy'.
  • decomp_levels: non-negative integer decode-resolution level hint (0 to 255).

Default values when keys are omitted:

  • compression: 'lossy'.
  • quality_db: 35.0 when compression = 'lossy' and no explicit quality_db is provided.
  • decomp_levels: writer default (unset).

Supported jpeg2000$compression values:

  • 'lossless'
  • 'lossy'

Notes:

  • quality_db is optional; when omitted with compression = 'lossy', the writer default quality is used.
  • JPEG2000 color-space controls are not exposed in the R wrapper yet.

geotiff

The nested geotiff list supports these keys:

  • compression: character scalar naming the compression codec.
  • bigtiff: logical.
  • layout: character scalar naming the layout.
  • rows_per_strip: positive integer used when layout = 'stripped'.
  • tile_width: positive integer used when layout = 'tiled'.
  • tile_height: positive integer used when layout = 'tiled'.
  • tile_size: positive integer shortcut for COG tile size, and also accepted as a shortcut for both tile width and tile height when layout = 'tiled'.
  • cog_tile_size: positive integer alias for tile_size when layout = 'cog'.

Default values when keys are omitted:

  • compression: 'deflate'.
  • bigtiff: FALSE.
  • layout: 'standard'.
  • rows_per_strip: 1 when layout = 'stripped'.
  • tile_width: 512 when layout = 'tiled'.
  • tile_height: defaults to tile_width when layout = 'tiled'.
  • tile_size / cog_tile_size: 512 when layout = 'cog'.

Supported geotiff$compression values:

  • 'none'
  • 'off'
  • 'uncompressed'
  • 'deflate'
  • 'zip'
  • 'lzw'
  • 'packbits'
  • 'pack_bits'
  • 'jpeg'
  • 'webp'
  • 'web_p'
  • 'jpegxl'
  • 'jpeg_xl'
  • 'jxl'

These are accepted aliases for the same underlying codecs:

  • 'none', 'off', and 'uncompressed'
  • 'deflate' and 'zip'
  • 'packbits' and 'pack_bits'
  • 'webp' and 'web_p'
  • 'jpegxl', 'jpeg_xl', and 'jxl'

Supported geotiff$layout values:

  • 'standard': default GeoTIFF writer behavior.
  • 'stripped': strip-organized GeoTIFF.
  • 'striped': alias for 'stripped'.
  • 'tiled': tiled GeoTIFF.
  • 'cog': Cloud-Optimized GeoTIFF.

Layout-specific parameter behavior:

  • layout = 'standard': ignores strip/tile size keys.
  • layout = 'stripped' or 'striped': uses rows_per_strip. Default is 1 if omitted.
  • layout = 'tiled': uses tile_width and tile_height. tile_width defaults to 512 if omitted. tile_height defaults to tile_width. tile_size is accepted as a shortcut for both dimensions.
  • layout = 'cog': uses tile_size or cog_tile_size. Default is 512 if neither is supplied.

Extensionless Raster Outputs

If you omit the output extension, raster writes default to .tif. In that case the wrapper also defaults the layout to COG unless you explicitly specify another layout.

# Writes my_surface.tif and defaults to COG-style layout.
r$write('my_surface', overwrite = TRUE)

Practical Patterns

# Explicit uncompressed standard GeoTIFF
r$write(
  'out_standard_uncompressed.tif',
  overwrite = TRUE,
  options = list(
    compress = FALSE,
    geotiff = list(layout = 'standard')
  )
)

# Stripped GeoTIFF with 64 rows per strip
r$write(
  'out_stripped.tif',
  overwrite = TRUE,
  options = list(
    geotiff = list(layout = 'stripped', rows_per_strip = 64)
  )
)

# Tiled GeoTIFF using a single tile_size shortcut
r$write(
  'out_tiled.tif',
  overwrite = TRUE,
  options = list(
    geotiff = list(layout = 'tiled', tile_size = 256)
  )
)

# COG with explicit codec and BigTIFF toggle
r$write(
  'out_cog.tif',
  overwrite = TRUE,
  options = list(
    strict_format_options = TRUE,
    geotiff = list(
      layout = 'cog',
      tile_size = 512,
      compression = 'deflate',
      bigtiff = FALSE
    )
  )
)

Memory Lifecycle and Cleanup

For workflows that use memory-backed rasters, vectors, and lidar objects (file_mode = "m"), explicit lifecycle management prevents unbounded memory growth in long-running jobs.

When to use memory mode

Memory mode is most valuable when:

  • Chaining multiple operations on the same data without disk I/O.
  • Processing intermediate results that are never persisted to disk.
  • Running batched analysis where you load, process, and clear per batch.
  • Working with smaller datasets where memory is not a constraint.

Avoid memory mode when:

  • Working with data larger than available RAM.
  • Processing single operations on large files.
  • Running unattended long-running jobs without explicit cleanup.

Explicit cleanup in long-running pipelines

library(whiteboxworkflows)

# Long-running batch analysis
for (tile_id in 1:1000) {
  cat('Processing tile', tile_id, '\n')
  
  # Read data into memory for this tile
  r <- wbw_read_raster(sprintf('tile_%d.tif', tile_id), file_mode = "m")
  v <- wbw_read_vector(sprintf('bounds_%d.gpkg', tile_id), file_mode = "m")
  
  # Process
  result <- wbw_clip_raster_by_polygon(input = r$file_path(), polygon = v$file_path(), 
                output = sprintf('clipped_%d.tif', tile_id))
  
  # Explicit cleanup before next iteration
  wbw_remove_raster_from_memory(r)
  wbw_remove_vector_from_memory(v)
}

Monitoring memory usage

For production scripts, track memory explicitly:

library(whiteboxworkflows)

cat('Initial raster memory:', wbw_raster_memory_bytes() / 1e6, 'MB\n')
cat('Initial vector memory:', wbw_vector_memory_bytes() / 1e6, 'MB\n')

# ... run operations ...

# Before returning or starting new phase
cat('Final raster count:', wbw_raster_memory_count(), '\n')
cat('Final raster memory:', wbw_raster_memory_bytes() / 1e6, 'MB\n')

# Explicit reset if needed
wbw_clear_memory()
cat('After clear:', wbw_raster_memory_count(), '\n')

Lidar Output Controls

Lidar objects expose write(...) and deep_copy(...) with optional options. The standalone wbw_write_lidar() function persists a wbw_lidar result returned by a tool call directly to disk.

When a session method is called without an output argument the result is stored in memory automatically; pass it to wbw_write_lidar() to persist it:

library(whiteboxworkflows)

wbe <- wbw_make_session()
survey <- wbw_read_lidar('survey.laz')

# Omit output — result is memory-backed automatically
filtered <- wbe$filter_lidar_classes(input = survey$path, excluded_classes = list(7L))
cat(filtered$path, '\n')  # memory://lidar/...

# Persist when ready
wbw_write_lidar(filtered, 'survey_clean.copc.laz')

Write options for wbw_write_lidar() and l$write():

Use these options to tune archive size, cloud-read behavior, and downstream compatibility.

library(whiteboxworkflows)

l <- wbw_read_lidar('survey.las')

# LAZ controls
l$write(
  'survey_out.laz',
  overwrite = TRUE,
  options = list(
    laz = list(
      chunk_size = 25000,
      compression_level = 7
    )
  )
)

# COPC controls
l$write(
  'survey_out.copc.laz',
  overwrite = TRUE,
  options = list(
    copc = list(
      max_points_per_node = 75000,
      max_depth = 8,
      node_point_ordering = 'hilbert'
    )
  )
)

Lidar Write Option Reference

For lidar writes, the options list supports these top-level keys:

  • laz: nested list with LAZ writer controls.
  • copc: nested list with COPC hierarchy controls.

laz

Supported keys:

  • chunk_size: positive integer (> 0).
  • compression_level: integer in the range 0 to 9.

Default values when keys are omitted:

  • chunk_size: 50000.
  • compression_level: 6.

copc

Supported keys:

  • max_points_per_node: positive integer (> 0).
  • max_depth: positive integer (> 0).
  • node_point_ordering: one of:
    • 'auto'
    • 'morton'
    • 'hilbert'

Default values when keys are omitted:

  • max_points_per_node: 100000.
  • max_depth: 8.
  • node_point_ordering: 'auto'.

Notes:

  • If no output extension is provided, lidar writes default to .copc.laz.
  • COPC options are relevant when writing COPC output (.copc.laz / .copc.las).
# Extensionless output defaults to COPC
l$write(
  'survey_out',
  overwrite = TRUE,
  options = list(
    copc = list(
      max_points_per_node = 75000,
      max_depth = 8,
      node_point_ordering = 'hilbert'
    )
  )
)

# Explicit LAZ controls
l$write(
  'survey_out.laz',
  overwrite = TRUE,
  options = list(
    laz = list(
      chunk_size = 25000,
      compression_level = 7
    )
  )
)

Vector Output Controls

wbw_write_vector(...) persists a wbw_vector object to disk. When a session method is called without an output argument, the result is automatically stored in memory and can be written to disk with wbw_write_vector().

library(whiteboxworkflows)

wbe <- wbw_make_session()
roads <- wbw_read_vector('roads.gpkg')

# Omit output — result is memory-backed automatically
centroids <- wbe$centroid_vector(input = roads$path)
cat(centroids$path, '\n')  # memory://vector/...

# Persist when ready
wbw_write_vector(centroids, 'roads_centroids.gpkg')

# Or supply output explicitly to write directly to disk
wbe$centroid_vector(input = roads$path, output = 'roads_centroids_direct.gpkg')

Vector Write Option Reference

For vector writes, the options list supports these keys:

  • strict_format_options: logical validation switch.
  • geoparquet: nested list with GeoParquet writer controls.

strict_format_options

  • FALSE (default): format-specific controls are ignored when they do not apply to the selected output format.
  • TRUE: format-specific controls on incompatible output formats raise an error.

geoparquet

Supported keys:

  • max_rows_per_group: positive integer.
  • data_page_size_limit: positive integer.
  • write_batch_size: positive integer.
  • data_page_row_count_limit: positive integer.
  • compression: character scalar.

Default values when keys are omitted:

  • max_rows_per_group: 1_048_576.
  • data_page_size_limit: Parquet library default page size.
  • write_batch_size: Parquet library default write batch size.
  • data_page_row_count_limit: Parquet library default row-count limit.
  • compression: Parquet library default compression codec.

Supported geoparquet$compression values:

  • 'none'
  • 'snappy'
  • 'gzip'
  • 'lz4'
  • 'zstd'
  • 'brotli'

Notes:

  • GeoParquet controls are only applied for .parquet outputs.
  • If no output extension is provided, vector writes default to .gpkg.
wbw_write_vector(
  buffered,
  'roads.parquet',
  options = list(
    strict_format_options = TRUE,
    geoparquet = list(
      compression = 'zstd',
      max_rows_per_group = 250000,
      data_page_size_limit = 1048576,
      write_batch_size = 8192,
      data_page_row_count_limit = 20000
    )
  )
)

Reproducibility Checklist

  1. Pin output extension explicitly.
  2. Capture option list values in scripts.
  3. Verify metadata after write (metadata() and CRS values).
  4. Keep source files immutable; write derived outputs separately.

Supported Data Formats

This chapter summarizes practical format support exposed through WbW-R.

Choosing formats is a workflow architecture decision, not just a file-extension choice. You should evaluate read/write support together with ecosystem fit, compression behavior, and long-term maintainability of outputs shared across toolchains.

Backend support comes from core crates:

  • Raster: wbraster
  • Vector: wbvector
  • Lidar: wblidar

Raster Formats

Exhaustive raster format support in the current WbW-R build:

FormatRead (wbw_read_raster)Write (wbw_write_raster)Common extensions / path rules
DTEDYesYes.dt0, .dt1, .dt2 (DTED 0, 1, 2 elevation data; WGS-84 geographic only)
Esri ASCII GridYesYes.asc (and .grd when detected as ASCII)
Esri Binary Grid workspaceYesBackend-onlyEsri Binary workspace directory (hdr.adf + w001001.adf) or .adf
Esri Float GridYesYes.flt, .hdr (single-band float grid with header file)
ERDAS IMAGINE (HFA)YesNo.img - read-only MVP; RLC (run-length) compression supported
GRASS ASCII RasterYesYes.txt / .asc when GRASS header keys are detected
Surfer GRDYesYes.grd (DSAA / DSRB signatures)
PCRasterYesYes.map (CSF signature)
SAGA Binary GridYesYes.sdat, .sgrd
Idrisi / TerrSet RasterYesYes.rst, .rdc
ER MapperYesYes.ers
ENVI HDR-labelled rasterYesYes.hdr, or data files (.img, .dat, .bin, .raw, .bil, .bsq, .bip) with .hdr sidecar
GeoTIFF / BigTIFF / COGYesYes.tif, .tiff
GeoPackage rasterYesYes.gpkg
JPEG2000 / GeoJP2YesYes.jp2
JPEG + World FileYesYes.jpg, .jpeg with .jgw, .jpgw, .jpegw, or .wld world file
PNG + World FileYesYes.png with .pgw, .pngw, or .wld world file
ZarrYesYes.zarr store (directory / suffix)
XYZ ASCII GridYesYes.xyz (whitespace or comma-delimited X Y Z points)

Typical pattern:

library(whiteboxworkflows)

r <- wbw_read_raster('dem.tif')
wbw_write_raster(r, 'dem_out.tif')

Vector Formats

Exhaustive vector format support in the current WbW-R build:

FormatRead (wbw_read_vector)Write (wbw_write_vector)Extensions / notes
FlatGeobufYesYes.fgb
GeoJSONYesYes.geojson, .json
TopoJSONYesYes.topojson
GeoPackageYesYes.gpkg
GeoParquetYesYes.parquet
GMLYesYes.gml
GPXYesYes.gpx
KMLYesYes.kml
KMZYesYes.kmz
MapInfo InterchangeYesYes.mif with .mid sidecar
OSM PBFYesNo.osm.pbf (read workflows only)
ESRI ShapefileYesYes.shp plus dataset sidecars

When wbw_write_vector(...) is called without an extension, WbW-R defaults output to .gpkg.

Typical pattern:

library(whiteboxworkflows)

v <- wbw_read_vector('roads.gpkg')
wbw_write_vector(v, 'roads_out.gpkg')

Lidar Formats

Exhaustive lidar format support in the current WbW-R build:

FormatRead (wbw_read_lidar)Write (wbw_write_lidar)Extensions / notes
LASYesYes.las
LAZYesYes.laz
COPCYesYes.copc.las, .copc.laz
PLYYesYes.ply
E57YesYes.e57

When wbw_write_lidar(...) is called without an extension, WbW-R defaults output to .copc.laz.

Typical pattern:

library(whiteboxworkflows)

l <- wbw_read_lidar('survey.las')
wbw_write_lidar(l, 'survey_out.copc.laz')

Sensor Bundle Families

Supported bundle readers include:

  • wbw_read_bundle(...) (auto-detect)
  • wbw_read_landsat(...)
  • wbw_read_sentinel1(...)
  • wbw_read_sentinel2(...)
  • wbw_read_planetscope(...)
  • wbw_read_iceye(...)
  • wbw_read_dimap(...)
  • wbw_read_maxar_worldview(...)
  • wbw_read_radarsat2(...)
  • wbw_read_rcm(...)

Bundle inputs may be either extracted directories or supported archives:

  • .zip
  • .tar
  • .tar.gz
  • .tgz

See Working with Sensor Bundles for family-specific examples.

Validation Guidance

  1. Prefer stable interchange formats (.tif, .gpkg, .copc.laz) for production pipelines.
  2. Re-open outputs and verify metadata after write operations.
  3. Use explicit options where format behavior must be reproducible.

Reprojection and CRS

This chapter covers reprojection and CRS validation workflows in WbW-R.

CRS operations are correctness-critical. A workflow can run successfully while still producing invalid spatial interpretation if source and destination reference systems are misunderstood. The patterns here prioritize explicit CRS inspection, controlled reprojection calls, and immediate validation of outputs.

Inspect CRS

Run this check before any reprojection to catch missing or incorrect source CRS assumptions.

library(whiteboxworkflows)

r <- wbw_read_raster('dem.tif')
v <- wbw_read_vector('roads.gpkg')
l <- wbw_read_lidar('survey.las')

print(r$crs_epsg())
print(v$crs_epsg())
print(l$crs_epsg())

Assigning Projection Metadata

Projection assignment and reprojection are different operations:

  • Assignment updates CRS metadata only.
  • Reprojection changes coordinate values.

In current WbW-R, direct object-level CRS assignment helpers are not exposed in the same way as WbW-Py object methods. The practical assignment pattern is to use R spatial libraries for metadata repair, then re-open data in WbW-R.

library(whiteboxworkflows)
library(terra)
library(sf)

# Raster CRS assignment (metadata only)
r <- rast('dem_without_crs.tif')
crs(r) <- 'EPSG:26917'
writeRaster(r, 'dem_with_crs.tif', overwrite = TRUE)

# Vector CRS assignment (metadata only)
v <- st_read('roads_without_crs.gpkg', quiet = TRUE)
v <- st_set_crs(v, 26917)
st_write(v, 'roads_with_crs.gpkg', delete_dsn = TRUE, quiet = TRUE)

# Re-open in WbW-R for downstream analysis
r_wbw <- wbw_read_raster('dem_with_crs.tif')
v_wbw <- wbw_read_vector('roads_with_crs.gpkg')
print(r_wbw$crs_epsg())
print(v_wbw$crs_epsg())

If you need coordinate changes, use reprojection workflows in the next sections rather than metadata assignment.

Raster Reprojection Pattern

The core raster API provides six reprojection method patterns (documented here so behavior is explicit across language bindings):

  1. Full-options reprojection (reproject)
  2. Nearest convenience (reproject_nearest)
  3. Bilinear convenience (reproject_bilinear)
  4. Reproject to match another raster grid (reproject_to_match_grid)
  5. Reproject to match another raster resolution (reproject_to_match_resolution)
  6. Reproject to target EPSG while matching a reference resolution (reproject_to_match_resolution_in_epsg)

In WbW-R, practical workflows typically call reprojection tools through wbw_<tool>(...) wrappers (or wbw_run_tool(...) for dynamic tool-id workflows) and then reopen outputs as typed objects.

Available resampling methods (wbraster)

Use these method strings in reprojection workflows:

  • nearest
  • bilinear
  • cubic
  • lanczos
  • average
  • min
  • max
  • mode
  • median
  • stddev

Method guidance:

  • Categorical/class rasters: nearest (or mode where majority behavior is desired).
  • Continuous rasters: bilinear, cubic, or lanczos.
  • Statistical downscaling/generalization: average, min, max, median, stddev.

Example: explicit raster reprojection

library(whiteboxworkflows)

s <- wbw_session()
wbw_reproject_raster(input = 'dem.tif',
    output = 'dem_utm.tif',
    epsg = 32618,
    method = 'bilinear')

dem_utm <- wbw_read_raster('dem_utm.tif')
print(dem_utm$crs_epsg())

Example: match-grid categorical reprojection

library(whiteboxworkflows)

s <- wbw_session()
wbw_reproject_raster(input = 'landcover_4326.tif',
    output = 'landcover_utm_aligned.tif',
    epsg = 32618,
    method = 'nearest')

Automatic reprojection in raster-stack tools

Stack-based tools now support automatic alignment controls:

  • auto_reproject (default true)
  • auto_reproject_method (optional override)

Behavior for raster stacks:

  1. inputs[0]/input_rasters[0] is the reference raster.
  2. CRS-mismatched stack members are auto-reprojected to the reference grid when auto_reproject=true.
  3. If auto_reproject_method is not set:
    • categorical rasters infer nearest
    • continuous rasters infer bilinear
  4. Non-overlapping extents are treated as hard validation errors.

This matters most for tools that combine raster stacks (overlay, weighted sum, PCA, inverse PCA, raster calculator, segmentation).

library(whiteboxworkflows)

s <- wbw_session()
wbw_weighted_sum(input_rasters = c('slope_utm.tif', 'landcover_4326.tif', 'distance_utm.tif'),
    weights = c(0.4, 0.35, 0.25),
    auto_reproject = TRUE,
    auto_reproject_method = '',
    output = 'weighted_sum.tif')

Vector Reprojection Pattern

Use this when downstream geometry processing depends on a specific projected CRS.

library(whiteboxworkflows)

s <- wbw_session()
wbw_reproject_vector(input = 'roads.gpkg', output = 'roads_utm.gpkg', epsg = 32618)

roads_utm <- wbw_read_vector('roads_utm.gpkg')
print(roads_utm$crs_epsg())

Lidar Reprojection Pattern

Use this when point-cloud alignment and metric operations require a target CRS.

library(whiteboxworkflows)

s <- wbw_session()
wbw_reproject_lidar(input = 'survey.las', output = 'survey_utm.laz', epsg = 32618)

survey_utm <- wbw_read_lidar('survey_utm.laz')
print(survey_utm$crs_epsg())

Georeference Raster from Control Points

Use this when a raster/image lacks reliable georeferencing and you have control points mapping pixel coordinates to map coordinates.

Required CSV fields:

  • source_col
  • source_row
  • target_x
  • target_y
library(whiteboxworkflows)

s <- wbw_session()
wbw_georeference_raster_from_control_points(input = 'historical_scan.tif',
    control_points = 'historical_scan_gcps.csv',
    epsg = 32618,
    resample = 'bilinear',
    output = 'historical_scan_georef.tif',
    report = 'historical_scan_georef_report.json')

Projection Utility Functions

These functions provide CRS diagnostics and point-level coordinate transforms outside of full dataset reprojection workflows.

WKT and EPSG identification

library(whiteboxworkflows)

# Convert EPSG to OGC WKT.
wkt <- wbw_projection_to_ogc_wkt(32618)
cat(wkt, '\n')

# Identify EPSG from WKT or CRS text.
epsg <- wbw_projection_identify_epsg(wkt)
print(epsg)  # 32618

# Reproject a batch of points.
pts <- data.frame(x = -79.3832, y = 43.6532)
result <- wbw_projection_reproject_points(pts, src_epsg = 4326L, dst_epsg = 32618L)
print(result)

Parse a PROJ string

Use wbw_projection_from_proj_string when you have a PROJ4-style string from a legacy file header or third-party metadata source and need the corresponding EPSG code or OGC WKT.

The function returns a named list with exactly one element:

  • list(epsg = integer) — EPSG code identified
  • list(wkt = character) — no EPSG match, WKT representation available
  • list(unknown = TRUE) — PROJ string parsed but CRS could not be resolved
library(whiteboxworkflows)

proj_str <- '+proj=utm +zone=17 +datum=NAD83 +units=m +no_defs'
result <- wbw_projection_from_proj_string(proj_str)

if (!is.null(result$epsg)) {
  cat('Identified EPSG:', result$epsg, '\n')  # e.g. 26917
} else if (!is.null(result$wkt)) {
  cat('WKT:', result$wkt, '\n')
} else {
  cat('CRS unknown\n')
}

This is the recommended approach for legacy rasters or vectors whose metadata carries only a PROJ4 string. WbW-R uses this path internally in wbw_projection_identify_epsg as the third fallback step.

Area-of-use bounding box

Use wbw_projection_area_of_use to retrieve the geographic domain of valid use for an EPSG code. This is useful for validating that data falls within the CRS before or after reprojection.

Returns a named list list(lon_min, lat_min, lon_max, lat_max), or NULL if no bounding box is registered for the code.

library(whiteboxworkflows)

bbox <- wbw_projection_area_of_use(32618)  # UTM Zone 18N
if (!is.null(bbox)) {
  cat(sprintf('valid lon: %.1f to %.1f\n', bbox$lon_min, bbox$lon_max))
  cat(sprintf('valid lat: %.1f to %.1f\n', bbox$lat_min, bbox$lat_max))
}

# Returns NULL for unregistered codes.
print(wbw_projection_area_of_use(9999))  # NULL

Best Practices

  • Confirm source CRS before transformation.
  • Use interpolation appropriate to data type for raster reprojection.
  • Re-open outputs and verify CRS metadata.
  • Keep transform arguments explicit in reproducible scripts.

Topology Utilities

WbW-R currently emphasizes topology through tool workflows (category-driven) rather than a dedicated R-native topology utility namespace equivalent to Python.

At the same time, session-level topology helpers provide a fast path for geometry QA checks and validation gates before heavier vector operations.

Topology Workflow Pattern

A robust topology-first script pattern is:

  1. Validate CRS compatibility first.
  2. Run fast predicate checks to detect obvious incompatibilities.
  3. Repair invalid polygon geometries if needed.
  4. Re-run critical predicates after repair.
  5. Continue into heavier vector processing only after checks pass.

This keeps failures early, localized, and easier to diagnose.

Session-Level Topology Predicate Checks

Use these methods when you need script-level yes/no checks on WKT geometries.

library(whiteboxworkflows)

s <- wbw_session()

poly <- 'POLYGON((0 0,10 0,10 10,0 10,0 0))'
pt <- 'POINT(5 5)'

print(s$topology_contains_wkt(poly, pt))
print(s$topology_intersects_wkt(poly, pt))
print(s$topology_within_wkt(pt, poly))
print(s$topology_touches_wkt(poly, pt))
print(s$topology_disjoint_wkt(poly, pt))

Topology via Tool Workflows

Use session execution and topology/geometry tool IDs through category discovery.

library(whiteboxworkflows)

s <- wbw_session()

# Discover likely topology-related tools
hits <- wbw_search_tools('topology')
print(hits)

# Inspect a specific candidate tool
if (length(hits) > 0) {
  first_hit <- hits[[1]]
  tool_id <- if (is.list(first_hit) && !is.null(first_hit$tool_id)) first_hit$tool_id else as.character(first_hit)
  print(wbw_describe_tool(tool_id))
}

Use this pattern when topology checks are part of larger, file-based vector pipelines that should remain in Whitebox tooling end to end.

Vector Topology Pattern

library(whiteboxworkflows)

s <- wbw_session()

# Example pattern: run a topology-oriented vector tool once selected.
# Replace 'tool_id_here' and args with the concrete tool from discovery.
# wbw_run_tool(
#   'tool_id_here',
#   args = list(input = 'input.gpkg', output = 'output.gpkg'),
#   session = s
# )

Use this template after selecting a concrete tool ID through discovery.

Geometry Validation and Repair

Use session helpers for pure WKT checks and repairs.

library(whiteboxworkflows)

s <- wbw_session()

invalid <- 'POLYGON((0 0,4 4,4 0,0 4,0 0))'
print(s$topology_is_valid_polygon_wkt(invalid))

fixed <- s$topology_make_valid_polygon_wkt(invalid)
print(fixed)

buf <- s$topology_buffer_wkt('LINESTRING(0 0, 10 0)', 1.5)
print(buf)

For broader in-memory feature editing, an sf interop path is still practical:

library(whiteboxworkflows)
library(sf)

v <- wbw_read_vector('polygons.gpkg')
g <- v$to_sf()

valid_flags <- st_is_valid(g)
print(table(valid_flags))

g_fixed <- st_make_valid(g)
st_write(g_fixed, 'polygons_valid.gpkg', delete_dsn = TRUE, quiet = TRUE)

Relationship and Distance

Use relation/distance methods when binary predicates are not expressive enough.

library(whiteboxworkflows)

s <- wbw_session()

a <- 'LINESTRING(0 0, 10 0)'
b <- 'LINESTRING(0 1, 10 1)'

print(s$topology_distance_wkt(a, b))
print(s$topology_relate_wkt(a, b))

Topology Method Reference

Session MethodDescription
topology_intersects_wktReturn TRUE when two geometries share any space.
topology_contains_wktReturn TRUE when geometry A strictly contains geometry B.
topology_within_wktReturn TRUE when geometry A is strictly within geometry B.
topology_touches_wktReturn TRUE when geometries meet at boundaries only.
topology_disjoint_wktReturn TRUE when geometries do not intersect.
topology_crosses_wktReturn TRUE when geometries cross with dimensional reduction behavior.
topology_overlaps_wktReturn TRUE when same-dimension geometries partially overlap.
topology_covers_wktBoundary-aware containment test (A covers B).
topology_covered_by_wktBoundary-aware containment test (A covered by B).
topology_relate_wktReturn DE-9IM relationship text for exact topology-rule evaluation.
topology_distance_wktReturn shortest distance between geometries.
topology_vector_feature_relationEvaluate relation between indexed features in vector objects.
topology_is_valid_polygon_wktCheck polygon validity before topology-sensitive workflows.
topology_make_valid_polygon_wktRepair an invalid polygon WKT representation.
topology_buffer_wktCreate a buffered geometry from WKT and distance.

Guidance

  • Use wbw_search_tools(...) + wbw_describe_tool(...) to select backend topology tools.
  • Use session topology helpers for fast script-level predicates and repairs.
  • Use sf for broader in-memory feature editing when needed.
  • Re-open outputs with wbw_read_vector(...) and verify schema/CRS after topology operations.

Interoperability

This chapter provides practical roundtrip patterns between WbW-R and R spatial tooling.

Interoperability should be treated as deliberate boundary crossing. Each conversion step can affect metadata fidelity, CRS representation, numeric precision, and schema details. The examples here emphasize explicit handoff points and validation checkpoints so multi-package workflows remain defensible.

Copy-Boundary Model

  • Array exchange via to_array() and wbw_array_to_raster(...) is an explicit in-memory boundary.
  • terra/stars/sf workflows are typically file or object conversion boundaries.
  • Always validate metadata after roundtrip.

terra Roundtrip

Use this when your raster workflow depends on terra-native functions.

library(whiteboxworkflows)
library(terra)

r <- wbw_read_raster('dem.tif')
terra_r <- r$to_terra()

# terra-side operation
terra_r2 <- terra::focal(terra_r, w = 3, fun = mean, na.rm = TRUE)
terra::writeRaster(terra_r2, 'dem_terra_smoothed.tif', overwrite = TRUE)

r_back <- wbw_read_raster('dem_terra_smoothed.tif')
print(r_back$metadata())

stars Roundtrip

Use this for labeled-array operations and grid math in stars workflows.

library(whiteboxworkflows)
library(stars)

r <- wbw_read_raster('dem.tif')
st <- r$to_stars()

# stars-side operation
st2 <- st * 1.05
st_as_stars <- st_as_stars(st2)
write_stars(st_as_stars, 'dem_stars_scaled.tif', driver = 'GTiff')

r_back <- wbw_read_raster('dem_stars_scaled.tif')
print(r_back$metadata())

sf Roundtrip (Vector)

Use this when vector editing and validity checks are more convenient in sf.

library(whiteboxworkflows)
library(sf)

v <- wbw_read_vector('roads.gpkg')
sf_obj <- v$to_sf()

# sf-side edit
sf_obj$len_m <- as.numeric(st_length(sf_obj))
sf_obj <- sf_obj[sf_obj$len_m > 20, ]
st_write(sf_obj, 'roads_sf_filtered.gpkg', delete_dsn = TRUE, quiet = TRUE)

v_back <- wbw_read_vector('roads_sf_filtered.gpkg')
print(v_back$schema())

Validation Checklist

  1. Re-check CRS and bounds after conversion.
  2. Re-check schema and representative attributes for vector flows.
  3. Prefer stable interchange formats (.tif, .gpkg) for routine roundtrips.

Licensing

This chapter documents open, signed entitlement, and floating license startup patterns in WbW-R.

Licensing mode is an operational dependency that should be explicit in scripts. The objective is to make startup behavior predictable and auditable: define which mode is expected, how fallback is handled, and what diagnostics are captured when activation fails.

Open Mode

Use this mode for OSS-only workflows or environments without entitlement needs.

library(whiteboxworkflows)

s <- wbw_session()
print(s)

Signed Entitlement Mode

Use this for managed deployments with entitlement payload verification.

library(whiteboxworkflows)

signed_entitlement_json <- '...'

s <- wbw_session(
  signed_entitlement_json = signed_entitlement_json,
  public_key_kid = 'k1',
  public_key_b64url = 'REPLACE_WITH_PROVIDER_KEY',
  include_pro = TRUE,
  tier = 'open'
)

print(s)

Floating License Mode

Use this for centrally managed license allocation across users or machines.

library(whiteboxworkflows)

s <- wbw_session(
  floating_license_id = 'fl_12345',
  include_pro = TRUE,
  tier = 'open',
  provider_url = 'https://license.example.com',
  machine_id = 'machine-01',
  customer_id = 'customer-abc'
)

print(s)

Failure Handling Guidance

  • Validate session creation at script startup and fail early.
  • Capture and log runtime startup errors for entitlement/floating modes.
  • Use open mode fallback only when policy allows.

Security and Operations Notes

  • Keep entitlement payloads and keys out of source control.
  • Prefer environment-variable or secret-store injection.
  • Record runtime version and startup mode for reproducibility.

API Reference Strategy

Manual chapters provide narrative and recipes.

This separation between narrative guidance and parameter-complete references is intentional. The manual answers workflow and design questions, while shared tool docs provide exhaustive argument contracts. Keeping these concerns separate makes the manual easier to read without reducing technical precision.

Detailed tool-level parameter references remain in shared tool docs.

Reference Boundaries

  • This manual focuses on concepts, workflow patterns, and runnable examples.
  • Tool parameter completeness remains in shared tool documentation.
  • Facade-level usage details are emphasized in object workflow chapters.

Terrain Analysis and Geomorphometry

Digital terrain analysis in WbW-R encompasses the full range of operations from surface derivatives (slope, aspect, curvature) through multi-scale geomorphometric indices and geomorphological classification. All computation is performed by the Whitebox backend; this chapter shows how to wire those tools into reproducible R workflows.


Core Concepts

Terrain analysis relies on these fundamental concepts:

  • Slope: The gradient of the land surface, typically expressed in degrees (0–90) or percent. High values indicate steep terrain; low values indicate flat terrain. Critical for erosion, landslide, and hydrology models.
  • Aspect: Compass direction a slope faces (0–360°, measured clockwise from north). Flat cells are typically -1 or nodata. Controls solar radiation, drainage direction, and habitat quality.
  • Curvature: Rate of change of slope (usually planform and profile separately). Positive profile curvature indicates acceleration zones (ridges); negative indicates deceleration zones (valleys).
  • Flow direction: The steepest downslope direction from each cell. Forms the basis for flow accumulation, drainage network delineation, and watershed boundaries.
  • Flow accumulation: Number of upslope cells draining to each cell, proportional to contributing area. High values indicate stream channels and valley bottoms.
  • Topographic Wetness Index (TWI): Ln(upslope area / tan(slope)), predicts persistent moisture. High TWI indicates probable saturated zones; used in soil moisture and flood risk mapping.
  • Landform classification: Categorizing terrain into types (e.g. summits, ridges, valleys, footslopes) via multivariate analysis. Provides interpretable terrain structure without tuning thresholds.
  • Multiscale analysis: Deriving terrain metrics at multiple scales. Single-scale derivatives can miss important structure; multiscale approaches reveal process-relevant scales.
  • Viewshed: Set of cells visible from a vantage point. Used in landscape perception, military analysis, and wind farm siting.

Session Setup

library(whiteboxworkflows)

s <- wbw_session()

# Set working directory for relative file paths
setwd('/data/terrain')

Reading a DEM

dem <- wbw_read_raster('dem.tif')
meta <- dem$metadata()
cat('Rows:', meta$rows, '  Cols:', meta$columns, '\n')
cat('Cell size:', meta$resolution_x, 'm\n')
cat('NoData:', meta$nodata, '\n')

Surface Derivatives

Slope

# Slope in degrees
slope_deg <- wbw_slope(dem   = dem$file_path(),
  output = 'slope_deg.tif',
  units  = 'degrees',
  z_factor = 1.0)

slope <- wbw_read_raster('slope_deg.tif')

Aspect

wbw_aspect(dem    = dem$file_path(),
  output = 'aspect.tif',
  zero_aspect = FALSE)

Hillshade

wbw_hillshade(dem     = dem$file_path(),
  output  = 'hillshade.tif',
  azimuth = 315.0,
  altitude = 30.0)

Profile Curvature, Plan Curvature, Tangential Curvature

wbw_profile_curvature(dem = dem$file_path(), output = 'profc.tif')
wbw_plan_curvature(dem = dem$file_path(), output = 'planc.tif')
wbw_tangential_curvature(dem = dem$file_path(), output = 'tangc.tif')

Mean Curvature and Other Modes

wbw_mean_curvature(dem = dem$file_path(), output = 'meanc.tif')
wbw_minimal_curvature(dem = dem$file_path(), output = 'minc.tif')
wbw_maximal_curvature(dem = dem$file_path(), output = 'maxc.tif')
wbw_gaussian_curvature(dem = dem$file_path(), output = 'gaussc.tif')

Topographic Wetness and Flow Accumulation

wbw_wetness_index(sca    = 'sca.tif',
  slope  = 'slope_deg.tif',
  output = 'twi.tif')

Computing the specific contributing area (SCA) first:

wbw_d_inf_flow_accum(dem     = dem$file_path(),
  output  = 'sca.tif',
  out_type = 'sca',
  threshold = 0.0,
  log      = FALSE,
  clip     = FALSE)

Relief and Position Indices

Relative Topographic Position

wbw_relative_topographic_position(dem    = dem$file_path(),
  output = 'rtp.tif',
  filterx = 101,
  filtery = 101)

TPI, Deviation from Mean, Scale Standardised Elevation

wbw_topographic_position_index(dem    = dem$file_path(),
  output = 'tpi.tif',
  minrad = 1.0,
  maxrad = 25.0,
  steps  = 10,
  num_sig_digits = 3)

wbw_deviation_from_mean_elevation(dem    = dem$file_path(),
  output = 'dev_mean.tif',
  filterx = 11,
  filtery = 11)

wbw_elev_above_pit(dem    = dem$file_path(),
  output = 'elev_above_pit.tif')

Multi-Scale Roughness and Complexity

wbw_multiscale_roughness(dem    = dem$file_path(),
  out_mag  = 'ms_rough_mag.tif',
  out_scale = 'ms_rough_scale.tif',
  min_scale = 1,
  max_scale = 100,
  step = 1)

wbw_vector_ruggedness_measure(dem    = dem$file_path(),
  output = 'vrm.tif',
  filterx = 11,
  filtery = 11)

wbw_ruggedness_index(dem    = dem$file_path(),
  output = 'tri.tif')

Terrain Smoothing

DEM-derived curvature, landform classes, and terrain-position indices can be overly sensitive to short-range roughness. Whitebox Next Gen now includes a multiscale feature-preserving smoother that is better aligned with modern terrain-analysis workflows than relying only on the older single-scale tool.

wbw_feature_preserving_smoothing_multiscale(input = dem$file_path(),
  output = 'dem_smooth_multiscale.tif',
  smoothing_amount = 0.65,
  edge_preservation = 0.80,
  scale_levels = 4,
  fidelity = 0.45,
  z_factor = 1.0)

dem_smooth <- wbw_read_raster('dem_smooth_multiscale.tif')

The key idea is coarse-to-fine smoothing: larger-scale terrain form is stabilized first, then finer detail is reintroduced while preserving major breaks-in-slope. This is especially useful before curvature, geomorphon, and terrain-position workflows.


Geomorphons — Landform Classification

wbw_geomorphons(dem              = dem$file_path(),
  output           = 'geomorphons.tif',
  search           = 50,
  threshold        = 1.0,
  fdist            = 0,
  skip             = 0,
  forms            = TRUE,
  residuals        = FALSE)

Geomorphon class codes: 1=flat, 2=peak, 3=ridge, 4=shoulder, 5=spur, 6=slope, 7=hollow, 8=footslope, 9=valley, 10=pit.


Multi-Scale Topographic Position

wbw_multiscale_topographic_position_image(local  = 'tpi_local.tif',
  meso   = 'tpi_meso.tif',
  broad  = 'tpi_broad.tif',
  output = 'mstpi_rgb.tif',
  hillshade = 'hillshade.tif')

Solar and Horizon Analysis

wbw_time_in_daylight(dem        = dem$file_path(),
  output     = 'daylight_hrs.tif',
  lat        = 43.5,
  long       = -80.5,
  az_fraction = 10.0,
  max_dist   = 100.0,
  utc_offset  = '-5:00',
  start_day   = 172,
  end_day     = 172,
  start_time  = '06:00:00',
  end_time    = '20:00:00')

WbW-Pro Spotlight: Terrain Constraint and Conflict Analysis

  • Problem: Screen terrain constraints early for siting and corridor decisions.
  • Tool: terrain_constraint_and_conflict_analysis
  • Typical inputs: DEM, optional wetness, optional flood-risk surface, optional land-cover penalty, slope threshold.
  • Typical outputs: Terrain-conflict score raster, conflict classes, and summary outputs.
result <- s$terrain_constraint_and_conflict_analysis(
  dem               = 'dem.tif',
  wetness           = 'wetness_index_norm.tif',
  flood_risk        = 'flood_risk_norm.tif',
  landcover_penalty = 'landcover_penalty_norm.tif',
  slope_limit_deg   = 15.0,
  output_prefix     = 'terrain_conflict_corridor_a'
)

print(result)

Note: This workflow requires a session initialized with a valid Pro licence.


Complete Terrain Analysis Workflow

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/terrain_workflow')

dem <- wbw_read_raster('dem.tif')

# Smooth the DEM before derivative-heavy work.
wbw_feature_preserving_smoothing_multiscale(input = dem$file_path(),
  output = 'dem_smooth_multiscale.tif',
  smoothing_amount = 0.65,
  edge_preservation = 0.80,
  scale_levels = 4,
  fidelity = 0.45)

dem_smooth <- wbw_read_raster('dem_smooth_multiscale.tif')

# Derivatives
for (tool in c('slope', 'aspect')) {
  wbw_run_tool(tool, args = list(dem = dem_smooth$file_path(),
    output = paste0(tool, '.tif')), session = s)
}

wbw_profile_curvature(dem = dem_smooth$file_path(), output = 'profc.tif')
wbw_plan_curvature(dem = dem_smooth$file_path(), output = 'planc.tif')

# Hillshade for visualisation
wbw_hillshade(dem = dem_smooth$file_path(), output = 'hillshade.tif',
  azimuth = 315.0, altitude = 30.0)

# TWI
wbw_d_inf_flow_accum(dem = dem_smooth$file_path(), output = 'sca.tif',
  out_type = 'sca')
wbw_wetness_index(sca = 'sca.tif', slope = 'slope.tif',
  output = 'twi.tif')

# Roughness and classification
wbw_vector_ruggedness_measure(dem = dem_smooth$file_path(), output = 'vrm.tif', filterx = 11, filtery = 11)
wbw_geomorphons(dem = dem_smooth$file_path(), output = 'geomorphons.tif', search = 50, threshold = 1.0,
  fdist = 0, skip = 0, forms = TRUE, residuals = FALSE)

cat('Terrain analysis complete.\n')

Tips

  • Pre-process your DEM: Remove spikes, pits, and fill sinks before computing derivatives. Use breach_depressions_least_cost() (preserves real depressions) or fill_depressions() (infills) depending on your application.
  • Resolution matters: Coarse DEMs (e.g. 30 m) are smoothed and may miss local process features. Fine DEMs (e.g. 1 m LiDAR) can be noisy. Choose resolution to match your process scale.
  • Flow direction algorithms: D8 (8-directional) is faster but can cause artificial flow alignments. D-infinity and Dinf-Rho distribute flow more naturally and are preferred for continuous analyses.
  • Curvature is scale-dependent: Always compute curvature at the scale matching your DEM resolution. A 10 m window on a 1 m DEM can overinterpret noise; a 1 m window on a 30 m DEM misses important structure.
  • Multiscale position classification: Run classification at 3–5 scales (e.g. local, neighbourhood, regional) and examine layer coherence. Inconsistent multi-scale patterns suggest model overfitting.
  • Viewshed validation: Viewshed results are sensitive to DEM quality and observer height assumptions. Always validate against ground observation or high-res ortho imagery.
  • Hydrological thresholds are empirical: Contributing area thresholds for stream initiation vary by geology and climate (typically 0.5–5 km²). Calibrate against observed stream networks.
  • Openness and exposure: Sky-view factor (SVF) and terrain openness support better hillshading and visibility assessment than raw slope or aspect. Use for visual interpretation and photogrammetry.

Spatial Hydrology

Hydrological analysis in WbW-R covers the full DEM-to-drainage workflow: depression removal, flow routing, stream extraction, watershed delineation, and hydrological indices. All heavy computation runs in the Whitebox backend via wbw_<tool>(...) wrappers (or wbw_run_tool(...) for dynamic tool-id workflows); R handles orchestration, conditional logic, and result inspection.


Session Setup

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/hydrology')

dem <- wbw_read_raster('dem.tif')

Depression Removal

Raw DEMs often contain spurious pits that block modelled drainage. Choose the method appropriate to your landscape and accuracy requirements.

Breach Depressions — Least-Cost

The preferred pre-treatment for most applications. Carves the minimum-width, minimum-depth channel through each depression barrier:

wbw_breach_depressions_least_cost(dem          = dem$file_path(),
  output       = 'dem_breached.tif',
  dist         = 500,        # maximum breach distance (cells)
  max_cost     = -1.0,       # -1 = no cost limit
  min_dist     = TRUE,
  flat_increment = 0.0001,
  fill_deps    = FALSE)

Fill Depressions — Wang & Liu

Use fill after breach, or alone on smooth low-relief landscapes:

wbw_fill_depressions_wang_and_liu(dem       = dem$file_path(),
  output    = 'dem_filled.tif',
  fix_flats = TRUE,
  flat_increment = 0.001,
  max_depth = -1.0)

Burn-in Stream Network

If you have a mapped stream network, burn it into the DEM before filling to improve drainage coherence:

streams <- wbw_read_vector('streams.shp')
wbw_burn_streams_at_roads(dem     = dem$file_path(),
  streams = streams$file_path(),
  roads   = 'roads.shp',
  output  = 'dem_burn.tif',
  width   = 6.0)

Flow Routing

D8 Flow Pointer and Accumulation

wbw_d8_pointer(dem    = 'dem_breached.tif',
  output = 'd8_ptr.tif',
  esri_pntr = FALSE)

wbw_d8_flow_accum(i      = 'd8_ptr.tif',
  output = 'd8_acc.tif',
  out_type = 'cells',
  log    = FALSE,
  clip   = FALSE,
  pntr   = TRUE)

D-Infinity Routing

Distributes flow across two cells proportionally:

wbw_d_inf_pointer(dem    = 'dem_breached.tif',
  output = 'dinf_ptr.tif')

wbw_d_inf_flow_accum(dem      = 'dem_breached.tif',
  output   = 'dinf_sca.tif',
  out_type = 'sca',
  log      = FALSE)

FD8 Multi-Flow

wbw_fd8_flow_accum(dem      = 'dem_breached.tif',
  output   = 'fd8_acc.tif',
  out_type = 'cells',
  exponent = 1.1,
  threshold = 0.0,
  log      = FALSE,
  clip     = FALSE)

Stream Extraction

wbw_extract_streams(flow_accum = 'd8_acc.tif',
  output     = 'streams.tif',
  threshold  = 5000,
  zero_background = TRUE)

Extract Valley Bottom by Region Growing

wbw_extract_valley_bottoms(dem     = 'dem_breached.tif',
  output  = 'valley_bottoms.tif',
  threshold = 0.5,
  line_thin = TRUE)

Watershed Delineation

Snap Pour Points

outlets <- wbw_read_vector('gauges.shp')
wbw_snap_pour_points(pour_pts   = outlets$file_path(),
  flow_accum = 'd8_acc.tif',
  output     = 'gauges_snapped.shp',
  snap_dist  = 200.0)

Single and Multi-Watershed Delineation

wbw_watershed(d8_pntr    = 'd8_ptr.tif',
  pour_pts   = 'gauges_snapped.shp',
  output     = 'watersheds.tif',
  esri_pntr  = FALSE)

Unnest Basins

wbw_unnest_basins(d8_pntr   = 'd8_ptr.tif',
  pour_pts  = 'gauges_snapped.shp',
  output    = 'unnested_basins.tif',
  esri_pntr = FALSE)

Flow Path Analysis

# Downslope flowpath length
wbw_downslope_flowpath_length(d8_pntr  = 'd8_ptr.tif',
  output   = 'ds_flowpath_len.tif',
  watersheds = '',
  weights  = '',
  esri_pntr = FALSE)

# Distance to stream outlet
wbw_distance_to_outlet(d8_pntr = 'd8_ptr.tif',
  streams = 'streams.tif',
  output  = 'dist_to_outlet.tif',
  esri_pntr = FALSE,
  zero_background = TRUE)

# HAND — Height Above Nearest Drainage
wbw_elevation_above_stream(dem     = 'dem_breached.tif',
  streams = 'streams.tif',
  output  = 'hand.tif')

Hydrological Indices

Topographic Wetness Index

wbw_wetness_index(sca    = 'dinf_sca.tif',
  slope  = 'slope.tif',
  output = 'twi.tif')

Sediment Transport Index

wbw_sediment_transport_index(sca    = 'dinf_sca.tif',
  slope  = 'slope.tif',
  output = 'sti.tif',
  sca_exponent  = 0.4,
  slope_exponent = 1.3)

Stream Power Index

wbw_stream_power_index(sca    = 'dinf_sca.tif',
  slope  = 'slope.tif',
  output = 'spi.tif',
  exponent = 1.0)

Isobasins — Equal-Area Basin Partitioning

wbw_isobasins(dem          = 'dem_breached.tif',
  output       = 'isobasins.tif',
  size         = 5000,
  connections  = TRUE,
  csv_file     = 'isobasin_connectivity.csv')

Stochastic Depression Analysis

wbw_stochastic_depression_analysis(dem         = dem$file_path(),
  output      = 'prob_flooded.tif',
  rmse        = 0.18,
  range       = 20.0,
  iterations  = 1000)

Complete Hydrological Analysis Workflow

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/hydro_workflow')

dem <- wbw_read_raster('dem.tif')

# 1. Breach depressions
wbw_breach_depressions_least_cost(dem = dem$file_path(), output = 'dem_b.tif', dist = 500,
  flat_increment = 0.0001, fill_deps = FALSE)

# 2. D8 routing
wbw_d8_pointer(dem = 'dem_b.tif', output = 'd8_ptr.tif')
wbw_d8_flow_accum(i = 'd8_ptr.tif', output = 'd8_acc.tif',
  out_type = 'cells', pntr = TRUE)

# 3. DInf SCA for TWI
wbw_d_inf_flow_accum(dem = 'dem_b.tif', output = 'sca.tif',
  out_type = 'sca')

# 4. Slope for indices
wbw_slope(dem = 'dem_b.tif', output = 'slope.tif',
  units = 'degrees')

# 5. Streams
wbw_extract_streams(flow_accum = 'd8_acc.tif',
  output = 'streams.tif', threshold = 3000, zero_background = TRUE)

# 6. Watershed
outlets <- wbw_read_vector('gauges.shp')
wbw_snap_pour_points(pour_pts = outlets$file_path(),
  flow_accum = 'd8_acc.tif', output = 'gauges_snap.shp', snap_dist = 100.0)
wbw_watershed(d8_pntr = 'd8_ptr.tif',
  pour_pts = 'gauges_snap.shp', output = 'watersheds.tif')

# 7. Indices
wbw_wetness_index(sca = 'sca.tif', slope = 'slope.tif',
  output = 'twi.tif')
wbw_elevation_above_stream(dem = 'dem_b.tif',
  streams = 'streams.tif', output = 'hand.tif')

cat('Hydrological analysis complete.\n')

LiDAR Processing

LiDAR point cloud processing in WbW-R covers the full pipeline from raw flight-line data through to classified point clouds and derived raster products. All processing tools run in the Whitebox backend; R handles session management, file discovery, and result validation.


Core Concepts

Before processing LiDAR data, understand these foundational terms:

  • Return number: Which reflection from a single laser pulse. Pulse 1 is the first (often canopy top); pulse 2–5 capture midstory and ground returns.
  • Point classification: ASPRS standard categories — ground (2), low veg (3), medium veg (4), high veg (5), buildings (6), noise (7), overlap (12), and others.
  • Intensity: Reflectance value (0–65535) proportional to target brightness. Useful for vegetation density estimation and water detection.
  • Ground filtering: Separating terrain points (classification 2) from vegetation and buildings; critical for accurate digital terrain models (DTMs).
  • Digital terrain model (DTM): Raster surface of bare earth, computed from ground returns only. Used for hydrology, geomorphometry, and flood modelling.
  • Digital surface model (DSM): Raster surface of highest returns (canopy top). Used for building detection and volume calculations.
  • Canopy height model (CHM): DSM minus DTM; represents vegetation height above ground. Standard input for tree detection and segmentation.
  • Point density: Points per square unit (typically points/m²). Higher density enables finer segmentation; lower density requires smoothing.
  • Normalization: Converting raw Z-values to height-above-ground by subtracting DTM, creating a normalized point cloud for structural analysis.

Session Setup and File Discovery

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/lidar')

# List all LAS/LAZ files in the project folder
las_files <- list.files('.', pattern = '\\.(las|laz)$', full.names = TRUE,
                         recursive = TRUE)
cat('Found', length(las_files), 'point cloud files.\n')

Reading and Inspecting LiDAR Files

las <- wbw_read_lidar('survey.las')
meta <- las$metadata()

cat('Point count:', meta$number_of_points, '\n')
cat('X range:', meta$min_x, 'to', meta$max_x, '\n')
cat('Y range:', meta$min_y, 'to', meta$max_y, '\n')
cat('Z range:', meta$min_z, 'to', meta$max_z, '\n')
cat('Point density per m²:', meta$point_density, '\n')

Point Cloud Filtering and Outlier Removal

# Remove isolated low and high points
wbw_lidar_elevation_slice(i       = 'survey.las',
  output  = 'survey_sliced.las',
  minz    = 0.0,
  maxz    = 200.0,
  cls     = FALSE)

# Statistical outlier removal
wbw_lidar_remove_outliers(i       = 'survey.las',
  output  = 'survey_clean.las',
  radius  = 2.0,
  elev_diff = 25.0,
  use_median = FALSE)

Ground Point Classification

Progressive Morphological Filter

wbw_lidar_ground_point_filter(i        = 'survey_clean.las',
  output   = 'survey_classified.las',
  radius   = 2.0,
  min_elev_diff = 0.15,
  max_elev_diff = 1.3,
  num_iter = 5,
  threshold = 0.15,
  slope_threshold = 0.15,
  height_threshold = 1.0,
  classify  = TRUE,
  slope       = FALSE,
  height_diff = TRUE,
  filter_return_all = FALSE)

Digital Elevation Model Interpolation

Tin Gridding (TIN Interpolation)

wbw_tin_gridding(i          = 'survey_classified.las',
  output     = 'dtm.tif',
  returns    = 'last',
  resolution = 1.0,
  exclude_cls = '1,3,4,5,6,7',
  minz       = -50.0,
  maxz       = 250.0)

LiDAR IDW Interpolation

wbw_lidar_idw_interpolation(i          = 'survey_classified.las',
  output     = 'dtm_idw.tif',
  parameter  = 'elevation',
  returns    = 'last',
  resolution = 1.0,
  weight     = 1.0,
  radius     = 2.5,
  exclude_cls = '1,3,4,5,6,7')

Normalised Height Above Ground

wbw_normalize_lidar(i          = 'survey_classified.las',
  ground     = 'survey_classified.las',
  output     = 'survey_normalised.las',
  ignore_ground_distance = FALSE)

Canopy Height Model (CHM) and DSM

# First return DSM
wbw_lidar_idw_interpolation(i          = 'survey_classified.las',
  output     = 'dsm.tif',
  parameter  = 'elevation',
  returns    = 'first',
  resolution = 1.0,
  weight     = 1.0,
  radius     = 2.5,
  exclude_cls = '7')

# Canopy Height Model = DSM - DTM
wbw_subtract(input1 = 'dsm.tif',
  input2 = 'dtm.tif',
  output = 'chm.tif')

Point Density and Distribution Analysis

wbw_lidar_point_stats(i          = 'survey.las',
  resolution = 1.0,
  num_points = TRUE,
  num_pulses = FALSE)

wbw_lidar_density(i          = 'survey.las',
  output     = 'density.tif',
  resolution = 1.0,
  returns    = 'all',
  exclude_cls = '7')

Scan-Angle and Return Analysis

# Filter by scan angle
wbw_filter_lidar_scan_angles(i         = 'survey.las',
  output    = 'survey_nadir.las',
  threshold = 15.0)

# Intensity image
wbw_lidar_idw_interpolation(i          = 'survey_nadir.las',
  output     = 'intensity.tif',
  parameter  = 'intensity',
  returns    = 'all',
  resolution = 1.0,
  weight     = 1.0,
  radius     = 2.5)

Tile Management

Working with large surveys requires tiling:

# Tile large dataset
wbw_lidar_tile(i        = 'full_survey.las',
  width    = 500.0,
  height   = 500.0,
  origin_x = 0.0,
  origin_y = 0.0)

# Add a buffer overlap to each tile
wbw_lidar_tile_footprint(i      = 'tile_000_000.las',
  output = 'tile_000_000_footprint.shp',
  hull   = FALSE)

Segmentation and Vegetation Analysis

# Individual tree segmentation from normalised LiDAR
wbw_individual_tree_detection(i          = 'survey_normalised.las',
  output     = 'tree_tops.shp',
  min_search_radius = 1.0,
  min_height  = 2.0,
  max_search_radius = 5.0,
  max_height  = 40.0)

# Canopy cover (fraction first return above height threshold)
wbw_lidar_segmentation_based_filter(i            = 'survey_classified.las',
  output       = 'survey_seg_filtered.las',
  slope_threshold = 15.0,
  max_edge_length = 0.5,
  classify     = TRUE)

WbW-Pro Spotlight: LiDAR Change and Disturbance Analysis

  • Problem: Compare repeat LiDAR epochs to detect disturbance in a repeatable way.
  • Tool: lidar_change_and_disturbance_analysis
  • Typical inputs: Baseline tile set, monitoring tile set, output resolution, minimum change threshold.
  • Typical outputs: Change rasters plus summary metrics for affected area, hotspot intensity, and QA review.
result <- s$lidar_change_and_disturbance_analysis(
  baseline_tiles = '/data/lidar_epoch_2022/',
  monitor_tiles  = '/data/lidar_epoch_2025/',
  resolution     = 1.0,
  min_change_m   = 0.5,
  output_prefix  = 'disturbance_2025_vs_2022'
)

print(result)

Note: This workflow requires a session initialized with a valid Pro licence.


Complete LiDAR Processing Workflow

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/lidar_project')

las_file <- 'raw_survey.las'

# 1. Remove outliers
wbw_lidar_remove_outliers(i = las_file, output = 'clean.las', radius = 2.0, elev_diff = 25.0)

# 2. Ground classification
wbw_lidar_ground_point_filter(i = 'clean.las', output = 'classified.las', radius = 2.0,
  min_elev_diff = 0.15, max_elev_diff = 1.3, num_iter = 5,
  threshold = 0.15, slope_threshold = 0.15, height_threshold = 1.0,
  classify = TRUE, slope = FALSE, height_diff = TRUE,
  filter_return_all = FALSE)

# 3. DTM
wbw_tin_gridding(i = 'classified.las', output = 'dtm.tif', returns = 'last',
  resolution = 1.0, exclude_cls = '1,3,4,5,6,7')

# 4. DSM and CHM
wbw_lidar_idw_interpolation(i = 'classified.las', output = 'dsm.tif', parameter = 'elevation',
  returns = 'first', resolution = 1.0, weight = 1.0, radius = 2.5,
  exclude_cls = '7')
wbw_subtract(input1 = 'dsm.tif', input2 = 'dtm.tif', output = 'chm.tif')

# 5. Normalise for vegetation analysis
wbw_normalize_lidar(i = 'classified.las', ground = 'classified.las',
  output = 'normalised.las', ignore_ground_distance = FALSE)

# 6. Tree detection
wbw_individual_tree_detection(i = 'normalised.las', output = 'tree_tops.shp',
  min_search_radius = 1.0, min_height = 2.0,
  max_search_radius = 5.0, max_height = 40.0)

# 7. Point density
wbw_lidar_density(i = 'classified.las', output = 'density.tif',
  resolution = 1.0, returns = 'all', exclude_cls = '7')

cat('LiDAR processing complete.\n')

Tips

  • Choose your format wisely: LAS is universal and compact; LAZ adds compression and is ideal for archival or transmission. COPC is cloud-optimized and best for remote HTTP range-request access. Use LAZ or LAS for terrestrial/airborne surveys; COPC for cloud-native workflows.
  • Always validate classifications: Use lidar_histogram() and lidar_info() to inspect point distributions by return and classification. Misclassified ground points silently corrupt DTMs and downstream hydrology.
  • DTM vs. DSM vs. CHM: Generate all three at the same resolution so derivatives align. A common pitfall is mixing DTM and DSM resolution.
  • Ground filtering is critical: Outliers and noise (classification 7) should be excluded before gridding. Use lidar_filter_for_ground() to remove spikes and erratic points.
  • Normalization enables vegetation analysis: Always normalize point clouds (subtract DTM) before individual tree detection or canopy structural metrics.
  • Monitor memory for large surveys: Point clouds are memory-intensive. For datasets > 1 GB, use streaming APIs (lidar_read_chunked()) rather than loading the entire tile at once.
  • Coordinate reference systems matter: LAS headers carry CRS as WKT. Verify WKT matches your project CRS before gridding or merging tiles.
  • Density and grid resolution: If point density is < 0.5 pts/m², consider upsampling or smoothing the output grid to avoid isolated pits or peaks.

Remote Sensing Analysis

Remote sensing workflows in WbW-R cover multispectral and hyperspectral image analysis: spectral index computation, image enhancement, principal component analysis, image segmentation, unsupervised and supervised classification, accuracy assessment, and change detection. All computation runs in the Whitebox backend.

Sensor Bundle First: When working with Sentinel-2, Landsat, PlanetScope, or SAR product folders, open the scene as a sensor bundle using wbw_sensor_bundle_from_path(). Bundles provide automatic metadata discovery, key-based band access, and one-call true/false-colour composites without hardcoding file names.


Core Concepts

Remote sensing image analysis requires familiarity with these core ideas:

  • Spectral bands: Distinct wavelength ranges (e.g. blue 450–510 nm, red 620–750 nm, NIR 750–900 nm). Different materials reflect different bands, enabling material discrimination.
  • Spectral indices: Normalized ratios of bands that isolate phenomena. NDVI (Normalized Difference Vegetation Index) uses NIR and red to measure greenness; NDWI uses NIR and SWIR for water content.
  • Spatial resolution: Pixel size in meters (Sentinel-2: 10 m for visible/NIR, 20 m for SWIR; Landsat: 30 m; SAR: 5–30 m depending on mode).
  • Temporal resolution: Revisit interval (Sentinel-2: 5 days; Landsat: 16 days; PlanetScope: daily or higher).
  • Atmospheric effects: Raw radiance is affected by aerosols, water vapour, and ozone. Surface reflectance products are atmospherically corrected; still require relative normalization for multi-date analysis.
  • Cloud masking: Cloud and cloud shadow pixels must be identified and excluded before classification or change detection using quality/QA bands.
  • Supervised classification: Training polygons with known labels teach a model to assign classes to unlabelled pixels. Training quality directly controls classification accuracy.
  • Unsupervised classification: Clustering algorithms (K-means, ISODATA) discover spectral clusters without training labels; useful for exploratory analysis.
  • Change detection: Comparing indices or classifications across dates to identify land-use changes, disturbance, or phenological shifts.

Session Setup

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/remote_sensing')

Sensor Bundles — Cross-Family Scene Ingestion

wbw_sensor_bundle_from_path() wraps a product folder and returns a bundle object with a consistent interface across sensor families.

Opening a Bundle

# Works with Sentinel-2, Landsat 8/9, PlanetScope, SPOT, SAR, etc.
bundle <- wbw_sensor_bundle_from_path(
  '/data/S2B_MSIL2A_20240601T105619_N0510_R094_T32TPT.SAFE',
  session = s
)

# Or from a zip archive — WbW extracts it automatically
bundle <- wbw_sensor_bundle_from_path('/data/LC09_L2SP_017030_20240610.tar', session = s)

Discovering Available Layers

meta <- bundle$metadata()
cat('Family:   ', meta$family, '\n')
cat('Mission:  ', meta$mission, '\n')
cat('Tile:     ', meta$tile_id, '\n')
cat('Level:    ', meta$processing_level, '\n')
cat('Cloud %:  ', meta$cloud_cover_percent, '\n')
cat('Datetime: ', meta$acquisition_datetime_utc, '\n')

print(bundle$list_band_keys())        # e.g. c('B02', 'B03', 'B04', 'B08', ...)
print(bundle$list_measurement_keys()) # e.g. c('ndvi', 'ndwi')
print(bundle$list_qa_keys())          # e.g. c('SCL', 'QA_PIXEL')
print(bundle$list_aux_keys())         # e.g. c('AOT', 'WVP')
print(bundle$list_asset_keys())

Reading Bands by Key

# Keys follow sensor-native naming — no need to memorise file paths
blue  <- bundle$read_band('B02')   # Sentinel-2 blue
green <- bundle$read_band('B03')
red   <- bundle$read_band('B04')
nir   <- bundle$read_band('B08')
swir1 <- bundle$read_band('B11')
swir2 <- bundle$read_band('B12')

# Landsat 9 uses different key names
# blue  <- bundle$read_band('SR_B2')
# nir   <- bundle$read_band('SR_B5')

# QA / cloud mask
scl <- bundle$read_qa_layer('SCL')   # Sentinel-2 scene classification

Quick-Look Composites

# True-colour GeoTIFF (auto-enhanced by default)
bundle$write_true_colour(output_path = 'true_colour.tif')

# False-colour (NIR-Red-Green) composite
bundle$write_false_colour(output_path = 'false_colour.tif')

Cross-Family NDVI Pattern

compute_ndvi_from_bundle <- function(bundle_path, output_path, session = NULL) {
  b <- wbw_sensor_bundle_from_path(bundle_path, session = session)
  meta <- b$metadata()

  if ('ndvi' %in% b$list_measurement_keys()) {
    ndvi_r <- b$read_measurement('ndvi')
    wbw_write_raster(ndvi_r, output_path)
  } else {
    # Fall back to band-based NDVI
    if (meta$family == 'sentinel2') {
      red_r <- b$read_band('B04')
      nir_r <- b$read_band('B08')
    } else {
      red_r <- b$read_band('SR_B4')
      nir_r <- b$read_band('SR_B5')
    }
    wbw_normalized_difference_index(input1 = nir_r$file_path(), input2 = red_r$file_path(),
      output = output_path)
  }
}

compute_ndvi_from_bundle('/data/S2B_MSIL2A_20240601.SAFE', 'ndvi_s2.tif', s)
compute_ndvi_from_bundle('/data/LC09_L2SP_017030.tar',     'ndvi_l9.tif', s)

Cloud and No-Data Masking

# Sentinel-2 SCL: 3=cloud shadow, 8=med cloud, 9=high cloud, 10=thin cirrus
scl <- bundle$read_qa_layer('SCL')
wbw_raster_calculator(output    = 'red_cloud_free.tif',
  statement = paste0("if('", scl$file_path(), "' == 3 or '", scl$file_path(), "' == 8 ",
                     "or '", scl$file_path(), "' == 9 or '", scl$file_path(), "' == 10, ",
                     "nodata, '", red$file_path(), "')"))

# Landsat Collection 2 QA_PIXEL — use bitwise expressions via raster_calculator
# qa <- bundle$read_qa_layer('QA_PIXEL')

Reading Multi-Band Imagery (Individual Files)

When bundles are not available (e.g., reprojected mosaics, custom composites, legacy archives), load individual band files in the conventional way:

b2 <- wbw_read_raster('LC08_B2_blue.tif')
b3 <- wbw_read_raster('LC08_B3_green.tif')
b4 <- wbw_read_raster('LC08_B4_red.tif')
b5 <- wbw_read_raster('LC08_B5_nir.tif')
b6 <- wbw_read_raster('LC08_B6_swir1.tif')
b7 <- wbw_read_raster('LC08_B7_swir2.tif')

# Resample if bands have different resolutions
wbw_resample(inputs   = b6$file_path(),
  output   = 'b6_10m.tif',
  cell_size = 0.0,
  base     = b4$file_path(),
  method   = 'bilinear')

Spectral Indices

Normalised Difference Vegetation Index (NDVI)

$$NDVI = \frac{NIR - Red}{NIR + Red}$$

# Using bands read from a bundle
wbw_normalized_difference_index(input1 = nir$file_path(),    # NIR
  input2 = red$file_path(),    # Red
  output = 'ndvi.tif')

Common Water, Snow, and Urban Indices

# Define index triplets: (output name, numerator band, denominator band)
indices <- list(
  list(name = 'ndwi',  b1 = green, b2 = nir),   # open water;  Green/NIR
  list(name = 'mndwi', b1 = green, b2 = swir1),  # urban water; Green/SWIR1
  list(name = 'nbr',   b1 = nir,   b2 = swir2),  # fire severity; NIR/SWIR2
  list(name = 'ndsi',  b1 = green, b2 = swir1),  # snow/ice;    Green/SWIR1
  list(name = 'ndbi',  b1 = swir1, b2 = nir)     # built-up;    SWIR1/NIR
)

for (idx in indices) {
  wbw_normalized_difference_index(input1 = idx$b1$file_path(),
    input2 = idx$b2$file_path(),
    output = paste0(idx$name, '.tif'))
}

EVI (Enhanced Vegetation Index)

$$EVI = 2.5 \cdot \frac{NIR - Red}{NIR + 6 \cdot Red - 7.5 \cdot Blue + 1}$$

wbw_raster_calculator(output    = 'evi.tif',
  statement = paste0("2.5 * ('", nir$file_path(), "' - '", red$file_path(), "') / (",
                     "'", nir$file_path(), "' + 6.0 * '", red$file_path(), "' - 7.5 * '",
                     blue$file_path(), "' + 1.0)"))

Image Enhancement

Percentage Linear Stretch

wbw_percentage_contrast_stretch(i      = b4$file_path(),
  output = 'b4_stretched.tif',
  clip   = 1.0,
  tail   = 'both',
  num_tones = 256)

Standard Deviation Stretch

wbw_standard_deviation_contrast_stretch(i         = b4$file_path(),
  output    = 'b4_sd_stretch.tif',
  stdev     = 2.0,
  num_tones = 256)

Gamma Correction

wbw_gamma_correction(i      = b4$file_path(),
  output = 'b4_gamma.tif',
  gamma  = 0.55)

Histogram Matching

wbw_histogram_matching(i          = 'image_target.tif',
  histo_file = 'reference_histogram.html',
  output     = 'image_matched.tif')

IHS Colour Space and Pan-Sharpening

wbw_rgb_to_ihs(intensity = b4$file_path(),
  hue       = b3$file_path(),
  saturation = b2$file_path(),
  output    = 'ihs.tif')

# Pan-sharpen
wbw_ihs_to_rgb(intensity  = 'pan_band.tif',
  hue        = 'ihs_hue.tif',
  saturation = 'ihs_sat.tif',
  output     = 'pansharpened.tif')

Spatial Filtering

# Gaussian smoothing
wbw_gaussian_filter(i = b5$file_path(), output = 'b5_gauss.tif', sigma = 2.0)

# Edge detection — Canny
wbw_canny_edge_detection(i = b4$file_path(), output = 'edges.tif', sigma = 0.5,
  low_threshold = 0.05, high_threshold = 0.15, add_back = FALSE)

# General-purpose GLCM texture (multiband output)
wbw_glcm_texture(input                  = b5$file_path(),
  window_size            = 9,
  distance               = 1,
  angles                 = '0,45,90,135',
  features               = 'contrast,homogeneity,entropy',
  direction_aggregation  = 'mean',
  levels                 = 32,
  output                 = 'glcm_texture.tif')

Principal Component Analysis

# Supply a comma-separated list of band files
all_bands <- paste(c(b2$file_path(), b3$file_path(), b4$file_path(),
                     b5$file_path(), b6$file_path(), b7$file_path()),
                   collapse = ';')

wbw_principal_component_analysis(inputs        = all_bands,
  output        = 'pca_loadings.html',
  num_comp      = 6,
  standardised  = FALSE)

# PC scores are written as pc1.tif, pc2.tif, ... in the working directory

Image Segmentation

wbw_image_segmentation(inputs       = all_bands,
  output       = 'segments.tif',
  threshold    = 30.0,
  steps        = 10,
  min_area     = 10)

Object-Based Image Analysis (OBIA) Baseline

Phase 1 open-core OBIA tools are available through standard wbw_<tool>(...) wrappers and can be discovered as a grouped set:

obia_tools <- wbw_tools_in_remote_sensing_obia(session = s)
print(obia_tools$id)
# 1) Segment imagery into compact object candidates
wbw_segment_slic_superpixels(inputs      = all_bands,
  region_size = 18,
  compactness = 12.0,
  output      = 'segments_slic.tif')

# 2) Merge undersized regions
wbw_segments_merge_small_regions(segments = 'segments_slic.tif',
  min_size = 12,
  method   = 'longest',
  output   = 'segments_clean.tif')

# 3) Extract spectral/shape/texture object features
wbw_object_features_spectral_basic(segments = 'segments_clean.tif',
  inputs   = all_bands,
  output   = 'object_features_spectral.csv')

wbw_object_features_shape_basic(segments = 'segments_clean.tif',
  output   = 'object_features_shape.csv')

wbw_object_features_texture_glcm_basic(segments = 'segments_clean.tif',
  input    = b5$file_path(),
  levels   = 16,
  output   = 'object_features_texture.csv')

# 4) Train/apply object-level RF model
wbw_classify_objects_random_forest(features = 'object_features_all.csv',
  training = 'training_segments.csv',
  output   = 'object_predictions.csv')

# 5) Evaluate object-level accuracy
wbw_evaluate_object_classification_accuracy(predictions = 'object_predictions.csv',
  reference   = 'validation_segments.csv',
  output      = 'object_accuracy.json')

# Optional one-call baseline pipeline
wbw_obia_pipeline_basic(inputs        = all_bands,
  training      = 'training_segments.csv',
  output_prefix = 'obia_field01',
  segment_method = 'slic')

Advanced OBIA Capabilities (Open Tier)

The OBIA surface now includes advanced open-tier tools. You can list them from the grouped discovery helper and run each directly with wbw_<tool>(...) wrappers (or with wbw_run_tool(...) for dynamic tool-id dispatch).

Segmentation and scale control:

  • segment_watershed_markers
  • segment_multiresolution_hierarchical
  • segment_scale_parameter_optimizer
  • segments_split_low_cohesion

Object conversion and interoperability:

  • segments_to_polygons
  • polygons_to_segments

Advanced features:

  • object_features_context_neighbors
  • object_features_topology_relations

Advanced classification:

  • classify_objects_svm
  • classify_objects_ensemble_pro
  • classify_objects_rules_basic
  • classify_objects_rules_hierarchical
  • object_class_probability_maps
  • object_uncertainty_diagnostics_pro

Hierarchy and propagation:

  • build_object_hierarchy_multiscale
  • propagate_labels_across_hierarchy

Post-processing and quality:

  • objects_enforce_min_mapping_unit
  • objects_boundary_refinement_pro
  • evaluate_segmentation_quality_pro

Workflow operations:

  • obia_batch_orchestrator_pro
  • obia_audit_report_pro
# 1) Build multiscale hierarchy products
wbw_segment_multiresolution_hierarchical(inputs        = all_bands,
  coarse_k      = 900.0,
  fine_k        = 280.0,
  output_prefix = 'site01_hier')

# 2) Context and topology features for difficult class boundaries
wbw_object_features_context_neighbors(segments = 'site01_hier_segments_fine.tif',
  output   = 'site01_context.csv')

wbw_object_features_topology_relations(segments = 'site01_hier_segments_fine.tif',
  output   = 'site01_topology.csv')

# 3) Ensemble and rule-hierarchical object classification
wbw_classify_objects_ensemble_pro(features = 'site01_features_all.csv',
  training = 'site01_training_segments.csv',
  output   = 'site01_pred_ensemble.csv')

wbw_classify_objects_rules_hierarchical(features = 'site01_features_all.csv',
  rules    = 'site01_rules.csv',
  output   = 'site01_pred_rules.csv')

# 4) Probability maps and uncertainty diagnostics
wbw_object_class_probability_maps(predictions = 'site01_pred_ensemble.csv',
  output      = 'site01_probabilities.csv')

wbw_object_uncertainty_diagnostics_pro(probabilities      = 'site01_probabilities.csv',
  low_conf_threshold = 0.7,
  output             = 'site01_uncertainty.json')
# Batch orchestration for multi-scene runs
wbw_obia_batch_orchestrator_pro(jobs = list(
    list(
      inputs = list('s1_red.tif', 's1_green.tif', 's1_nir.tif'),
      training = 's1_training.csv',
      output_prefix = 'prod/s1',
      segment_method = 'graph'
    ),
    list(
      inputs = list('s2_red.tif', 's2_green.tif', 's2_nir.tif'),
      training = 's2_training.csv',
      output_prefix = 'prod/s2',
      segment_method = 'slic'
    )
  ),
  output = 'prod/obia_batch_report.json')

wbw_obia_audit_report_pro(artifacts = list(
    'prod/s1_object_predictions.csv',
    'prod/s2_object_predictions.csv',
    'prod/obia_batch_report.json'
  ),
  output = 'prod/obia_audit.json')

Use segments_to_polygons and polygons_to_segments when analysts need to edit object boundaries in vector format and then return those objects back into raster-segment workflows.


Unsupervised Classification

KMeans

wbw_modified_k_means_clustering(inputs        = all_bands,
  output        = 'kmeans.tif',
  out_html      = 'kmeans_report.html',
  start_clusters = 5,
  end_clusters   = 10,
  max_iterations = 25,
  class_change   = 2.0)

DBSCAN

wbw_dbscan(inputs   = all_bands,
  output   = 'dbscan.tif',
  search_dist = 2.5,
  min_points   = 10)

Supervised Classification

Random Forest

# 1. Fit the model
wbw_random_forest_classification_fit(inputs  = all_bands,
  training = 'training_polygons.shp',
  field    = 'CLASS_ID',
  output   = 'rf_model.bin',
  n_trees  = 100,
  min_leaf_size = 1,
  max_features  = 0.0)

# 2. Apply the model
wbw_random_forest_classification_predict(inputs  = all_bands,
  model   = 'rf_model.bin',
  output  = 'rf_classification.tif')

Support Vector Machine

wbw_svm_classification(inputs  = all_bands,
  training = 'training_polygons.shp',
  field    = 'CLASS_ID',
  output   = 'svm_classification.tif',
  c        = 200.0,
  gamma    = 50.0,
  cost     = 10.0,
  tolerance = 0.1,
  test_proportion = 0.2)

K-Nearest Neighbours

wbw_knn_classification(inputs  = all_bands,
  training = 'training_polygons.shp',
  field    = 'CLASS_ID',
  output   = 'knn_classification.tif',
  k        = 5,
  scaling  = TRUE,
  test_proportion = 0.2)

Accuracy Assessment

wbw_kappa_index(i1     = 'rf_classification.tif',
  i2     = 'reference_classification.tif',
  output = 'accuracy_report.html')

Change Detection

Image Differencing

wbw_change_vector_analysis(date1     = paste(c('t1_b2.tif','t1_b3.tif','t1_b4.tif','t1_b5.tif'), collapse=';'),
  date2     = paste(c('t2_b2.tif','t2_b3.tif','t2_b4.tif','t2_b5.tif'), collapse=';'),
  magnitude = 'cva_magnitude.tif',
  direction = 'cva_direction.tif')

Additional Change, Thermal, and Spectral Sprint Tools

The newest remote sensing tools are available through session tool execution, which is useful while dedicated R helper wrappers are still catching up:

# Image difference change detection
img_diff <- s$run_tool('image_difference_change_detection', list(
  t1_inputs = c('t1_b2.tif', 't1_b3.tif', 't1_b4.tif'),
  t2_inputs = c('t2_b2.tif', 't2_b3.tif', 't2_b4.tif'),
  mode = 'magnitude',
  threshold_sigma = 2.0,
  output = 'img_diff_mag.tif',
  output_signed = 'img_diff_signed.tif',
  output_mask = 'img_diff_mask.tif'
))

# Post-classification change with class remapping
post_class <- s$run_tool('post_classification_change', list(
  t1_classified = 'lulc_t1.tif',
  t2_classified = 'lulc_t2.tif',
  transition_scale = 1000,
  t1_class_remap = list('11' = 1, '12' = 1),
  t2_class_remap = list('41' = 4, '42' = 4),
  output = 'post_class_transition.tif'
))

# PCA-based change detection
pca_change <- s$run_tool('pca_based_change_detection', list(
  t1_inputs = c('t1_b2.tif', 't1_b3.tif', 't1_b4.tif'),
  t2_inputs = c('t2_b2.tif', 't2_b3.tif', 't2_b4.tif'),
  component = 1,
  threshold_sigma = 2.0,
  output = 'pca_change_pc1.tif',
  output_mask = 'pca_change_mask.tif',
  output_report = 'pca_change_report.json'
))

# Dark object subtraction and DN->TOA reflectance
dos <- s$run_tool('dark_object_subtraction', list(
  inputs = c('B02.tif', 'B03.tif', 'B04.tif', 'B08.tif'),
  percentile = 1.0,
  output = 'dos_stack.tif',
  output_diagnostic_offsets = 'dos_offsets.tif'
))

toa <- s$run_tool('dn_to_toa_reflectance', list(
  inputs = c('B02.tif', 'B03.tif', 'B04.tif', 'B08.tif'),
  sensor_bundle_root = '/data/LC09_L1TP_017030_20240420_20240426_02_T1',
  output = 'toa_stack.tif'
))

# Thermal emissivity and LST
emiss <- s$run_tool('ndvi_based_emissivity', list(
  red_input = 'B04.tif',
  nir_input = 'B08.tif',
  output = 'emissivity.tif'
))

lst_sc <- s$run_tool('land_surface_temperature_single_channel', list(
  thermal_input = 'B10.tif',
  sensor_bundle_root = '/data/LC09_L1TP_017030_20240420_20240426_02_T1',
  output = 'lst_single_channel.tif'
))

lst_sw <- s$run_tool('land_surface_temperature_split_window', list(
  thermal1_input = 'B10.tif',
  thermal2_input = 'B11.tif',
  emissivity_mean_constant = 0.98,
  emissivity_delta_constant = 0.0,
  output = 'lst_split_window.tif'
))

# Spectral analytics
sam <- s$run_tool('spectral_angle_mapper', list(
  inputs = c('B02.tif', 'B03.tif', 'B04.tif', 'B08.tif'),
  endmembers = list(
    list(name = 'water', values = c(0.03, 0.02, 0.01, 0.00)),
    list(name = 'veg', values = c(0.05, 0.10, 0.06, 0.40))
  ),
  output = 'sam_classes.tif',
  output_angle = 'sam_angle.tif'
))

cont <- s$run_tool('continuum_removal', list(
  inputs = c('hyp_b1.tif', 'hyp_b2.tif', 'hyp_b3.tif'),
  wavelengths = c(450.0, 550.0, 650.0),
  output = 'continuum_removed.tif'
))

unmix <- s$run_tool('linear_spectral_unmixing', list(
  inputs = c('B02.tif', 'B03.tif', 'B04.tif', 'B08.tif'),
  endmembers = list(
    list(name = 'soil', values = c(0.18, 0.20, 0.22, 0.24)),
    list(name = 'veg', values = c(0.05, 0.09, 0.06, 0.40))
  ),
  output = 'unmix_frac.tif',
  output_residual = 'unmix_residual.tif'
))

mnf <- s$run_tool('minimum_noise_fraction', list(
  inputs = c('B02.tif', 'B03.tif', 'B04.tif', 'B08.tif'),
  num_components = 3,
  output = 'mnf_components.tif',
  output_inverse = 'mnf_inverse.tif'
))

lib_match <- s$run_tool('spectral_library_matching', list(
  inputs = c('B02.tif', 'B03.tif', 'B04.tif', 'B08.tif'),
  metric = 'sam',
  library = list(
    list(name = 'water', values = c(0.03, 0.02, 0.01, 0.00)),
    list(name = 'soil', values = c(0.18, 0.20, 0.22, 0.24))
  ),
  output = 'lib_class.tif',
  output_score = 'lib_score.tif'
))

# PolSAR decomposition tools
cp <- s$run_tool('cloude_pottier_decomposition', list(
  inputs = c('t11.tif', 't22.tif', 't33.tif'),
  matrix_format = 'diag3',
  output = 'cloude_pottier_haa.tif'
))

fd <- s$run_tool('freeman_durden_decomposition', list(
  inputs = c('c11.tif', 'c22.tif', 'c33.tif'),
  matrix_format = 'diag3',
  output = 'freeman_durden.tif',
  output_clip_mask = 'freeman_clip.tif'
))

WbW-Pro Spotlight: In-Season Crop Stress Intervention Planning

  • Problem: Turn in-season crop stress signals into actionable intervention priorities.
  • Tool: in_season_crop_stress_intervention_planning
  • Typical inputs: NDVI raster, canopy-temperature raster, soil-moisture raster.
  • Typical outputs: Intervention-priority and intervention-class products with summary reporting.
result <- s$in_season_crop_stress_intervention_planning(
  ndvi               = 'ndvi_latest.tif',
  canopy_temperature = 'lst_latest.tif',
  soil_moisture      = 'soil_moisture_latest.tif',
  output_prefix      = 'field_07_stress'
)

print(result)

Note: This workflow requires a session initialized with a valid Pro licence.


Complete Remote Sensing Workflow

The following end-to-end example uses a sensor bundle for ingestion and random forest for land-cover mapping:

library(whiteboxworkflows)
s <- wbw_session()

# 1. Open scene as a bundle — works with S2, Landsat 9, PlanetScope, etc.
bundle <- wbw_sensor_bundle_from_path(
  '/data/S2B_MSIL2A_20240601T105619_N0510_R094_T32TPT.SAFE',
  session = s
)
meta <- bundle$metadata()
cat(sprintf('Family: %s, tile: %s, clouds: %.1f%%\n',
            meta$family, meta$tile_id, meta$cloud_cover_percent))

if (meta$cloud_cover_percent > 20) {
  stop('Too cloudy for classification')
}

# 2. Read bands by key (family-agnostic)
blue  <- bundle$read_band('B02')
green <- bundle$read_band('B03')
red   <- bundle$read_band('B04')
nir   <- bundle$read_band('B08')

wbw_resample(inputs = bundle$read_band('B11')$file_path(), output = 'swir1_10m.tif',
  cell_size = 0.0, base = red$file_path(), method = 'bilinear')
wbw_resample(inputs = bundle$read_band('B12')$file_path(), output = 'swir2_10m.tif',
  cell_size = 0.0, base = red$file_path(), method = 'bilinear')
swir1 <- wbw_read_raster('swir1_10m.tif')
swir2 <- wbw_read_raster('swir2_10m.tif')

bands <- list(blue, green, red, nir, swir1, swir2)
band_paths <- paste(sapply(bands, function(b) b$file_path()), collapse = ';')

# 3. Mask clouds using SCL
scl <- bundle$read_qa_layer('SCL')

# 4. Spectral indices
wbw_normalized_difference_index(input1 = nir$file_path(), input2 = red$file_path(),
  output = 'ndvi.tif')
wbw_normalized_difference_index(input1 = green$file_path(), input2 = swir1$file_path(),
  output = 'mndwi.tif')
wbw_normalized_difference_index(input1 = nir$file_path(), input2 = swir2$file_path(),
  output = 'nbr.tif')

ndvi  <- wbw_read_raster('ndvi.tif')
mndwi <- wbw_read_raster('mndwi.tif')
nbr   <- wbw_read_raster('nbr.tif')

# 5. PCA decorrelation
wbw_principal_component_analysis(inputs       = band_paths,
  output       = 'pca_loadings.html',
  num_comp     = 4,
  standardised = TRUE)
pc_paths <- paste(paste0('pc', 1:4, '.tif'), collapse = ';')

# 6. Feature stack = bands + indices + PCA scores
feature_paths <- paste(c(band_paths, 'ndvi.tif', 'mndwi.tif', 'nbr.tif', pc_paths),
                       collapse = ';')

# 7. Train random forest
wbw_random_forest_classification_fit(inputs   = feature_paths,
  training = 'training_polygons.shp',
  field    = 'CLASS',
  output   = 'rf_model.bin',
  n_trees  = 500,
  test_proportion = 0.25)

# 8. Predict
wbw_random_forest_classification_predict(inputs = feature_paths,
  model  = 'rf_model.bin',
  output = 'lulc.tif')

# 9. Generalise
wbw_generalize_classified_raster(i = 'lulc.tif', output = 'lulc_clean.tif', min_size = 9)

# 10. Accuracy assessment
wbw_kappa_index(i1 = 'lulc_clean.tif', i2 = 'validation_reference.tif',
  output = 'accuracy.html')

# 11. Quick-look composite for QC
bundle$write_true_colour(output_path = 'quicklook_truecolour.tif')

cat('Remote sensing workflow complete.\n')

Tips

  • Use bundles for scene-level ingestion: wbw_sensor_bundle_from_path() eliminates hardcoded band filenames and works identically across Sentinel-2, Landsat, PlanetScope, and other supported families.
  • Screen cloud cover before processing: Check bundle metadata cloud_cover_percent before loading all bands to skip unusable scenes early in a batch workflow.
  • Atmospheric correction is iterative: Raw digital numbers and even surface reflectance products may still carry illumination and atmospheric artefacts. Histogram matching across scenes is a simple relative normalisation that works well when no reference spectrum is available.
  • Align band resolution before stacking: Sentinel-2 delivers 10 m (Blue, Green, Red, NIR) and 20 m (Red Edge, SWIR) bands. Upscale the 20 m bands to 10 m using bilinear resampling before combining them in a feature stack.
  • Always mask clouds and cloud shadows: Use quality bands (Sentinel-2 SCL or Landsat QA_PIXEL) and apply bitwise tests via raster calculator to isolate valid pixels.
  • Balance training data: Training datasets often over-represent dominant classes (e.g., forest) relative to rare classes (e.g., wetland). Sample training polygons proportional to expected class area, or oversample rare classes to improve model accuracy.
  • Check spectral separability: If accuracy assessment reports Jeffries-Matusita distance < 1.5 between any two classes, consider merging classes or collecting more spectrally diverse training polygons.
  • Use time-series techniques for change detection: Multi-date classifications can show spurious change due to atmospheric variation. Spectral indices are more robust; consider computing NDVI or NDWI time series and detecting change directly in index space.
  • Memory is your bottleneck: Large multispectral stacks fill RAM quickly. Use band-by-band processing or out-of-memory methods for scenes > 1 GB.

Raster Analysis

Raster analysis in WbW-R covers the end-to-end workflow of reading, transforming, and writing gridded data — from simple cell-value arithmetic through multi-layer overlays, proximity operations, interpolation, and statistical testing. All heavy computation runs in the Whitebox backend.


Core Concepts

Raster analysis requires understanding these fundamental terms:

  • Cell (pixel): The smallest unit of a raster. Each cell stores a single value (integer or floating-point) and has a uniform spatial extent (e.g. 10 m × 10 m).
  • Data type: Integer (Int8, Int16, Int32) for categorical data; Float32 or Float64 for continuous measurements. Data type affects precision, file size, and computation speed.
  • NoData value: Sentinel value representing missing or invalid data (e.g. -9999 or NaN). Critical for masking water, clouds, or invalid measurements in focal operations.
  • Spatial reference (CRS): Coordinate system and projection. Mismatched CRS between rasters causes silent misalignment; always verify before overlay operations.
  • Extent: The bounding box (xmin, ymin, xmax, ymax) of the raster in real-world coordinates.
  • Cell size (resolution): Cell width and height in map units. Coarser resolution is faster but loses detail; finer resolution requires more memory and computation.
  • Focal operation: Uses neighbourhood values (e.g. 3×3 kernel) to compute output. Examples: moving average, Sobel edge detection, local extrema.
  • Zonal operation: Aggregates grid values by zone (polygon or categorical layer). Examples: mean by land-cover class, sum by administrative boundary.
  • Reclassification: Reassigning cell values according to lookup rules. Common for categorizing continuous data (e.g. slope classes) or remapping land-cover codes.
  • Resampling: Changing cell size or alignment. Methods: nearest-neighbour (preserves categories), bilinear (smooth), cubic (smoother).

Session Setup

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/raster')

Reading and Inspecting Rasters

r <- wbw_read_raster('input.tif')
meta <- r$metadata()

cat('Rows:', meta$rows, '  Cols:', meta$columns, '\n')
cat('Cell size:', meta$resolution_x, 'x', meta$resolution_y, '\n')
cat('EPSG:', meta$epsg, '\n')
cat('NoData:', meta$nodata, '\n')
cat('Data type:', meta$data_type, '\n')

Raster Calculator

raster_calculator evaluates an algebraic expression that combines one or more raster inputs:

# Single-raster expression — multiply by constant
wbw_raster_calculator(statement = "'elevation.tif' * 3.28084",
  output    = 'elevation_ft.tif')

# Multi-raster NDVI expression
wbw_raster_calculator(statement = "('nir.tif' - 'red.tif') / ('nir.tif' + 'red.tif')",
  output    = 'ndvi.tif')

# Conditional expression using isnull() and nodata()
wbw_raster_calculator(statement = "if(isnull('input.tif'), nodata(), 'input.tif' + 100.0)",
  output    = 'result.tif')

Special tokens available in statements: nodata(), isnull(), if(), abs(), sqrt(), log(), log2(), exp(), min(), max(), pi, integer constants, and floating-point constants.


Reclassification

# Reclassify using from-to-becomes triplets
# Format: "from;to;new;from;to;new;..."
wbw_reclass(i         = 'slope.tif',
  output    = 'slope_class.tif',
  reclass_vals = '0;5;1;5;15;2;15;30;3;30;45;4;45;90;5',
  assign_mode = FALSE)

# Equal-interval reclassification
wbw_reclass_equal_interval(i         = 'ndvi.tif',
  output    = 'ndvi_class.tif',
  interval  = 0.1,
  start_val = -1.0,
  end_val   = 1.0)

# Reclassify from a CSV lookup table
wbw_reclass_from_file(i          = 'landcover.tif',
  reclass_file = 'reclass_table.txt',
  output     = 'landcover_reclassed.tif')

Focal Statistics (Moving Windows)

# Gaussian filter
wbw_gaussian_filter(i = 'dem.tif', output = 'dem_gauss.tif', sigma = 1.5)

# Median filter (feature-preserving)
wbw_median_filter(i = 'dem.tif', output = 'dem_median.tif', filterx = 5, filtery = 5,
  sig_digits = 2)

# Feature-preserving smoothing (Zhang et al.)
wbw_feature_preserving_smoothing(dem = 'dem.tif', output = 'dem_fps.tif', filter = 11,
  norm_diff = 8.0, num_iter = 3, max_diff = 0.5, zfactor = 1.0)

# Standard deviation in a window
wbw_standard_dev_filter(i = 'dem.tif', output = 'dem_sd.tif', filterx = 11, filtery = 11)

# Percentile filter
wbw_percentile_filter(i = 'dem.tif', output = 'dem_pct75.tif', filterx = 11, filtery = 11,
  sig_digits = 2)

Morphological Operations

# Morphological closing (fills gaps in foreground)
wbw_closing(i = 'binary.tif', output = 'binary_close.tif', filterx = 3, filtery = 3)

# Morphological opening (removes small foreground blobs)
wbw_opening(i = 'binary.tif', output = 'binary_open.tif', filterx = 3, filtery = 3)

# Top-hat transform (white)
wbw_tophat_transform(i = 'raster.tif', output = 'tophat.tif', filterx = 11, filtery = 11,
  variant = 'white')

Global and Zonal Statistics

# Global statistics (summary of entire raster)
wbw_raster_histogram(i = 'dem.tif', output = 'dem_histogram.html')

# Zonal statistics — mean elevation per watershed zone
wbw_zonal_statistics(i         = 'dem.tif',
  features  = 'watersheds.tif',
  output    = 'watershed_stats.html',
  stat      = 'mean',
  out_raster = 'watershed_mean_elev.tif')

Raster Overlay Operations

# Weighted sum (multi-criteria suitability)
wbw_weighted_sum(inputs  = 'soil.tif;slope.tif;distance.tif',
  weights = '0.3;0.5;0.2',
  output  = 'suitability.tif')

# Weighted overlay (MCE) with constraint
wbw_weighted_overlay(inputs      = 'factor1.tif;factor2.tif;factor3.tif',
  weights     = '0.4;0.4;0.2',
  output      = 'suitability_mce.tif',
  scale_max   = 5.0,
  scale_min   = 0.0,
  scale_factor  = 1.0)

Resampling and Aggregation

wbw_resample(inputs      = 'dem.tif',
  output      = 'dem_10m.tif',
  cell_size   = 10.0,
  method      = 'bilinear')

wbw_aggregate_raster(i = 'dem.tif', output = 'dem_agg.tif', agg_factor = 5,
  type = 'mean')

Proximity Analysis

# Euclidean distance
wbw_euclidean_distance(i = 'sources.tif', output = 'euclidean_dist.tif')

# Cost-distance accumulation
wbw_cost_distance(source = 'sources.tif',
  cost   = 'friction.tif',
  out_accum = 'cost_accum.tif',
  out_backlink = 'cost_backlink.tif')

# Least-cost path
wbw_cost_pathway(destination = 'destinations.tif',
  backlink    = 'cost_backlink.tif',
  output      = 'least_cost_path.tif',
  zero_background = FALSE)

# Raster buffer
wbw_buffer_raster(i = 'features.tif', output = 'buffered.tif',
  size = 250.0, gridcells = FALSE)

Raster Object Analysis

# Label connected patches (foreground = non-zero)
wbw_clump(i = 'binary.tif', output = 'patches.tif',
  diag = TRUE, zero_back = TRUE)

# Remove small patches below area threshold (10 000 m²)
wbw_remove_spurs(i = 'patches.tif', output = 'patches_clean.tif',
  iterations = 10)

# Raster area of each patch value
wbw_raster_area(i = 'patches.tif', output = 'patch_area.tif',
  out_text = FALSE, units = 'map units', zero_back = TRUE)

Interpolation from Points

pts <- wbw_read_vector('sample_points.shp')

# IDW
wbw_idw_interpolation(i         = pts$file_path(),
  field     = 'ELEV',
  output    = 'idw_surf.tif',
  use_z     = FALSE,
  weight    = 2.0,
  radius    = 2.5,
  min_points = 2,
  cell_size  = 5.0)

# Natural Neighbour
wbw_natural_neighbour_interpolation(i        = pts$file_path(),
  field    = 'ELEV',
  output   = 'nn_surf.tif',
  use_z    = FALSE,
  cell_size = 5.0)

# Radial Basis Function
wbw_radial_basis_function_interpolation(i         = pts$file_path(),
  field     = 'ELEV',
  output    = 'rbf_surf.tif',
  num_points = 8,
  cell_size  = 5.0,
  func_type  = 'ThinPlateSpline',
  poly_order = 1,
  weight     = 0.1)

Statistical Tests

# Kolmogorov-Smirnov normality test
wbw_ks_test_for_normality(i = 'dem.tif', output = 'ks_normality.html')

# Two-raster paired samples t-test
wbw_paired_sample_t_test(input1 = 'dem_2010.tif', input2 = 'dem_2020.tif',
  output = 'ttest.html', num_samples = 1000)

Contour Generation

wbw_contours_from_raster(i          = 'dem.tif',
  output     = 'contours.shp',
  interval   = 10.0,
  base       = 0.0,
  smooth     = 5,
  tolerance  = 10.0)

WbW-Pro Spotlight: Field Trafficability and Operation Planning

  • Problem: Plan equipment timing and field access under variable moisture and weather conditions.
  • Tool: field_trafficability_and_operation_planning
  • Typical inputs: DEM, normalized soil-moisture raster, optional rainfall-risk raster.
  • Typical outputs: Trafficability score raster, operation-class raster, and summary outputs.
result <- s$field_trafficability_and_operation_planning(
  dem               = 'field_dem.tif',
  soil_moisture     = 'soil_moisture_norm.tif',
  rainfall_forecast = 'rainfall_risk_norm.tif',
  output_prefix     = 'field_12_trafficability'
)

print(result)

Note: This workflow requires a session initialized with a valid Pro licence.


Complete Raster Analysis Workflow

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/raster_workflow')

dem <- wbw_read_raster('dem.tif')

# 1. Smooth DEM
wbw_feature_preserving_smoothing(dem = dem$file_path(), output = 'dem_smooth.tif', filter = 11,
  norm_diff = 8.0, num_iter = 3, max_diff = 0.5)

# 2. Slope
wbw_slope(dem = 'dem_smooth.tif', output = 'slope.tif', units = 'degrees')

# 3. Reclassify slope into erosion risk classes
wbw_reclass(i = 'slope.tif', output = 'slope_risk.tif',
  reclass_vals = '0;5;1;5;15;2;15;30;3;30;90;4')

# 4. Euclidean distance to water
wbw_euclidean_distance(i = 'water_bodies.tif', output = 'dist_water.tif')

# 5. Multi-criteria suitability overlay
wbw_weighted_sum(inputs  = paste('slope_risk.tif', 'dist_water.tif', 'soil_type.tif', sep=';'),
  weights = '0.5;0.3;0.2',
  output  = 'suitability.tif')

# 6. Reclassify to binary mask (suitability > threshold)
wbw_raster_calculator(statement = "if('suitability.tif' >= 3.0, 1.0, 0.0)",
  output    = 'suitable_areas.tif')

# 7. Generate contours
wbw_contours_from_raster(i = dem$file_path(), output = 'contours_10m.shp',
  interval = 10.0, base = 0.0, smooth = 3)

cat('Raster analysis complete.\n')

Tips

  • Choose your data type: Use integers for categorical data (land cover, classifications) to minimize file size and computation time. Use floating-point (Float32 or Float64) only for continuous measurements (elevation, temperature, probability).
  • Set NoData explicitly: Ensure your source rasters carry a valid NoData value. Missing NoData declarations can corrupt statistics and focal operations by including invalid pixels as zeros or false elevations.
  • Compress carefully: LZW and DEFLATE compression work well for most data; avoid if you need rapid random access to interior tiles. Use COMPRESS=JPEG for photographic data only (lossy, unsuitable for analysis).
  • Focal operations require buffering: Cells at raster edges cannot compute full neighbourhoods. Use expand() or accept edge effects; never assume borders are valid in derivative rasters.
  • Zonal statistics are only as good as your zones: Ensure zone boundaries are topologically clean (no overlaps, no gaps). Overlapping zones cause double-counting; gaps cause NoData regions in output.
  • Reclassification is fast but risky: Always validate output distributions (histogram) after reclassification. Off-by-one errors in class boundaries can silently produce wrong land-cover or suitability classes.
  • Memory is the constraint for large rasters: Tiles > 2 GB require out-of-core or streaming processing. Use read_by_block() for large files; avoid loading entire rasters into memory if they exceed available RAM.
  • Upsampling introduces artifacts: Never upsample (finer resolution) without a valid interpolation method. Nearest-neighbour upsampling creates blocky artefacts; bilinear is smoother but may violate data range (e.g. probability values > 1.0).

Vector Analysis

Vector GIS analysis in WbW-R covers attribute management, geometric measurement, shape analysis, spatial overlay, proximity tools, spatial joins, and vector-to-raster conversion. All computation runs in the Whitebox backend through wbw_<tool>(...) wrappers (or wbw_run_tool(...) when tool IDs are dynamic); R handles session management, sequencing, and result processing.


Core Concepts

Vector analysis depends on understanding these core concepts:

  • Feature geometry: Points (single coordinate pairs), lines (ordered sequences of coordinate pairs), and polygons (rings of coordinates forming closed boundaries). Each feature type supports different analyses.
  • Topology: The spatial relationships between features (adjacency, containment, intersection). Topological errors (overshoots, undershoots, self-intersections) corrupt overlay operations and topology queries.
  • Attribute table: The database associated with each feature layer, carrying descriptive fields and values. Attribute queries filter features; joins link external tables.
  • Spatial index: Internal index structure (R-tree or quadtree) enabling fast spatial queries. Used by intersection, containment, and proximity operations. Always build spatial indices on frequently queried layers.
  • Envelope (bounding box): The minimum rectangular boundary of a feature or layer; used for quick spatial culling before geometry tests.
  • Buffer: A polygon created at a fixed distance around a feature. Buffers model proximity zones and are fundamental to distance-based analysis.
  • Overlay (intersection, union, difference): Combining two polygon layers to create a new layer. Union merges boundaries; intersection keeps overlapping area only; difference removes one layer from another.
  • Dissolve (aggregation): Merging adjacent features with identical attribute values, creating larger aggregate features. Reduces feature count and simplifies geometry.
  • Spatial join: Associating features from one layer with features from another based on spatial relationship (overlap, containment, proximity). Assigns attributes across layers.
  • Proximity analysis: Finding nearest features, distances, or connectivity. Foundation for network analysis, market analysis, and accessibility studies.

Session Setup

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/vector')

Reading and Writing Vectors

polys  <- wbw_read_vector('parcels.shp')
lines  <- wbw_read_vector('roads.shp')
points <- wbw_read_vector('samples.shp')

meta <- polys$metadata()
cat('Feature count:', meta$num_features, '\n')
cat('Geometry type:', meta$geom_type, '\n')
cat('CRS:', meta$wkt, '\n')

Attribute Management

# Add a new field
wbw_add_field(i         = polys$file_path(),
  output    = 'parcels_v2.shp',
  field_name = 'AREA_HA',
  field_type = 'Float')

# Delete an unwanted field
wbw_delete_field(i          = 'parcels_v2.shp',
  output     = 'parcels_v3.shp',
  field_name = 'OLD_FIELD')

# Rename a field
wbw_rename_field(i             = 'parcels_v3.shp',
  output        = 'parcels_v4.shp',
  input_field   = 'AREA_HA',
  output_field  = 'HECTARES')

# Extract features by attribute value
wbw_extract_by_attribute(i            = polys$file_path(),
  output       = 'large_parcels.shp',
  field        = 'AREA_M2',
  filter_value = 10000.0,
  filter_type  = 'Greater Than')

Geometric Measurements

# Add area, perimeter, and basic shape metrics to polygon attribute table
wbw_add_polygon_coordinates_to_table(i      = polys$file_path(),
  output = 'parcels_geom.shp')

# Polygon shape index — compactness
wbw_compactness_ratio(i = polys$file_path(), output = 'parcels_compact.shp')

wbw_elongation_ratio(i = polys$file_path(), output = 'parcels_elong.shp')

wbw_related_circumscribing_circle(i = polys$file_path(), output = 'parcels_rcc.shp')

wbw_patch_orientation(i = polys$file_path(), output = 'parcels_orient.shp')

wbw_radius_of_gyration(i = polys$file_path(), output = 'parcels_rog.shp')

Geometric Operations

# Centroids
wbw_centroid_vector(i = polys$file_path(), output = 'centroids.shp')

# Convex hull
wbw_convex_hull(i = polys$file_path(), output = 'convex_hulls.shp')

# Minimum bounding envelopes
wbw_minimum_bounding_envelope(i = polys$file_path(), output = 'mbe.shp')

# Smooth vector polygons
wbw_smooth_vectors(i = polys$file_path(), output = 'parcels_smooth.shp', filter = 5)

# Simplify features (Douglas-Peucker)
wbw_simplify_line_or_polygon(i = polys$file_path(), output = 'parcels_simplified.shp',
  dist = 5.0, remove_spurs = TRUE, errors_only = FALSE)

# Dissolve polygons on field value
wbw_dissolve(i = polys$file_path(), output = 'parcels_dissolved.shp',
  field = 'LAND_USE', snap_tol = 0.001)

Spatial Overlay

# Clip
wbw_clip(i        = polys$file_path(),
  clip     = 'study_area.shp',
  output   = 'clipped.shp',
  snap_tol = 0.001)

# Intersect
wbw_intersect(i        = polys$file_path(),
  overlay  = 'zones.shp',
  output   = 'intersection.shp',
  snap_tol = 0.001)

# Erase
wbw_erase(i        = polys$file_path(),
  erase    = 'exclusion_areas.shp',
  output   = 'erased.shp',
  snap_tol = 0.001)

# Union
wbw_union(i        = polys$file_path(),
  overlay  = 'other_layer.shp',
  output   = 'union.shp',
  snap_tol = 0.001)

# Symmetrical difference
wbw_symmetrical_difference(i        = polys$file_path(),
  overlay  = 'other_layer.shp',
  output   = 'symdiff.shp',
  snap_tol = 0.001)

Proximity Analysis

# Euclidean distance from vector features
wbw_vector_points_to_raster(i = points$file_path(), output = 'points.tif', field = 'FID',
  assign = 'last', nodata = TRUE, cell_size = 5.0, base = 'dem.tif')
wbw_euclidean_distance(i = 'points.tif', output = 'euclidean_dist.tif')

# Voronoi diagram
wbw_voronoi_diagram(i = points$file_path(), output = 'voronoi.shp')

Select by Location

wbw_select_by_location(input   = polys$file_path(),
  select  = 'stream_buffer.shp',
  output  = 'parcels_near_streams.shp',
  condition = 'within')

Spatial Join

wbw_spatial_join(target  = points$file_path(),
  join    = polys$file_path(),
  output  = 'points_joined.shp',
  condition = 'within',
  attr    = 'first')

Aggregation Strategies

# Join and aggregate field values from nearest polygon
for (stat in c('count', 'sum', 'mean', 'min', 'max')) {
  wbw_spatial_join(target    = 'zones.shp',
    join      = 'observations.shp',
    output    = paste0('zones_', stat, '.shp'),
    condition = 'contains',
    attr      = stat)
}

Vector-to-Raster Conversion

# Rasterize polygon layer
wbw_vector_polygons_to_raster(i = polys$file_path(), output = 'parcels_raster.tif',
  field = 'LAND_USE_ID', nodata = TRUE, cell_size = 5.0, base = 'dem.tif')

# Rasterize line layer
wbw_vector_lines_to_raster(i = lines$file_path(), output = 'roads_raster.tif',
  field = 'FID', nodata = TRUE, cell_size = 5.0, base = 'dem.tif')

# Rasterize points
wbw_vector_points_to_raster(i = points$file_path(), output = 'points_raster.tif',
  field = 'VALUE', assign = 'max', nodata = TRUE, cell_size = 5.0)

Field Calculator

# Compute area in hectares and write to existing field
wbw_field_calculator(i         = polys$file_path(),
  output    = 'parcels_calc.shp',
  field_name = 'AREA_HA',
  py_statement = '@Area / 10000.0',
  analyse   = FALSE)

Point Cluster Analysis

# Kernel density estimation (heat map)
wbw_kernel_density_estimation(i         = points$file_path(),
  output    = 'heatmap.tif',
  bandwidth = 200.0,
  kernel_type = 'quartic',
  cell_size = 5.0,
  base      = 'dem.tif')

# Hexagonal binning
wbw_create_hexagonal_vector_grid(i = 'study_area.shp', output = 'hex_grid.shp', width = 500.0, orientation = 'horizontal')
wbw_spatial_join(target = 'hex_grid.shp', join = points$file_path(),
  output = 'hex_counts.shp', condition = 'contains', attr = 'count')

WbW-Pro Spotlight: Market Access and Site Intelligence

  • Problem: Rank candidate sites using repeatable network-access and demand logic.
  • Tool: market_access_and_site_intelligence_workflow
  • Typical inputs: Network, existing sites, candidate sites, demand surface, drive-time rings.
  • Typical outputs: Catchment polygons, competitive-overlap layer, candidate-ranking CSV, executive summary JSON.
result <- s$run_tool(
  'market_access_and_site_intelligence_workflow',
  list(
    network                 = 'street_network.shp',
    sites_existing          = 'existing_sites.shp',
    sites_candidates        = 'candidate_sites.shp',
    demand_surface          = 'demand_points.shp',
    ring_costs              = c(5.0, 10.0, 15.0),
    catchments_output       = 'candidate_catchments.shp',
    overlap_analysis_output = 'competitive_overlap.shp',
    candidate_rank_csv      = 'candidate_rankings.csv',
    executive_summary_json  = 'market_summary.json'
  )
)

print(result)

Note: This workflow requires a session initialized with a valid Pro licence.


Complete Vector Analysis Workflow

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/vector_workflow')

parcels <- wbw_read_vector('parcels.shp')
study   <- wbw_read_vector('study_area.shp')
streams <- wbw_read_vector('streams.shp')

# 1. Clip to study area
wbw_clip(i = parcels$file_path(), clip = study$file_path(),
  output = 'parcels_clipped.shp', snap_tol = 0.001)

# 2. Add shape metrics
wbw_compactness_ratio(i = 'parcels_clipped.shp', output = 'parcels_shape.shp')

# 3. Buffer streams and intersect with parcels
wbw_buffer_raster(i = streams$file_path(), output = 'stream_buf.shp',
  size = 30.0, gridcells = FALSE)
wbw_intersect(i = 'parcels_shape.shp', overlay = 'stream_buf.shp',
  output = 'riparian_parcels.shp', snap_tol = 0.001)

# 4. Dissolve by land-use class
wbw_dissolve(i = 'riparian_parcels.shp', output = 'riparian_dissolved.shp',
  field = 'LAND_USE', snap_tol = 0.001)

# 5. Rasterize result
wbw_vector_polygons_to_raster(i = 'riparian_dissolved.shp', output = 'riparian.tif',
  field = 'LAND_USE_ID', nodata = TRUE, cell_size = 5.0)

cat('Vector analysis complete.\n')

Tips

  • Always validate topology before analysis: Run check_vector_topology() to detect overshoots, undershoots, self-intersections, and sliver polygons. Topological errors propagate through overlay and spatial join operations.
  • Build spatial indices on large layers: Large datasets (> 10,000 features) benefit from spatial indexing. Use build_spatial_index() explicitly before repeated spatial queries; operations like containment or proximity are fast with indices.
  • Choose your overlay operation carefully: Union retains all boundaries and combines attributes (can create many small slivers). Intersection keeps only overlapping regions. Difference retains Polygon A minus Polygon B. Test on small subsets first.
  • Dissolve reduces feature count and file size: After overlay, dissolve by ownership or category to collapse unnecessary edges. Dissolved layers render faster and are cleaner for publication.
  • Spatial joins are sensitive to alignment: Ensure both input layers use the same CRS and are free of topology errors. Reproject to equal-area projection before computing buffer distances or areas for analysis.
  • Buffer distance and units matter: Buffer distances are in map units (meters, feet, degrees). Use an equal-area projection if precise areas or distances are critical. Negative buffers can collapse small polygons (inset); test with small buffer values first.
  • Attribute table size is a memory constraint: Attribute tables with millions of rows and dozens of fields consume RAM. Export to CSV or database for large tables; work with summaries or samples when memory is limited.
  • Point-in-polygon operations scale with complexity: Containment tests are O(n) per point; on large datasets (> 1 million points), consider spatial index binning or vector-to-raster conversion for speed.

Online Data Downloads

This chapter covers downloading vector data directly from online providers into WbW-R workflows. The first provider is OpenStreetMap (OSM) through the Overpass API using tool ID download_osm_vector.

Scope and Current Provider

Current online-data tool:

  • download_osm_vector

In R, use wbw_run_tool(...) with explicit argument lists.

Quick Start

library(whiteboxworkflows)

session <- wbw_session()

result <- wbw_run_tool(
  "download_osm_vector",
  args = list(
    west = -80.54,
    south = 43.41,
    east = -80.47,
    north = 43.47,
    filter_preset = "roads",
    include_points = FALSE,
    include_lines = TRUE,
    include_polygons = FALSE,
    output = "kitchener_roads.geojson"
  ),
  session = session
)

print(result$outputs$path)

Presets and Filters

Preset classes:

  • all
  • roads
  • buildings
  • water
  • landuse
  • trails
  • parks
  • rail
  • amenities
  • boundaries
  • transit
  • poi

Optional custom filters:

  • filter_key = "amenity"
  • filter_key_value = "amenity=school"

When custom filters are provided, they are used instead of preset filtering.

Geometry Controls

Use geometry toggles to reduce payload and output complexity:

  • include_points
  • include_lines
  • include_polygons

Examples:

  • Streets only: points off, lines on, polygons off
  • Building footprints: points off, lines off, polygons on

Phase 2 options:

  • split_output_by_geometry = TRUE writes separate files with _points, _lines, and _polygons suffixes.

Caching and Provenance

Use optional caching when running repeated AOI/filter requests:

  • cache_dir = ".wbw_cache/osm"
  • cache_ttl_hours = 24 (set 0 to disable TTL checks)

Use provenance_output to write a JSON sidecar with endpoint, bbox, filters, feature counts, and cache usage metadata.

wbw_run_tool(
  "download_osm_vector",
  args = list(
    west = -80.54,
    south = 43.41,
    east = -80.47,
    north = 43.47,
    filter_preset = "trails",
    include_points = FALSE,
    include_lines = TRUE,
    include_polygons = FALSE,
    split_output_by_geometry = TRUE,
    cache_dir = ".wbw_cache/osm",
    cache_ttl_hours = 24,
    provenance_output = "kitchener_trails_provenance.json",
    output = "kitchener_trails.geojson"
  ),
  session = session
)

Projection and Output

Rules:

  • Query extent is in EPSG:4326 (longitude/latitude).
  • Set input_extent_epsg to provide west/south/east/north in another CRS (the bbox is transformed to EPSG:4326 before querying Overpass).
  • Use output_epsg to reproject output.
  • Output format is inferred from output extension.

Endpoint selection:

  • overpass_profile supports: main, kumi, fr, custom.
  • overpass_url overrides the selected profile URL when provided.

Large-AOI chunking:

  • chunk_large_aoi = TRUE (default) automatically tiles large query extents.
  • chunk_max_area_deg2 = 4.0 controls maximum area per chunk.
  • max_chunk_count = 64 caps the number of generated chunk requests.
  • chunk_parallel_requests = 1 (default) controls bounded parallel chunk fetch; set >1 to fetch chunks concurrently.
wbw_run_tool(
  "download_osm_vector",
  args = list(
    west = -80.54,
    south = 43.41,
    east = -80.47,
    north = 43.47,
    filter_preset = "buildings",
    include_points = FALSE,
    include_lines = FALSE,
    include_polygons = TRUE,
    output_epsg = 32617,
    output = "kitchener_buildings_utm17n.gpkg"
  ),
  session = session
)

Operational Guidance

Overpass public endpoints have rate limits and shared infrastructure constraints.

Recommended practice:

  • Use smaller AOIs
  • Prefer thematic filters
  • Cap responses with max_elements
  • Increase timeout_seconds only when needed

Attribution and Licensing

OSM data are licensed under ODbL. Ensure appropriate OpenStreetMap attribution and verify obligations for redistributed outputs.

Companion Example

See:

  • crates/wbw_r/examples/osm_download_vector.R

Network Analysis

Whitebox Next Gen R brings the same deep network-analysis engine as the Python API: topology auditing, point-to-point routing, service areas, closest facility, OD cost matrices, location-allocation, accessibility metrics, sensitivity analysis, multimodal transit modelling, map matching, and fleet dispatch optimization. This chapter walks through those capabilities in order, mirroring the prepare-then-query-then-scale-up workflow used in practice.


Session Setup

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/network')

Core Concepts You Should Know First

Before running tools, it helps to align on a few core terms used throughout this chapter.

  • Network: A graph made of edges (road or transit segments) and nodes (intersections, stops, junctions).
  • Cost / impedance: The value minimized during routing. This can be distance, travel time, generalized cost, or another friction metric.
  • Origin / destination (OD): Origins are trip start points; destinations are trip end points.
  • OD matrix: A table of costs from many origins to many destinations. This is the standard structure for accessibility, market access, and assignment analyses.
  • Shortest path: The minimum-cost path between one origin and one destination.
  • K-shortest paths: The best k distinct alternatives between the same OD pair, useful for resilience and choice modelling.
  • Service area (isochrone): The portion of the network reachable from an origin within a cost threshold (for example 10 minutes).
  • Closest facility: For each incident or demand point, the least-cost route to the nearest candidate facility on the network.
  • Location-allocation: Selecting facility sites that optimize a demand objective, such as minimizing total travel cost or maximizing coverage.
  • Connectivity: Whether all required origins and destinations are in the same connected component. Disconnected components cause failed routes.
  • Node degree: The number of edges touching a node. Degree supports basic network centrality interpretation and QA for odd junction structure.
  • Multimodal routing: Pathfinding across multiple transport modes (walk/bus/rail) with transfer penalties and mode constraints.
  • Map matching: Snapping GPS trajectories to the most plausible sequence of network edges.

If you keep these definitions in mind, each workflow step below becomes easier to interpret and validate.


Step 1 — Prepare and Audit the Network

Every routing workflow should begin with a topology check.

Topology Audit

network_topology_audit scans a line network for dead ends, pseudo-nodes, overshoots, and isolated islands, writing each flagged location as a point feature and optionally producing a text report.

wbw_network_topology_audit(i             = 'roads.shp',
  output        = 'topology_errors.shp',
  snap_tolerance = 0.5,
  one_way_field  = 'ONEWAY',
  report         = 'topology_report.txt')

Review topology_report.txt to understand the error count and class distribution before continuing.

Connected Components

network_connected_components assigns a component identifier to every edge so you can identify and resolve disconnected islands before running OD queries.

wbw_network_connected_components(i              = 'roads.shp',
  output         = 'roads_components.shp',
  snap_tolerance = 0.5)
# Edges not in the dominant component are candidates for removal or bridging.

Node Degree

network_node_degree writes the degree of every node as a point layer. Degree-1 nodes are dead ends; unusually high-degree nodes may be duplicates.

wbw_network_node_degree(i              = 'roads.shp',
  output         = 'node_degree.shp',
  snap_tolerance = 0.5)

Step 2 — Shortest Path and Alternatives

Single Shortest Path

shortest_path_network finds the minimum-cost path between two coordinates using Dijkstra's algorithm. Supply edge_cost_field for travel-time routing; omit it to route by Euclidean arc length.

wbw_shortest_path_network(i               = 'roads.shp',
  output          = 'route_shortest.shp',
  start_x         = 454230.0,
  start_y         = 4823150.0,
  end_x           = 458900.0,
  end_y           = 4819700.0,
  snap_tolerance  = 20.0,
  edge_cost_field = 'MINUTES',
  one_way_field   = 'ONEWAY')

Turn penalties model the real cost of left, right, and U-turns in dense urban networks.

wbw_shortest_path_network(i               = 'roads.shp',
  output          = 'route_with_turns.shp',
  start_x         = 454230.0,
  start_y         = 4823150.0,
  end_x           = 458900.0,
  end_y           = 4819700.0,
  snap_tolerance  = 20.0,
  edge_cost_field = 'MINUTES',
  one_way_field   = 'ONEWAY',
  turn_penalty    = 0.5,
  u_turn_penalty  = 3.0,
  forbid_u_turns  = TRUE)

K-Shortest Alternative Paths

k_shortest_paths_network returns the k least-cost distinct paths for the same endpoints — useful for resilience analysis and presenting alternatives to planners.

wbw_k_shortest_paths_network(i               = 'roads.shp',
  output          = 'routes_k3.shp',
  start_x         = 454230.0,
  start_y         = 4823150.0,
  end_x           = 458900.0,
  end_y           = 4819700.0,
  k               = 3L,
  snap_tolerance  = 20.0,
  edge_cost_field = 'MINUTES',
  one_way_field   = 'ONEWAY')
# Each feature carries a PATH_RANK attribute (1 = shortest).

Step 3 — Service Areas

network_service_area delineates every part of the network reachable within a cost threshold from one or more origins. Typical uses include drive-time catchments for emergency services, walking isochrones around transit stops, and delivery zones.

wbw_network_service_area(i                    = 'roads.shp',
  origins              = 'fire_stations.shp',
  output               = 'fire_catchment_5min.shp',
  max_cost             = 5.0,
  snap_tolerance       = 20.0,
  output_mode          = 'polygon',
  polygon_merge_origins = TRUE,
  edge_cost_field      = 'MINUTES',
  one_way_field        = 'ONEWAY')

Use output_mode = 'edges' to retain actual road arcs inside the catchment rather than fill a polygon — more appropriate when the network is sparse.


Step 4 — Closest Facility

closest_facility_network routes each incident point to its nearest facility, measuring cost along the network rather than in straight-line distance. This is the core tool for emergency-response siting, healthcare access studies, and school-catchment delineation.

wbw_closest_facility_network(i               = 'roads.shp',
  incidents       = 'accidents.shp',
  facilities      = 'hospitals.shp',
  output          = 'routes_to_hospital.shp',
  snap_tolerance  = 20.0,
  edge_cost_field = 'MINUTES',
  one_way_field   = 'ONEWAY')
# Output carries INCIDENT_FID, FACILITY_FID, and COST fields per route.

Step 5 — OD Cost Matrix and Batch Route Export

OD Cost Matrix

network_od_cost_matrix solves all pairwise paths and writes results to a CSV. Each row contains an origin identifier, a destination identifier, and the network cost between them.

wbw_network_od_cost_matrix(i               = 'roads.shp',
  origins         = 'schools.shp',
  destinations    = 'libraries.shp',
  output          = 'od_costs.csv',
  snap_tolerance  = 20.0,
  edge_cost_field = 'MINUTES',
  one_way_field   = 'ONEWAY')

od <- read.csv('od_costs.csv')
# Minimum travel time from each school to any library:
print(aggregate(COST ~ ORIGIN_FID, data = od, FUN = min))

Materializing OD Routes as Geometry

network_routes_from_od creates the actual path lines between OD pairs.

wbw_network_routes_from_od(i               = 'roads.shp',
  origins         = 'schools.shp',
  destinations    = 'libraries.shp',
  output          = 'od_routes.shp',
  snap_tolerance  = 20.0,
  edge_cost_field = 'MINUTES',
  one_way_field   = 'ONEWAY')

Step 6 — Location-Allocation

location_allocation_network solves the p-median problem: given candidate facility locations and weighted demand points, which p facilities minimise total travel cost? Directly supports clinic siting, school consolidation, and warehouse network design.

wbw_location_allocation_network(i                    = 'roads.shp',
  demand_points        = 'demand_points.shp',
  facilities           = 'candidate_sites.shp',
  output               = 'selected_facilities.shp',
  facility_count       = 4L,
  solver_mode          = 'minimize_impedance',
  demand_weight_field  = 'POP',
  snap_tolerance       = 20.0,
  edge_cost_field      = 'MINUTES')
# SELECTED == 1 on the four chosen candidate sites.
# Demand points carry ASSIGNED_FID linking each to its nearest selected site.

Supported solver modes: minimize_impedance (p-median), maximize_coverage, and maximize_attendance.


Step 7 — Network Accessibility Metrics

compute_network_accessibility() is exposed as a typed facade wrapper. It computes a gravity-model or cumulative-opportunity accessibility score per origin — a standard indicator in transport equity analysis.

residents    <- wbw_read_vector('resident_centroids.shp', session = s)
supermarkets <- wbw_read_vector('supermarkets.shp', session = s)

result <- s$compute_network_accessibility(
  input              = 'roads.shp',
  origins            = residents$file_path(),
  destinations       = supermarkets$file_path(),
  output             = 'food_accessibility.shp',
  edge_cost_field    = 'MINUTES',
  impedance_cutoff   = 30.0,
  decay_function     = 'negative_exponential',
  decay_parameter    = 0.1
)
# Each origin point carries an ACCESS_SCORE field.

Step 8 — OD Sensitivity Analysis

analyze_od_cost_sensitivity() quantifies how stable OD costs are under stochastic perturbation of edge weights — useful for stress-testing a routing model against travel-time uncertainty or hypothetical road-closure scenarios.

result <- s$analyze_od_cost_sensitivity(
  input                       = 'roads.shp',
  origins                     = 'schools.shp',
  destinations                = 'libraries.shp',
  output                      = 'od_sensitivity.shp',
  edge_cost_field             = 'MINUTES',
  impedance_disturbance_range = 0.2,  # ±20 % perturbation
  monte_carlo_samples         = 500L
)

Step 9 — Multimodal Analysis

Whitebox Next Gen supports networks with a MODE field on each edge (e.g. walk, cycle, bus, rail). Multimodal tools honour mode-specific speeds, transfer penalties, and time-of-day profiles.

Multimodal OD Scenarios

analyze_multimodal_od_scenarios() runs a batch of named scenarios from a CSV, each with different mode allowances, speed overrides, or departure times.

transit_net  <- wbw_read_vector('transit_network.shp', session = s)
bus_stops    <- wbw_read_vector('bus_stops.shp', session = s)
destinations <- wbw_read_vector('key_destinations.shp', session = s)

result <- s$analyze_multimodal_od_scenarios(
  input               = transit_net$file_path(),
  origins             = bus_stops$file_path(),
  destinations        = destinations$file_path(),
  output              = 'multimodal_od_scenarios.shp',
  mode_field          = 'MODE',
  allowed_modes       = 'walk,bus,rail',
  transfer_penalty    = 3.0,
  edge_cost_field     = 'MINUTES',
  scenario_bundle_csv = 'scenarios.csv',
  departure_time      = '08:00',
  temporal_mode       = 'scheduled'
)

Exporting Multimodal Route Geometry

export_multimodal_routes_for_od_pairs() materializes the optimal multimodal route for each OD pair with per-segment mode attributes.

result <- s$export_multimodal_routes_for_od_pairs(
  input            = transit_net$file_path(),
  origins          = bus_stops$file_path(),
  destinations     = destinations$file_path(),
  output           = 'multimodal_routes.shp',
  mode_field       = 'MODE',
  allowed_modes    = 'walk,bus,rail',
  transfer_penalty = 3.0,
  edge_cost_field  = 'MINUTES'
)

Step 10 — Map Matching

map_matching_v1 snaps a raw GPS trajectory to the most plausible sequence of network edges using a hidden Markov model with candidate expansion. It is the first step in any floating-vehicle data or probe-data workflow.

wbw_map_matching_v1(i                     = 'roads.shp',
  trajectory_points     = 'gps_probe_points.shp',
  timestamp_field       = 'TIMESTAMP',
  output                = 'matched_route.shp',
  search_radius         = 30.0,
  candidate_k           = 5L,
  snap_tolerance        = 10.0,
  edge_cost_field       = 'MINUTES',
  matched_points_output = 'matched_points.shp',
  match_report          = 'match_report.txt')

Step 11 — Fleet and Vehicle Routing (Pro)

fleet_routing_and_dispatch_optimizer solves CVRP and VRPTW problems: given a fleet of vehicles and a set of service or delivery stops, it assigns and sequences routes to minimise total cost subject to capacity and time-window constraints.

result <- s$run_tool(
  'fleet_routing_and_dispatch_optimizer',
  list(
    network               = 'roads.shp',
    depots                = 'depots.shp',
    stops                 = 'delivery_stops.shp',
    vehicles_csv          = 'fleet.csv',
    route_output          = 'fleet_routes.shp',
    route_kpis_csv_output = 'fleet_kpis.csv',
    edge_cost_field       = 'MINUTES',
    one_way_field         = 'ONEWAY',
    vrp_mode              = 'VRPTW'
  )
)

kpis <- read.csv('fleet_kpis.csv')
print(kpis)

Note: This tool requires a session initialised with a valid Pro licence.


Complete Workflow: Emergency Response Planning

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/emergency_planning')

# 1. Audit topology.
wbw_network_topology_audit(i = 'roads.shp', output = 'topology_errors.shp',
  snap_tolerance = 0.5, report = 'topology_report.txt')

# 2. Five-minute drive catchments from existing stations.
wbw_network_service_area(i                    = 'roads.shp',
  origins              = 'fire_stations.shp',
  output               = 'existing_catchment_5min.shp',
  max_cost             = 5.0,
  output_mode          = 'polygon',
  polygon_merge_origins = TRUE,
  edge_cost_field      = 'MINUTES',
  snap_tolerance       = 20.0)

# 3. Route historical incidents to nearest existing station.
wbw_closest_facility_network(i               = 'roads.shp',
  incidents       = 'historical_incidents.shp',
  facilities      = 'fire_stations.shp',
  output          = 'incident_routes.shp',
  snap_tolerance  = 20.0,
  edge_cost_field = 'MINUTES')

# 4. Find two new station sites that maximise coverage.
wbw_location_allocation_network(i               = 'roads.shp',
  demand_points   = 'historical_incidents.shp',
  facilities      = 'candidate_stations.shp',
  output          = 'new_station_sites.shp',
  facility_count  = 2L,
  solver_mode     = 'maximize_coverage',
  snap_tolerance  = 20.0,
  edge_cost_field = 'MINUTES')

Tips

  • Always run network_topology_audit first — even one disconnected segment can cause a path query to return no result without an explicit error.
  • Use network_connected_components to confirm all origins and destinations belong to the same component before running OD queries.
  • Supply edge_cost_field pointing to a travel-time column for realistic routing; omit it only for pure geometric distance problems.
  • For scheduled transit, pass temporal_cost_profile and departure_time to load time-of-day speeds.
  • The fleet_routing_and_dispatch_optimizer Pro tool requires a session initialised with a valid Pro licence.

Linear Referencing

Linear referencing in WbW-R is the practice of locating events, measurements, or observations along a route using a distance-based measure rather than absolute coordinates. A road pavement database records a pothole at 2.4 km along route R-12; a pipeline corridor flags an anomaly at 847 m along a trunk line. Whitebox Next Gen provides the tools to locate observations onto measured routes, place events from tabular or spatial sources, and — with a Pro licence — audit the consistency of large linear asset datasets.


Session Setup

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/linear_referencing')

Core Concepts

A linear-referencing workflow has three parts:

  1. Routes — line features defining the measurement axis. Each route has a unique identifier and M-values (cumulative distance from its start).
  2. Measures — the distance value used to locate a position along a route.
  3. Events — point or line observations located by (route_id, measure) or (route_id, from_measure, to_measure) pairs.

Common applications include road-pavement condition assessment, pipeline integrity monitoring, trail difficulty reporting, environmental transect sampling, and GPS probe data quality control.


Step 1 — Understand Your Route Geometry

Routes must be single-part polylines with a consistent digitizing direction. Before dropping events, confirm:

  • Each route has a unique identifier stored in a field (e.g. ROUTE_ID).
  • No route self-intersects.
  • Routes forming a corridor are merged into one feature per identifier.

Use snap_endnodes and merge_line_segments via wbw_run_tool to clean ragged street-centreline inputs before treating them as routes.


Step 2 — Locate Points Along Routes

locate_points_along_routes takes an existing point layer and finds the nearest position on each matching route, writing back the M-value (measure) for every point. Use this when field teams have collected GPS observation points and you need to convert them to route-distance offsets.

wbw_locate_points_along_routes(routes               = 'roads_measured.shp',
  points               = 'field_observations.shp',
  output               = 'observations_located.shp',
  max_offset_distance  = 15.0)
# Output adds ROUTE_ID, MEASURE, and OFFSET fields to every input point.

The MEASURE field is the along-route distance from the route start. OFFSET is the perpendicular snap distance. Points beyond max_offset_distance are excluded from the output.


Step 3 — Place Events from a Table

Point Events

route_event_points_from_table reads a CSV where each row specifies a route identifier and a measure value, and places a point feature at that position. This is the standard import path for lab results, inspection records, or maintenance logs stored in external databases.

# pavement_defects.csv columns: ROUTE_ID, MEASURE, SEVERITY, NOTES
wbw_route_event_points_from_table(routes             = 'roads_measured.shp',
  events             = 'pavement_defects.csv',
  event_route_field  = 'ROUTE_ID',
  measure_field      = 'MEASURE',
  output             = 'pavement_defects_located.shp')

Line (Interval) Events

route_event_lines_from_table reads FROM_MEASURE and TO_MEASURE columns to produce line segments — useful for pavement condition ratings, speed zones, or any attribute that applies to a stretch of route rather than a single point.

# pavement_condition.csv columns: ROUTE_ID, FROM_MEASURE, TO_MEASURE, IRI, CONDITION
wbw_route_event_lines_from_table(routes             = 'roads_measured.shp',
  events             = 'pavement_condition.csv',
  event_route_field  = 'ROUTE_ID',
  from_measure_field = 'FROM_MEASURE',
  to_measure_field   = 'TO_MEASURE',
  output             = 'pavement_condition_segments.shp')

Step 4 — Place Events from a Spatial Layer

When your event data is already a vector layer rather than a plain table, use the _from_layer variants. These carry across all attributes of the source feature and can optionally write the original FID and XY into the output.

Point Events from a Layer

wbw_route_event_points_from_layer(routes             = 'roads_measured.shp',
  events             = 'manhole_inspections.shp',
  event_route_field  = 'ROUTE_ID',
  measure_field      = 'MEASURE',
  output             = 'manholes_on_routes.shp',
  write_event_fid    = TRUE,
  write_event_xy     = TRUE)

Line Events from a Layer

wbw_route_event_lines_from_layer(routes             = 'roads_measured.shp',
  events             = 'speed_zone_events.shp',
  event_route_field  = 'ROUTE_ID',
  from_measure_field = 'FROM_M',
  to_measure_field   = 'TO_M',
  output             = 'speed_zones_on_routes.shp',
  write_event_fid    = TRUE)

Step 5 — Linear Asset Governance (Pro)

route_event_governance_for_linear_assets audits a complete linear asset dataset for measure gaps, overlaps, duplicate events, orphaned route references, and out-of-range measures. It produces a flagged event output and a structured report — essential for maintaining the integrity of a production road or utility asset database.

result <- s$run_tool(
  'route_event_governance_for_linear_assets',
  list(
    routes             = 'roads_measured.shp',
    events             = 'pavement_condition.shp',
    route_id_field     = 'ROUTE_ID',
    from_measure_field = 'FROM_MEASURE',
    to_measure_field   = 'TO_MEASURE',
    flagged_output     = 'event_flags.shp',
    report             = 'governance_report.csv'
  )
)

flags <- read.csv('governance_report.csv')
print(table(flags$ERROR_CLASS))

Note: This tool requires a session initialised with a valid Pro licence.


Complete Workflow: Road Pavement Assessment

library(whiteboxworkflows)

s <- wbw_session()
setwd('/data/pavement_assessment')

# Step 1: Snap GPS observation points onto routes and extract M-values.
wbw_locate_points_along_routes(routes              = 'roads_measured.shp',
  points              = 'field_inspection_gps.shp',
  output              = 'gps_on_routes.shp',
  max_offset_distance = 10.0)

# Step 2: Place point defect records from the inspection database.
wbw_route_event_points_from_table(routes            = 'roads_measured.shp',
  events            = 'defect_records.csv',
  event_route_field = 'ROUTE_ID',
  measure_field     = 'MEASURE',
  output            = 'defects_located.shp')

# Step 3: Place condition rating intervals.
wbw_route_event_lines_from_table(routes             = 'roads_measured.shp',
  events             = 'condition_ratings.csv',
  event_route_field  = 'ROUTE_ID',
  from_measure_field = 'FROM_M',
  to_measure_field   = 'TO_M',
  output             = 'condition_segments.shp')

# Step 4 (Pro): Audit the condition layer for gaps and overlaps.
result <- s$run_tool(
  'route_event_governance_for_linear_assets',
  list(
    routes             = 'roads_measured.shp',
    events             = 'condition_segments.shp',
    route_id_field     = 'ROUTE_ID',
    from_measure_field = 'FROM_M',
    to_measure_field   = 'TO_M',
    flagged_output     = 'condition_flags.shp',
    report             = 'governance_report.csv'
  )
)
cat('Governance report:', 'governance_report.csv', '\n')

Tips

  • Routes must have a consistent digitizing direction. Run snap_endnodes and confirm that all segments in a route are digitized in the same direction before locating events.
  • locate_points_along_routes excludes points beyond max_offset_distance. Inspect unmatched points to identify GPS outliers or route coverage gaps.
  • Use route_event_points_from_table and route_event_lines_from_table for bulk imports from asset management databases where locations are already stored as route+measure pairs.
  • Use the _from_layer variants when existing vector event layers already carry route identifier and measure fields.
  • The route_event_governance_for_linear_assets Pro tool scales to production databases with millions of events and produces actionable error reports for integration into maintenance management systems.

Script Index

Use this index to map manual chapters to runnable examples.

Treat this chapter as the transition from concept to implementation. When building a new workflow, start from the closest example script and adapt it with your paths and parameters rather than composing from isolated snippets. This usually produces safer and more maintainable scripts.

Core Session and Discovery

  • crates/wbw_r/examples/generated_session_example.R - baseline session creation and discovery flow
  • crates/wbw_r/r-package/whiteboxworkflows/inst/examples/generated_session_example.R - package-scoped session bootstrap pattern
  • crates/wbw_r/r-package/whiteboxworkflows/inst/examples/golden_path_workflows.R - representative end-to-end workflow chain

Raster Workflows

  • crates/wbw_r/r-package/whiteboxworkflows/inst/examples/raster_object_quickstart.R - object lifecycle and metadata checks
  • crates/wbw_r/r-package/whiteboxworkflows/inst/examples/raster_array_roundtrip.R - array conversion and persistence roundtrip

Vector Workflows

  • crates/wbw_r/r-package/whiteboxworkflows/inst/examples/vector_object_quickstart.R - schema-first vector workflow baseline
  • crates/wbw_r/examples/vector_topojson_roundtrip.R - TopoJSON read/write roundtrip workflow
  • crates/wbw_r/examples/osm_download_vector.R - OpenStreetMap download workflow via Overpass API

Lidar Workflows

  • crates/wbw_r/r-package/whiteboxworkflows/inst/examples/lidar_object_quickstart.R - lidar object lifecycle and metadata checks
  • crates/wbw_r/r-package/whiteboxworkflows/inst/examples/lidar_chunked_matrix_streaming.R - chunked matrix processing at scale
  • crates/wbw_r/examples/lidar_write_options.R - output option tuning for lidar formats

Sensor Bundle Workflows

  • crates/wbw_r/r-package/whiteboxworkflows/inst/examples/sensor_bundle_quickstart.R - bundle intake and key inspection baseline
  • crates/wbw_r/r-package/whiteboxworkflows/inst/examples/sensor_bundle_multi_family_preview.R - cross-family preview pattern

Licensing

  • crates/wbw_r/examples/licensing_offline.R - local entitlement startup example
  • crates/wbw_r/examples/licensing_floating_online.R - floating license provider startup example