Working with Vectors

This chapter covers schema inspection, feature iteration, attribute reads/writes, and persistence workflows.

Vector processing is often more about data contracts than geometry mechanics. Schemas, field types, and attribute consistency determine whether downstream analysis remains trustworthy. The patterns below emphasize validating structure first, then applying deterministic edits, then persisting to stable interchange formats for downstream tools.

See Also: Online Sources

If you need to acquire vectors directly from web providers (starting with OSM Overpass), see the dedicated Online Sources chapter.

Read and Inspect

This step establishes the schema contract your downstream edits depend on.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
roads = wbe.read_vector('roads.gpkg')

schema = roads.schema()
print(schema)
print('features:', roads.feature_count())
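
One way to turn the inspected schema into an enforceable contract is to compare the available field names against the ones your edits expect. The following is a minimal sketch, assuming the field names are available as a plain list of strings (for example from attribute_field_names(), whose exact return type is not shown in this chapter). The helper name missing_fields and the sample field names are hypothetical.

```python
def missing_fields(available, required):
    """Return the required field names absent from `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]


# Hypothetical field names standing in for the result of
# roads.attribute_field_names(); assumed here to be a list of strings.
available = ['name', 'speed', 'class']

missing = missing_fields(available, ['name', 'speed', 'surface'])
if missing:
    print('schema contract violated, missing fields:', missing)
```

Failing early on a missing field is usually cheaper than discovering it halfway through a batch of attribute edits.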

Memory-Backed Vectors for Pipeline Efficiency

For workflows that chain multiple vector operations, memory-backed vectors eliminate disk I/O between steps. This is valuable for complex pipelines where intermediate results are passed between spatial operations.

Load a vector into memory with file_mode='m':

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

# Read directly into memory
roads = wbe.read_vector('roads.gpkg', file_mode='m')
rivers = wbe.read_vector('rivers.gpkg', file_mode='m')

print(roads.file_path)  # prints: memory://vector/...

Memory-backed vectors are compatible with all downstream operations:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
v = wbe.read_vector('polygons.gpkg', file_mode='m')

# Inspect schema and metadata
schema = v.schema()
meta = v.metadata()

# Pass to spatial tools
centroids = wbe.vector.geometry_processing.centroid_vector(v)

# Export to disk when ready
wbe.write_vector(centroids, 'centroids_final.gpkg')

Vector Memory Lifecycle

Memory-backed vectors persist until explicitly removed or cleared. For long-running vector pipelines, manage memory explicitly:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

# Check current memory
print(f"Vectors in memory: {wbe.vector_memory_count()}")

# Read vectors
v1 = wbe.read_vector('large1.gpkg', file_mode='m')
v2 = wbe.read_vector('large2.gpkg', file_mode='m')

print(f"After reads: {wbe.vector_memory_count()}")

# Remove when done
wbe.remove_vector_from_memory(v1)
print(f"After remove: {wbe.vector_memory_count()}")

# Or clear all
wbe.clear_vector_memory()
print(f"After clear: {wbe.vector_memory_count()}")
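
Explicit removal is easy to forget when a step raises part-way through. One possible pattern, sketched here under assumptions, is to wrap the read and the removal in a context manager so that cleanup always runs. The helper memory_vector and the FakeEnvironment class are hypothetical; the stand-in exists only so the sketch runs without the library, and it mimics just the three environment calls shown above.

```python
from contextlib import contextmanager


@contextmanager
def memory_vector(wbe, path):
    """Read `path` into memory and release it when the block exits."""
    vector = wbe.read_vector(path, file_mode='m')
    try:
        yield vector
    finally:
        # Runs even if the body raises, so the in-memory store does not grow.
        wbe.remove_vector_from_memory(vector)


class FakeEnvironment:
    """Hypothetical stand-in for WbEnvironment, used only to run this sketch."""

    def __init__(self):
        self._store = []

    def read_vector(self, path, file_mode=None):
        self._store.append(path)
        return path

    def remove_vector_from_memory(self, vector):
        self._store.remove(vector)

    def vector_memory_count(self):
        return len(self._store)


env = FakeEnvironment()
with memory_vector(env, 'large1.gpkg') as v1:
    inside = env.vector_memory_count()
after = env.vector_memory_count()
print(inside, after)
```

With a real WbEnvironment the same helper would be called as memory_vector(wbe, 'large1.gpkg'), assuming the calls behave as described in this section.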

Implicit Memory Output from Tools

Tools that output a vector store the result in memory automatically when the output parameter is omitted. You do not need to pass file_mode='m' or choose a temporary path; the returned Vector object is already memory-backed:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
roads = wbe.read_vector('roads.gpkg')

# No output path — result is stored in memory automatically
centroids = wbe.vector.geometry_processing.centroid_vector(roads)
print(centroids.file_path)  # prints: memory://vector/...

# Chain operations without any intermediate files
clipped = wbe.vector.overlay_analysis.clip(centroids, 'boundary.gpkg')
print(clipped.file_path)  # also memory://vector/...

# Persist the final result only
wbe.write_vector(clipped, 'result.gpkg')

This applies to all tool categories: GIS, hydrology, geomorphometry, and stream network tools follow the same rule. Providing an explicit output path writes to disk as before.

Best practices:

  • Use file_mode='m' for intermediate spatial analysis results.
  • Export memory-backed vectors to disk with write_vector() when persisting final outputs.
  • Call remove_vector_from_memory() after a vector is no longer needed.
  • Use clear_vector_memory() between independent analysis phases.
  • Use clear_memory() when resetting all in-process raster/vector/lidar stores together.

Iterate Through Features

Use feature iteration for inspections, QA checks, or bespoke attribute rules.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
v = wbe.read_vector('roads.gpkg')

n = v.feature_count()
for i in range(n):
    attrs = v.attributes(i)
    # attrs is dict-like; process values
    print(i, attrs)
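
As an illustration of a QA check built on this loop, the sketch below flags features whose required fields are empty. It assumes each record behaves like a dict with a get() method, and that missing values show up as None or blank strings; neither assumption is confirmed by this chapter. The helper find_blank_values and the sample records are hypothetical.

```python
def find_blank_values(records, fields):
    """Return (feature_index, field) pairs whose value is None or blank."""
    problems = []
    for index, attrs in enumerate(records):
        for field in fields:
            value = attrs.get(field)
            if value is None or (isinstance(value, str) and not value.strip()):
                problems.append((index, field))
    return problems


# Hypothetical records standing in for [v.attributes(i) for i in range(n)].
records = [
    {'name': 'Main Street', 'speed': 50},
    {'name': '', 'speed': 30},
    {'name': 'Mill Road', 'speed': None},
]

problems = find_blank_values(records, ['name', 'speed'])
print(problems)
```

Collecting all problems before reporting gives a complete picture of the dataset in one pass instead of stopping at the first bad feature.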

Read and Update Attribute Table

This example demonstrates single-field updates, grouped updates, and schema extension in one controlled sequence.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
v = wbe.read_vector('roads.gpkg')

# Read one field value
name0 = v.attribute(0, 'name')
print('name[0]=', name0)

# Update one field
v.update_attribute(0, 'name', 'Main Street')

# Update multiple fields
v.update_attributes(1, {'speed': 50, 'class': 'collector'})

# Add a new field
v.add_field('reviewed', field_type='bool', default_value=False)
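
To keep edits deterministic, one option is to describe them as data, validate the whole plan against the known field names, and only then write. The sketch below takes that approach using update_attributes() as shown above. The helper apply_edits and the FakeVector stand-in are hypothetical; the stand-in only mimics update_attributes() so the sketch runs without the library.

```python
def apply_edits(vector, edits, known_fields):
    """Validate every edit first, then apply them in feature-index order."""
    known = set(known_fields)
    for index, values in edits.items():
        unknown = sorted(set(values) - known)
        if unknown:
            raise KeyError(f'feature {index}: unknown fields {unknown}')
    # No write happens unless the whole plan passed validation.
    for index in sorted(edits):
        vector.update_attributes(index, edits[index])
    return len(edits)


class FakeVector:
    """Hypothetical stand-in for a Vector, used only to run this sketch."""

    def __init__(self, rows):
        self.rows = rows

    def update_attributes(self, index, values):
        self.rows[index].update(values)


v = FakeVector([{'name': 'A', 'speed': 30}, {'name': 'B', 'speed': 30}])
edits = {1: {'speed': 50}, 0: {'name': 'Main Street'}}
applied = apply_edits(v, edits, known_fields=['name', 'speed'])
print(applied, v.rows)
```

Applying edits in sorted index order means the same plan produces the same sequence of writes on every run, regardless of how the plan was assembled.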

Persist Vector Outputs

This pattern shows both default extension behavior and explicit format control for reproducibility.

For complete write-option keys and allowed values, see Output Controls.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
roads = wbe.read_vector('roads.gpkg')
centroids = wbe.vector.geometry_processing.centroid_vector(roads)

# Extensionless output defaults to GeoPackage
wbe.write_vector(centroids, 'roads_centroids')

# Explicit output format (reuses the centroids result computed above)
wbe.write_vector(centroids, 'roads_centroids.parquet', options={
    'strict_format_options': True,
    'geoparquet': {'compression': 'zstd'},
})
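
If you prefer not to rely on the extensionless default, a small helper can make the format explicit before the path reaches write_vector(). This is a sketch only: the helper name with_explicit_extension is hypothetical, and the '.gpkg' default simply mirrors the behavior described in the comment above.

```python
from pathlib import Path


def with_explicit_extension(path, default_suffix='.gpkg'):
    """Append `default_suffix` when `path` has no extension; else keep it."""
    p = Path(path)
    return str(p) if p.suffix else str(p.with_suffix(default_suffix))


print(with_explicit_extension('roads_centroids'))
print(with_explicit_extension('roads_centroids.parquet'))
```

Spelling out the extension in scripts makes the output format visible in the code and in logs, which helps when results are compared across runs.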

Practical Notes

  • Use schema() first to validate field names and types.
  • Prefer update_attributes() for grouped edits to a feature.
  • Re-read and validate after major writes, especially when switching formats.
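
The re-read check in the last note can be sketched as a record-by-record comparison. The example assumes attribute records are dict-like and that values are expected to round-trip unchanged, which may not hold for every field type when switching formats. The helper roundtrip_differences and the sample records are hypothetical.

```python
def roundtrip_differences(before, after):
    """Describe how re-read records differ from the records that were written."""
    if len(before) != len(after):
        return [f'feature count changed: {len(before)} -> {len(after)}']
    differences = []
    for index, (old, new) in enumerate(zip(before, after)):
        for field in sorted(set(old) | set(new)):
            if old.get(field) != new.get(field):
                differences.append(
                    f'feature {index}, field {field!r}: '
                    f'{old.get(field)!r} -> {new.get(field)!r}'
                )
    return differences


# Hypothetical records: `written` is captured before write_vector(),
# `reread` after reading the output file back.
written = [{'name': 'Main Street', 'speed': 50}]
reread = [{'name': 'Main Street', 'speed': '50'}]

differences = roundtrip_differences(written, reread)
print(differences)
```

In this sample the speed value came back as a string, the kind of silent type change that a format switch can introduce and that a count-only check would miss.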

Vector Object Method Reference

Common simple properties such as file_path and file_name are omitted here so the tables stay focused on callable Vector methods.

Schema and Attribute Access

Method | Description
schema | Return the vector schema, including field structure and geometry information.
feature_count | Report how many features are present.
attribute_fields, attribute_field_names | Inspect available attribute fields by full definition or by field name list.
attribute | Read a single field value from one feature.
attributes | Read all attribute values for one feature as a grouped record.
add_field | Add a new attribute field to the dataset schema.
update_attribute | Update one field in one feature.
update_attributes | Update multiple fields in one feature at once.

File, Metadata, and Copying

Method | Description
metadata | Return VectorMetadata describing file state, CRS, and feature count.
absolute_path | Resolve the vector to an absolute file path string.
parent_directory | Return the containing directory path.
exists | Check whether the backing dataset exists on disk.
get_short_filename, get_file_extension | Return convenience filename information.
get_file_size_in_bytes, get_last_modified_unix_seconds | Inspect filesystem metadata for reporting or audit logs.
deep_copy | Write a copied vector dataset to a derived or explicit output path.

CRS and Geometry-Safe Persistence

Method | Description
crs_wkt, crs_epsg | Inspect CRS metadata as WKT text or EPSG code.
set_crs_wkt, set_crs_epsg | Assign CRS metadata without moving feature coordinates.
clear_crs | Remove CRS metadata so it can be assigned again explicitly.
reproject | Reproject the vector dataset with explicit failure, topology, and antimeridian policies.