Schema management#

DuckPlus keeps schema validation front and centre so data pipelines remain predictable. Schemas are derived lazily from DuckDB itself, ensuring any new types introduced upstream are reflected automatically in your metadata.

Inspecting schemas#

DuckPlus surfaces schema metadata through the :mod:duckplus.schema module. At its simplest, you can inspect :attr:Relation.columns <duckplus.relation.Relation.columns> and :attr:Relation.types <duckplus.relation.Relation.types> directly on immutable relations. For richer reports, call :func:duckplus.schema.diff_relations or :func:duckplus.schema.diff_files to produce dataclasses that describe the differences between two sources.

columns = dict(zip(relation.columns, relation.types, strict=False))
for name, duck_type in columns.items():
    print(name, duck_type)

Diffing relations#

:func:duckplus.schema.diff_relations compares two relations and reports column additions, removals, and type changes. The resulting :class:duckplus.schema.SchemaDiff exposes missing_from_candidate, unexpected_in_candidate, and type_drift tuples, making it trivial to render change reports or emit warnings. A similar interface exists for files via :func:duckplus.schema.diff_files—pass CSV, Parquet, or JSON locations and let DuckPlus load the data for you.

from duckplus import schema as schema_utils

report = schema_utils.diff_relations(relation, other_relation)
print(report.missing_from_candidate)
print(report.type_drift)

Exporters#

Sampling helpers convert relations into in-memory data structures without breaking immutability:

  • :meth:Relation.sample_arrow <duckplus.relation.Relation.sample_arrow>

  • :meth:Relation.sample_polars <duckplus.relation.Relation.sample_polars>

  • :meth:Relation.sample_pandas <duckplus.relation.Relation.sample_pandas>

Each helper validates that the source relation still has access to an open connection, surfacing immediate feedback when the manager is closed.

Table utilities#

The :mod:duckplus.table module manages DuckDB tables with schema validation and idempotent insert helpers. It reuses the metadata cached on relations to ensure columns align before writes run.

Automating checks in CI#

Schema helpers shine in automated test suites. Pair schema_utils.diff_relations(relation, other) with pytest assertions to guarantee staging datasets remain compatible with production tables. Because diffs return regular Python dataclasses, you can snapshot the output with tools like pytest-approvaltests or pytest-regressions.

Capturing documentation#

Use report.type_drift (where report is the result of diff_relations) to render documentation-friendly tables that stay synchronised with the actual DuckDB definitions. Because the results are regular dataclasses, they slot neatly into static site generators and Markdown renderers.