Querying local files with DuckPlus
The DuckDB demo begins by scanning CSV and Parquet files from disk. DuckPlus
offers the same experience while returning immutable :class:`~duckplus.relation.Relation`
wrappers, which makes follow-on transformations safe to compose.
CSV walkthrough
from pathlib import Path

from duckplus import DuckCon
from duckplus import io as duckio
from duckplus import ducktype

manager = DuckCon()
with manager:
    circles = duckio.read_csv(
        manager,
        Path("data/circles.csv"),
        header=True,
        auto_detect=True,
    )
    radius = ducktype.Numeric("radius")
    derived = circles.add(
        diameter=radius * 2,
        area=radius.pow(2) * 3.14159,
    )
    print(derived.relation.limit(3).fetchall())
The helper mirrors DuckDB’s read_csv signature, so the options showcased in
the demo (auto_detect, sample_size, names, and na_values) translate
directly to keyword arguments. DuckPlus checks for conflicting aliases (for
example, providing both delimiter and delim) before the query executes.
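The alias check can be pictured with a small sketch that does not depend on DuckPlus. The function name `resolve_aliases` and the `ALIASES` table are hypothetical and only illustrate the idea of detecting two spellings of the same option before any query runs; the library's actual implementation may differ.

```python
# Hypothetical sketch of alias validation; not DuckPlus's actual code.
# Maps each alias to the canonical option name it stands for.
ALIASES = {"delim": "delimiter"}


def resolve_aliases(options: dict) -> dict:
    """Return options keyed by canonical name, raising on conflicting aliases."""
    resolved = {}
    seen = {}  # canonical name -> spelling the caller used
    for name, value in options.items():
        canonical = ALIASES.get(name, name)
        if canonical in seen:
            raise ValueError(
                f"conflicting options: {seen[canonical]!r} and {name!r} "
                f"both set {canonical!r}"
            )
        seen[canonical] = name
        resolved[canonical] = value
    return resolved


# A single spelling is normalised to the canonical name.
print(resolve_aliases({"delim": ";", "header": True}))
# {'delimiter': ';', 'header': True}

# Two spellings of the same option fail before any file is read.
try:
    resolve_aliases({"delimiter": ",", "delim": ";"})
except ValueError as exc:
    print(exc)
```

Failing at this stage, rather than inside the database engine, keeps the error message close to the keyword arguments the caller actually typed.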
Parquet walkthrough
from duckplus import DuckCon
from duckplus import io as duckio
from duckplus import ducktype

manager = DuckCon()
with manager:
    trips = duckio.read_parquet(
        manager,
        "data/yellow_tripdata_2023-01.parquet",
        hive_partitioning=True,
    )
    summary = (
        trips.aggregate()
        .start_agg()
        .agg(ducktype.Numeric("total_amount").sum(), alias="total_fare")
        .agg(ducktype.Numeric("trip_distance").avg(), alias="avg_distance")
        .by("passenger_count")
    )
    print(summary.relation.order("passenger_count").limit(5).fetchall())
DuckPlus resolves DuckDB’s Parquet reader options in the same way as the demo,
including binary_as_string and file_row_number. Because relations are
immutable, repeated aggregations and filters never mutate the original file
scan, preserving the safety guarantees emphasised in the earlier release guides.
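To illustrate why immutability makes this composition safe, here is a minimal sketch that does not use DuckPlus at all. `FrozenRelation` and its `filter` method are hypothetical stand-ins, not the library's classes: each transformation returns a new object, so the description of the original scan is never altered.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenRelation:
    """Hypothetical stand-in for an immutable relation wrapper."""

    source: str
    filters: tuple = ()

    def filter(self, condition: str) -> "FrozenRelation":
        # Build a new instance instead of mutating this one.
        return replace(self, filters=self.filters + (condition,))


scan = FrozenRelation("data/yellow_tripdata_2023-01.parquet")
long_trips = scan.filter("trip_distance > 10")
solo_trips = scan.filter("passenger_count = 1")

print(scan.filters)        # ()
print(long_trips.filters)  # ('trip_distance > 10',)
print(solo_trips.filters)  # ('passenger_count = 1',)

# Attempting to modify the original in place raises an error.
try:
    scan.filters = ("oops",)
except FrozenInstanceError:
    print("the original scan cannot be modified in place")
```

Because `long_trips` and `solo_trips` are derived independently from `scan`, neither can observe the other's filter, which is the property that lets several aggregations share one file scan.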