File and table readers#

DuckPlus mirrors DuckDB’s file readers while integrating with :class:~duckplus.duckcon.DuckCon. Each helper expects an open manager and returns an immutable :class:~duckplus.relation.Relation with cached metadata. The functions—covering file readers and extension-backed connectors—live in :mod:duckplus.io and register automatically on every DuckCon instance, so you can call them directly from the connection manager without importing the module. They intentionally avoid **kwargs so editor completions surface every DuckDB option.

from pathlib import Path

from duckplus import DuckCon

manager = DuckCon()
with manager:
    relation = manager.read_csv(Path("data.csv"), header=True)
    print(relation.columns)

Persisting results is just as easy. Every reader returns a relation whose writer methods chain directly:

with manager:
    relation = manager.read_parquet(Path("data.parquet"))
    relation.append_csv(Path("report.csv"))
    relation.write_parquet_dataset(
        Path("dataset"),
        partition_column="country",
    )

CSV reader#

:meth:duckplus.io.read_csv exposes DuckDB’s table-function keywords without using **kwargs so IDEs surface every option. Aliases such as delim and quote match DuckDB’s own names, and DuckPlus raises a descriptive error when both the canonical and alias form are supplied for the same argument.
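
For example, supplying both the canonical keyword and its alias fails fast instead of silently preferring one. A minimal sketch; the exact exception type raised is an assumption:

with manager:
    try:
        manager.read_csv(Path("data.csv"), delimiter="|", delim=",")
    except ValueError as exc:  # assumed exception type; DuckPlus promises a descriptive message
        print(exc)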

Key options include:

  • columns and dtype for explicit column typing (see the typing sketch after the next example).

  • names and na_values to override column names and null sentinels.

  • filename=True to append the absolute path of each input file.

with manager:
    relation = manager.read_csv(
        Path("transactions.csv"),
        delimiter="|",
        header=True,
        na_values=["NA", ""],
        filename=True,
    )
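
When inference guesses wrong, pin the schema down explicitly. A minimal sketch of the typing options from the list above, using the name-to-type mapping DuckDB’s read_csv accepts for columns; the names and types are illustrative:

with manager:
    relation = manager.read_csv(
        Path("transactions.csv"),
        header=True,
        columns={
            "id": "BIGINT",
            "amount": "DECIMAL(10, 2)",
            "country": "VARCHAR",
        },
    )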

Pass lazy=True to stream large CSVs: DuckPlus propagates the parameter to DuckDB, allowing you to chain transformations before triggering materialisation.
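
A sketch of the lazy flow; the filter call assumes the relation forwards DuckDB’s relational methods, so treat it as illustrative:

with manager:
    relation = manager.read_csv(Path("large.csv"), lazy=True, header=True)
    # Nothing is materialised yet; transformations compose on the lazy relation.
    filtered = relation.filter("amount > 100")  # assumed passthrough to DuckDB's filter
    print(filtered.columns)  # schema access does not force a full scan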

Parquet reader#

:meth:duckplus.io.read_parquet mirrors DuckDB’s keyword arguments, including union_by_name, filename, and hive_partitioning. Passing a directory with directory=True loads all *.parquet files by default.

with manager:
    relation = manager.read_parquet(
        [
            Path("/data/sales_2024.parquet"),
            Path("/data/sales_2025.parquet"),
        ],
        union_by_name=True,
        filename=True,
    )
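
Directory reads combine directory=True with hive_partitioning to recover partition columns; the layout below is hypothetical:

with manager:
    relation = manager.read_parquet(
        Path("/data/sales"),       # a directory of *.parquet files
        directory=True,
        hive_partitioning=True,    # exposes e.g. country=US/ as a country column
    )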

JSON, Arrow, and database connectors#

The IO module extends to DuckDB’s JSON readers, Arrow integration, and community-extension-backed connectors like Excel and nano-ODBC. Each helper keeps parameters explicit so scripts remain self-documenting. Highlights include:

  • :func:duckplus.io.read_json for line-delimited JSON, with maximum_depth and format options mirroring DuckDB’s table function (see the sketch after this list).

  • :func:duckplus.io.read_arrow for zero-copy reads from Arrow datasets or pyarrow.dataset.Dataset objects.

  • :func:duckplus.io.read_odbc_query and :func:duckplus.io.read_odbc_table for nano-ODBC queries and scans.

  • :func:duckplus.io.read_excel for Excel workbooks, which will automatically install the excel extension when missing.
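
As an example of the pattern, the JSON reader below uses the format and maximum_depth options; the file name and values are illustrative:

with manager:
    relation = manager.read_json(
        Path("events.jsonl"),
        format="newline_delimited",
        maximum_depth=2,
    )
    print(relation.columns)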

Consult the docstrings in :mod:duckplus.io for the full argument lists. When an extension is required, DuckPlus will attempt to install it automatically or raise an actionable message if the environment is offline. Call manager.apply_helper("read_csv", ...) to route through the bound helper directly, or pass overwrite=True to :meth:DuckCon.register_helper <duckplus.duckcon.DuckCon.register_helper> if you need to replace the defaults with a custom implementation.

Composing with custom helpers#

:class:DuckCon <duckplus.duckcon.DuckCon> exposes :meth:~duckplus.duckcon.DuckCon.register_helper and :meth:~duckplus.duckcon.DuckCon.apply_helper so you can wrap bespoke data sources. Register a callable that accepts the open connection and returns a DuckPlus relation, keeping you within the immutable flow:

def read_yaml(connection, path):
    # Illustrative wrapper: route the file through DuckDB's read_json table function.
    return connection.sql("SELECT * FROM read_json(?::VARCHAR)", params=[str(path)])

manager.register_helper("read_yaml", read_yaml)

with manager:
    relation = manager.apply_helper("read_yaml", Path("data.yaml"))

The returned relation captures metadata like any built-in reader, so downstream validation and schema utilities continue to work.
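
For instance, the cached column metadata is available immediately, just as with the built-in readers:

with manager:
    relation = manager.apply_helper("read_yaml", Path("data.yaml"))
    print(relation.columns)  # cached metadata, same as manager.read_csv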