
Feature: Show external table URI when materialized in S3 buckets with dbt-duckdb #532

@01100100

Description


Describe the feature

I would like dbt-docs to display the S3 URI for externally materialized tables in the "Relation" field, similar to how relations are shown for other adapters.

For example, given a model models/user.sql with the following profile and model configuration, the data will be written to https://fly.storage.tigris.dev/bucket-xxx/modelled/user.json. I would like this URI to be visible in the docs, ideally within the "relation" section, for quick reference.

Example Configuration

Profile (profiles.yml):

factory:
  target: dev
  outputs:
    dev:
      threads: 4
      type: duckdb
      extensions: ['httpfs']
      path: dbt.duckdb
      secrets:
        - type: s3
          region: "{{ env_var('AWS_REGION') }}"
          key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
          secret: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"
          endpoint: "{{ env_var('AWS_ENDPOINT_URL_S3') | replace('https://', '') }}"
      external_root: s3://bucket-xxx/modelled
      default:

Environment:

export AWS_ENDPOINT_URL_S3=fly.storage.tigris.dev

Model configuration (dbt_project.yml):

models:
  factory:
    +materialized: external
    user:
      +format: json

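In this case, dbt-duckdb writes the output of models/user.sql to s3://bucket-xxx/modelled/user.json (external_root, then the model name, then an extension matching the format), which is served at https://fly.storage.tigris.dev/bucket-xxx/modelled/user.json. I would like this path to be included in the docs.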
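The location dbt-duckdb writes to is an s3:// URI built from external_root (s3://bucket-xxx/modelled/user.json here), while the issue quotes an HTTPS URL on the Tigris endpoint. A minimal sketch of mapping one to the other, assuming path-style addressing (https://<endpoint>/<bucket>/<key>), which matches the URL above. The function name `s3_uri_to_https` is hypothetical and not part of dbt-duckdb.

```python
from urllib.parse import urlparse


def s3_uri_to_https(s3_uri: str, endpoint_host: str) -> str:
    """Map s3://bucket/key to a path-style HTTPS URL on a custom endpoint.

    Assumes path-style addressing (https://<endpoint>/<bucket>/<key>).
    Virtual-hosted style (https://<bucket>.<endpoint>/<key>) would need
    a different mapping.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")
    # Accept the endpoint with or without a scheme, similar to the
    # replace('https://', '') filter used in the profile.
    host = endpoint_host.removeprefix("https://").rstrip("/")
    key = parsed.path.lstrip("/")
    return f"https://{host}/{parsed.netloc}/{key}"


print(s3_uri_to_https("s3://bucket-xxx/modelled/user.json", "fly.storage.tigris.dev"))
# https://fly.storage.tigris.dev/bucket-xxx/modelled/user.json
```

The docs could show either form; the s3:// URI is what DuckDB itself reads, while the HTTPS form is only directly usable when the object is publicly readable or accessed with signed requests.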

Additional context

Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here.

This feature is specific to the dbt-duckdb adapter and applies when writing to external files.

The external location path is set in this macro:

If the location argument is specified, it must be a filename (or S3 bucket/path), and dbt-duckdb will attempt to infer the format argument from the file extension of the location if the format argument is unspecified (this functionality was added in version 1.4.1).

If the location argument is not specified, then the external file will be named after the model.sql (or model.py) file that defined it with an extension that matches the format argument (parquet, csv, or json). By default, the external files are created relative to the current working directory, but you can change the default directory (or S3 bucket/prefix) by specifying the external_root setting in your DuckDB profile.
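The two rules in the passage above can be sketched as a small resolver, which is roughly what the docs would need to compute to display the URI. This is a hypothetical helper (`resolve_external_location`, `infer_format` are illustrative names, not dbt-duckdb's code); it assumes parquet is the default format and, as a simplification, returns an explicit location unchanged.

```python
import posixpath
from typing import Optional, Tuple

# The three formats named in the passage; the file extension matches
# the format name.
KNOWN_FORMATS = ("parquet", "csv", "json")


def infer_format(location: str) -> Optional[str]:
    """Guess the format from the file extension of `location`."""
    ext = posixpath.splitext(location)[1].lstrip(".").lower()
    return ext if ext in KNOWN_FORMATS else None


def resolve_external_location(
    model_name: str,
    fmt: Optional[str] = None,
    location: Optional[str] = None,
    external_root: str = ".",
) -> Tuple[str, str]:
    """Return (location, format) for an externally materialized model."""
    if location is not None:
        # Explicit location: used as given; the format falls back to the
        # file extension when it is not set explicitly.
        resolved_fmt = fmt or infer_format(location)
        if resolved_fmt is None:
            raise ValueError(f"cannot infer format from {location!r}")
        return location, resolved_fmt
    # No location: name the file after the model, with an extension that
    # matches the format, under external_root (current directory by default).
    resolved_fmt = fmt or "parquet"  # assumption: parquet is the default
    return posixpath.join(external_root, f"{model_name}.{resolved_fmt}"), resolved_fmt


print(resolve_external_location("user", fmt="json", external_root="s3://bucket-xxx/modelled"))
# ('s3://bucket-xxx/modelled/user.json', 'json')
```

`posixpath` is used rather than `os.path` so that S3 prefixes are always joined with `/`, regardless of the operating system running dbt.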

Who will this benefit?

This feature will be valuable for:

  • Developers who need to quickly query external data without manually looking up the S3 URI.
    Example: Users can easily use the URI with an in-memory DuckDB instance.
  • App builders who want to integrate external table locations into their applications.
    Example: Developers building web applications with plots or data visualizations can access the external table URI directly.
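As an illustration of the first use case, once the URI is known it can be dropped straight into a DuckDB table function. A minimal sketch that only builds the SQL string (so it runs without DuckDB installed); `build_query` and the `READERS` mapping are hypothetical names, while `read_parquet`, `read_csv_auto` and `read_json_auto` are DuckDB's own table functions.

```python
# DuckDB table functions for each external format dbt-duckdb can write.
READERS = {
    "parquet": "read_parquet",
    "csv": "read_csv_auto",
    "json": "read_json_auto",
}


def build_query(uri: str, fmt: str, limit: int = 10) -> str:
    """Build a SELECT over an external file given its URI and format."""
    reader = READERS[fmt]
    # Double any single quotes so the URI is a valid SQL string literal.
    quoted = uri.replace("'", "''")
    return f"SELECT * FROM {reader}('{quoted}') LIMIT {int(limit)}"


print(build_query("s3://bucket-xxx/modelled/user.json", "json"))
# SELECT * FROM read_json_auto('s3://bucket-xxx/modelled/user.json') LIMIT 10
```

With the `duckdb` Python package, the resulting string could be passed to `duckdb.sql(...)` on the default in-memory connection, after loading the httpfs extension and creating an S3 secret equivalent to the one in the profile.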

Additionally, this could pave the way for more interactive exploration of model data directly within the dbt docs by linking to the external data location. 🤔 CLOUD-NATIVE DATA FORMATS + WASM IN-MEMORY DATABASE ⚡

Are you interested in contributing this feature?

Yes 🧔‍♂️
