Iceberg
The iceberg destination provides additional features on top of the filesystem destination in OSS dlt. This page only documents the additional features; use the documentation provided in OSS dlt for standard functionality.
delete-insert merge strategy with iceberg table format
The delete-insert merge strategy can be used with the iceberg table format:
import dlt

@dlt.resource(
    primary_key="id",  # merge_key also works; primary_key and merge_key may be used together
    write_disposition={"disposition": "merge", "strategy": "delete-insert"},
)
def my_resource():
    yield [
        {"id": 1, "foo": "foo"},
        {"id": 2, "foo": "bar"},
    ]

...

pipeline = dlt.pipeline("loads_iceberg", destination="iceberg")
pipeline.run(my_resource())
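Conceptually, a delete-insert merge removes every destination row whose key appears in the incoming batch and then appends the whole batch. The sketch below models that idea in memory, assuming a single key column. It is not how the iceberg destination implements the strategy, and it does not model any deduplication the real strategy may apply to the incoming batch; the function name `delete_insert_merge` is hypothetical.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def delete_insert_merge(existing: List[Row], incoming: Iterable[Row], key: str) -> List[Row]:
    """Hypothetical in-memory model of a delete-insert merge on one key column."""
    incoming_rows = list(incoming)
    # 1. Collect the key values present in the incoming batch.
    incoming_keys = {row[key] for row in incoming_rows}
    # 2. Delete every existing row whose key is in that set.
    kept = [row for row in existing if row[key] not in incoming_keys]
    # 3. Insert all incoming rows.
    return kept + incoming_rows


table = [{"id": 1, "foo": "old"}, {"id": 3, "foo": "baz"}]
table = delete_insert_merge(
    table, [{"id": 1, "foo": "foo"}, {"id": 2, "foo": "bar"}], key="id"
)
print(table)
```

Row 1 is replaced by its new version, row 2 is added, and row 3, which is absent from the batch, is left untouched. Row order in the returned list carries no meaning.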
Table format
The iceberg destination automatically assigns the iceberg table format to all resources that it loads. You can still fall back to storing plain files (in the format specified by file_format) by setting table_format to native on a resource.
Configuration
The iceberg destination looks for its configuration under destination.iceberg. Otherwise, it is configured in the same way as the filesystem destination.
[destination.iceberg]
bucket_url = "s3://[your_bucket_name]" # replace with your bucket name
[destination.iceberg.credentials]
aws_access_key_id = "please set me up!" # copy the access key here
aws_secret_access_key = "please set me up!" # copy the secret access key here
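You can still use your regular filesystem configuration: create the destination with destination_name="filesystem" so that it reads its settings from the destination.filesystem section instead.
import dlt
from dlt_plus.destinations import iceberg

# look up configuration under [destination.filesystem] instead of [destination.iceberg]
dest_ = iceberg(destination_name="filesystem")
pipeline = dlt.pipeline("loads_iceberg", destination=dest_)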
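Besides TOML files, dlt can resolve the same settings from environment variables, whose names are the config path upper-cased and joined with double underscores. The sketch below assumes that convention and derives the variable names for the TOML configuration shown above; the helpers `config_env_var` and `set_s3_credentials` and all values are hypothetical placeholders.

```python
import os


def config_env_var(*path: str) -> str:
    """Hypothetical helper: map a config path to a dlt-style env var name."""
    return "__".join(part.upper() for part in path)


def set_s3_credentials(section: str, bucket_url: str, key_id: str, secret: str) -> None:
    """Hypothetical helper: set bucket_url and AWS keys for destination.<section>."""
    os.environ[config_env_var("destination", section, "bucket_url")] = bucket_url
    os.environ[config_env_var("destination", section, "credentials", "aws_access_key_id")] = key_id
    os.environ[config_env_var("destination", section, "credentials", "aws_secret_access_key")] = secret


# Equivalent of the [destination.iceberg] sections (placeholder values).
set_s3_credentials("iceberg", "s3://my-bucket", "my-key-id", "my-secret")
print(config_env_var("destination", "iceberg", "bucket_url"))  # DESTINATION__ICEBERG__BUCKET_URL
```

When the destination is created with destination_name="filesystem", pass "filesystem" as the section, so the variables become DESTINATION__FILESYSTEM__BUCKET_URL and so on.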
Known limitations
- Compound keys are not supported: use a single primary_key and/or a single merge_key.
  - As a workaround, you can transform your resource data with add_map to add a new column that contains a hash of the key columns, and use that column as primary_key or merge_key.
- Nested tables are not supported: avoid complex data types or disable nesting (for example, with max_table_nesting=0 on the resource or source).
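The workarounds listed above can be sketched as plain row-mapping functions. This is a minimal sketch under stated assumptions: `add_key_hash`, `serialize_nested`, and the column name `_key_hash` are hypothetical. SHA-256 over a JSON-encoded list of the key values is one reasonable hashing choice, not one prescribed by dlt. Serializing nested values to JSON strings is one way to avoid complex data types, at the cost of storing them as text.

```python
import hashlib
import json
from typing import Any, Dict, Sequence

Row = Dict[str, Any]


def add_key_hash(row: Row, key_columns: Sequence[str], target: str = "_key_hash") -> Row:
    """Add a single surrogate key column computed from several key columns.

    The key values are JSON-encoded as a list before hashing, so ("a", "bc")
    and ("ab", "c") produce different hashes.
    """
    payload = json.dumps([row[col] for col in key_columns], default=str)
    row[target] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return row


def serialize_nested(row: Row) -> Row:
    """Replace dict/list values with JSON strings so no nested tables are created."""
    return {
        col: json.dumps(value, sort_keys=True, default=str) if isinstance(value, (dict, list)) else value
        for col, value in row.items()
    }


row = {"tenant": "acme", "order_no": 42, "items": [{"sku": "x", "qty": 2}]}
row = serialize_nested(add_key_hash(row, ["tenant", "order_no"]))
print(row["items"])  # [{"qty": 2, "sku": "x"}]
print(len(row["_key_hash"]))  # 64
```

In a pipeline, such a function could be attached with something like `my_resource.add_map(lambda r: add_key_hash(r, ["tenant", "order_no"]))`, followed by `my_resource.apply_hints(primary_key="_key_hash")`, so that the merge uses the single hashed column.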