
Configuration and profiles

Profiles

Profiles are a way to manage different configurations for different environments. They are defined in dlt.yml under the profiles section. Every project has two implicit profiles by default: dev and tests. If no profile is specified, the dev profile is loaded. All CLI commands that execute on a project accept a --profile option to specify which profile to use. The profile configuration is deep-merged with the project configuration, and profiles may inherit from each other. Let's have a look at our example project. The profiles section currently looks like this:

profiles:
  dev: {}
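The deep-merge behavior described above can be sketched in plain Python. This is a simplified illustration, not dlt's actual implementation: nested dictionaries are merged recursively, and profile values override project values on conflicts.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; `override` wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # both sides are mappings: recurse instead of replacing wholesale
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Illustrative project config and a prod profile overriding parts of it
project = {
    "sources": {"my_arrow_source": {"row_count": 100}},
    "runtime": {"log_level": "WARNING"},
}
prod_profile = {
    "sources": {"my_arrow_source": {"row_count": 200}},
    "runtime": {"log_level": "INFO"},
}

config = deep_merge(project, prod_profile)
print(config["sources"]["my_arrow_source"]["row_count"])  # 200
```

Keys that only exist in the project configuration are kept; keys set in the active profile win.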

Inspecting the current state of the project configuration of a profile

The dev profile is empty, so all settings are inherited from the project configuration by default. We can inspect the current state of the project configuration by running:

dlt project --profile dev config show

This will show the current state of the project configuration with the dev profile loaded. If you omit the --profile option, the dev profile is used.

Adding a new profile

We can now create a new profile called prod that changes the location of the duckdb file we are loading to, as well as the project's log level and the number of rows we are loading. Please run:

dlt profile prod add

And change the prod profile to the following:

  prod:
    sources:
      my_arrow_source:
        row_count: 200
    runtime:
      log_level: INFO
    destinations:
      my_duckdb_destination:
        credentials: ${tmp_dir}my_data_prod.duckdb

We can now inspect the prod profile. You will see that the new settings are merged with the project configuration and the dev profile settings.

dlt project --profile prod config show

Run a pipeline with the new profile and inspect the results

Now let's run the pipeline with the prod profile.

dlt pipeline --profile prod my_pipeline run

You will now see more output in the console due to the more verbose log level, and the number of rows loaded is 200 instead of 100. Let's inspect our datasets for each profile (assuming you still have the duckdb database file from the previous chapter).

dlt dataset --profile dev my_duckdb_destination_dataset row-counts
dlt dataset --profile prod my_duckdb_destination_dataset row-counts

You will see that the number of rows loaded on the prod profile is 200 instead of 100.

Inheriting from other profiles

note

Profile inheritance already works, but it is out of scope for this tutorial.

Using config files with profiles

You can also use the same configuration and secrets toml files and environment variables as described in the dlt OSS documentation. You have probably noticed that your project contains more than one secrets file, each with a profile name prepended. These secrets files are only loaded when the corresponding profile is active. To demonstrate this, let's move the duckdb credentials, runtime settings, and source settings from the dlt.yml file to the toml files:

First remove all the content of the prod section in the dlt.yml file, but keep the key and the empty secrets file. We can also remove the runtime section from the dlt.yml file as well as the credentials key from the destination and the row_count key from the sources.my_arrow_source section. If you try to run the pipeline now, dlt will complain about missing configuration values:

dlt pipeline my_pipeline run

Now let's add the following to the dev.secrets.toml file:

[runtime]
log_level = "WARNING"

[destination.my_duckdb_destination]
credentials = "_storage/my_data.duckdb"

[sources.my_arrow_source]
row_count = 100

And the following to the prod.secrets.toml file:

[runtime]
log_level = "INFO"

[destination.my_duckdb_destination]
credentials = "_storage/my_data_prod.duckdb"

[sources.my_arrow_source]
row_count = 200

We can now clear the _storage directory and repeat the steps above: run both pipelines and inspect both datasets. You will see that the settings from the toml files are applied.

Load some data:

dlt pipeline --profile dev my_pipeline run
dlt pipeline --profile prod my_pipeline run

Inspect the datasets:

dlt dataset --profile dev my_duckdb_destination_dataset row-counts
dlt dataset --profile prod my_duckdb_destination_dataset row-counts

note

Please note the following inconsistencies between the yaml and toml files that will be fixed in the future:

  • The yaml destinations section is singularized to destination in the toml file.
  • The project variables such as tmp_dir are not available in the toml files.

Configuration precedence

Based on the information about precedence in the configuration docs, the yaml files have the lowest precedence of all providers, just above the default values for a config value. Settings in the yaml file will therefore be overridden by toml files and environment variables if present.
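The precedence chain can be illustrated with a small Python sketch. This is a hypothetical helper, not dlt's actual provider machinery: providers are consulted from highest to lowest precedence, and the first one that contains the key wins.

```python
def resolve(key_path: list[str], providers: list[dict]):
    """Return the value at key_path from the first provider that has it.
    `providers` is ordered from highest to lowest precedence."""
    for provider in providers:
        value = provider
        for key in key_path:
            if not isinstance(value, dict) or key not in value:
                value = None
                break
            value = value[key]
        if value is not None:
            return value
    return None

# Illustrative providers, highest precedence first:
# env vars > toml files > yaml files > defaults
env = {}
toml = {"sources": {"my_arrow_source": {"row_count": 100}}}
yaml = {"sources": {"my_arrow_source": {"row_count": 200}}}
defaults = {"sources": {"my_arrow_source": {"row_count": 1000}}}

row_count = resolve(["sources", "my_arrow_source", "row_count"],
                    [env, toml, yaml, defaults])
print(row_count)  # 100 -- the toml value shadows the yaml value
```

With no environment variable set, the toml value (100) shadows the yaml value (200); setting a matching environment variable would shadow both.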

Settings in the dlt.yml file vs toml files

The generally recommended approach for dlt+ yaml projects is to keep all non-secret settings in the dlt.yml file and only move secrets to the secrets.toml files. Ideally, secrets should not be available in profiles/environments where they are not required. In the example above we moved non-secrets to the secrets.toml files for demonstration purposes only.
