Configuration and profiles
Profiles
Profiles let you manage different configurations for different environments. They are defined in dlt.yml under the profiles section. Every project has two implicit profiles by default: dev and tests. If no profile is specified, the dev profile is loaded. All CLI commands that execute on a project accept a --profile option to specify which profile to use. The profile configuration is deep merged with the project configuration, and profiles may inherit from each other. Let's have a look at our example project. The profiles section currently looks like this:
profiles:
  dev: {}
Inspecting the current state of the project configuration of a profile
The dev profile shown above is empty, so by default all settings are inherited from the project configuration. We can inspect the current state of the project configuration by running:
dlt project --profile dev config show
This will show the current state of the project configuration with the dev profile loaded. If you omit the --profile option, the dev profile is used.
Adding a new profile
We can now create a new profile called prod that changes the location of the duckdb file we are loading to, the log level of the project, and the number of rows we are loading. Please run:
dlt profile prod add
And change the prod profile to the following:
prod:
  sources:
    my_arrow_source:
      row_count: 200
  runtime:
    log_level: INFO
  destinations:
    my_duckdb_destination:
      credentials: ${tmp_dir}my_data_prod.duckdb
We can now inspect the prod profile. You will see that the new settings are merged with the project configuration and the dev profile settings.
dlt project --profile prod config show
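To make "deep merged" concrete, here is a minimal Python sketch of a recursive merge. It is an illustration only, not the actual dlt+ implementation: the deep_merge helper, the project-level values and the type key are assumptions chosen to resemble the tutorial project.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict; nested dicts merge recursively, other values from `override` win."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical project-level settings (assumed for this sketch).
project = {
    "sources": {"my_arrow_source": {"row_count": 100}},
    "runtime": {"log_level": "WARNING"},
    "destinations": {
        "my_duckdb_destination": {"type": "duckdb", "credentials": "my_data.duckdb"}
    },
}

# The overrides from the prod profile, with the path written out literally.
prod_profile = {
    "sources": {"my_arrow_source": {"row_count": 200}},
    "runtime": {"log_level": "INFO"},
    "destinations": {"my_duckdb_destination": {"credentials": "my_data_prod.duckdb"}},
}

effective = deep_merge(project, prod_profile)
print(effective)
```

The point of the sketch is that keys the profile does not mention (here the assumed type key) survive the merge, while keys it does mention replace the project-level value.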
Run a pipeline with the new profile and inspect the results
Now let's run the pipeline with the prod profile.
dlt pipeline --profile prod my_pipeline run
You should now see more output in the console because of the more verbose log level, and the number of rows loaded is now 200 instead of 100. Let's inspect our datasets for each profile (assuming you still have the duckdb database file from the previous chapter).
dlt dataset --profile dev my_duckdb_destination_dataset row-counts
dlt dataset --profile prod my_duckdb_destination_dataset row-counts
You will see that the number of rows loaded is now 200 instead of 100 on the prod profile.
Inheriting from other profiles
Profile inheritance already works, but it is out of scope for this tutorial.
Using config files with profiles
You can also use the same configuration and secrets toml files and environment variables as described in the dlt OSS documentation. You have probably noticed that your project contains more than one secrets file, each with a profile name prepended. A profile's secrets file is only loaded when that profile is active. To demonstrate this, let's move the duckdb credentials, runtime settings and source settings from the dlt.yml file to the toml files:
First, remove all the content of the prod section in the dlt.yml file, but keep the key and the empty secrets file. Also remove the runtime section from the dlt.yml file, the credentials key from the destination, and the row_count key from the sources.my_arrow_source section. If you try to run the pipeline now, dlt will complain about missing configuration values:
dlt pipeline my_pipeline run
Now let's add the following to the dev.secrets.toml file:
[runtime]
log_level = "WARNING"
[destination.my_duckdb_destination]
credentials = "_storage/my_data.duckdb"
[sources.my_arrow_source]
row_count = 100
And the following to the prod.secrets.toml file:
[runtime]
log_level = "INFO"
[destination.my_duckdb_destination]
credentials = "_storage/my_data_prod.duckdb"
[sources.my_arrow_source]
row_count = 200
We can now clear the _storage directory and repeat the steps above: run both pipelines and inspect both datasets. You will see that the settings from the toml files are applied.
Load some data:
dlt pipeline --profile dev my_pipeline run
dlt pipeline --profile prod my_pipeline run
Inspect the datasets:
dlt dataset --profile dev my_duckdb_destination_dataset row-counts
dlt dataset --profile prod my_duckdb_destination_dataset row-counts
Please note the following inconsistencies between the yaml and toml files that will be fixed in the future:
- The yaml destinations section is singularized to destination in the toml file.
- The project variables such as tmp_dir are not available in the toml files.
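The two inconsistencies above can be sketched as a small Python helper that may be handy when moving settings from dlt.yml to a toml file by hand. This is a hypothetical illustration, not part of dlt: the names YAML_TO_TOML_SECTION, toml_table_name and uses_project_variable are made up, and only the rename stated above is encoded.

```python
# Only the rename described in the text is encoded; other sections keep their names.
YAML_TO_TOML_SECTION = {"destinations": "destination"}


def toml_table_name(yaml_path: list) -> str:
    """Translate a key path from dlt.yml into the matching toml table name."""
    head, *rest = yaml_path
    return ".".join([YAML_TO_TOML_SECTION.get(head, head), *rest])


def uses_project_variable(value: str) -> bool:
    """Flag values such as ${tmp_dir}... that must be written out literally in toml."""
    return "${" in value


print(toml_table_name(["destinations", "my_duckdb_destination"]))
print(toml_table_name(["sources", "my_arrow_source"]))
print(uses_project_variable("${tmp_dir}my_data_prod.duckdb"))
```

A value flagged by uses_project_variable needs its variable replaced with a concrete path, as done with _storage/my_data_prod.duckdb in the toml files above.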
Configuration precedence
According to the precedence rules in the configuration docs, the yaml files have the lowest precedence of all providers, just above the default values for a config value. Settings in the yaml file will therefore be overridden by toml and env variables if present.
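The lookup order described here can be sketched in Python as follows. This is a simplified model, not dlt's resolver: resolve, lookup and env_key are hypothetical helpers, and the sketch assumes that environment variables rank above toml files and that env keys are the upper-cased path joined by double underscores, as in the dlt OSS documentation.

```python
from typing import Any, Mapping, Sequence, Tuple


def env_key(path: Sequence[str]) -> str:
    """Build an env variable name: upper-case parts joined by double underscores."""
    return "__".join(part.upper() for part in path)


def lookup(tree: Mapping, path: Sequence[str]) -> Any:
    """Walk a nested mapping; return None if any part of the path is missing."""
    node: Any = tree
    for part in path:
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node


def resolve(path, env, toml, yaml, default=None) -> Tuple[Any, str]:
    """Return (value, provider) using the assumed order env > toml > yaml > default."""
    key = env_key(path)
    if key in env:
        return env[key], "env"  # env values arrive as strings
    for name, tree in (("toml", toml), ("yaml", yaml)):
        value = lookup(tree, path)
        if value is not None:
            return value, name
    return default, "default"


path = ("sources", "my_arrow_source", "row_count")
yaml_cfg = {"sources": {"my_arrow_source": {"row_count": 100}}}
toml_cfg = {"sources": {"my_arrow_source": {"row_count": 200}}}
env_cfg = {"SOURCES__MY_ARROW_SOURCE__ROW_COUNT": "300"}

print(resolve(path, {}, {}, yaml_cfg))
print(resolve(path, {}, toml_cfg, yaml_cfg))
print(resolve(path, env_cfg, toml_cfg, yaml_cfg))
```

In this model the yaml value is only used when neither an environment variable nor a toml entry exists for the same key, which mirrors the statement above.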
Settings in the dlt.yml file vs toml files
The generally recommended approach for dlt+ yaml projects is to keep all non-secret settings in the dlt.yml file and only move secrets to the secrets.toml files. Ideally, you will not make any secrets available to profiles or environments where they are not required. In the example above we moved non-secrets to the secrets.toml files; this is for demonstration purposes only.