Adding sources, destinations and pipelines to your project
To add a new entity to an existing dlt+ project, run the dlt <entity_type> <entity_name> add command. The available options depend on the type of entity you are adding. To see all options for adding a destination, for example, run dlt destination add --help. Let's add a source, a destination and a pipeline one at a time to a new project, replicating the default project we created in the previous chapter.
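The command pattern above can be sketched as a small shell helper. This is only an illustration: dlt_add_cmd is a hypothetical function, not part of dlt+, and it prints each command instead of running it. The entity names and types are the ones used later in this chapter.

```shell
#!/bin/sh
# Hypothetical helper (not part of dlt+): prints the "add" command for an entity.
# Usage: dlt_add_cmd <entity_type> <entity_name> [arguments passed after "add"...]
dlt_add_cmd() {
    entity_type=$1
    entity_name=$2
    shift 2
    # Rebuild the argument list in the order: dlt <entity_type> <entity_name> add <args>
    set -- dlt "$entity_type" "$entity_name" add "$@"
    echo "$*"
}

# Print the three add commands used in this chapter.
dlt_add_cmd source my_arrow_source arrow
dlt_add_cmd destination my_duckdb_destination duckdb
dlt_add_cmd pipeline my_pipeline my_arrow_source my_duckdb_destination
```

Every entity type follows the same shape: the entity type and name come first, then add, then the entity-specific arguments.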
Add an empty project
Delete all the files in the tutorial folder and run the following command to create an empty project:
dlt project init
This creates a project without any sources, destinations, datasets or pipelines. The project is named after the folder.
Add all entities
Now we can add each entity individually. This also lets us give each entity its own name, which is useful when, for example, a project has multiple destinations of the same type.
Add a source:
# add a new arrow source called "my_arrow_source"
dlt source my_arrow_source add arrow
Add a destination:
# add a new duckdb destination called "my_duckdb_destination"
# this will also create a new dataset called "my_duckdb_destination_dataset"
dlt destination my_duckdb_destination add duckdb
If you want to set the name of the dataset yourself, you can use the --dataset-name flag:
# add a new duckdb destination called "my_duckdb_destination"
dlt destination my_duckdb_destination add duckdb --dataset-name my_duckdb_destination_dataset
Now we can add a pipeline that uses the source and destination we just added:
# add a new pipeline called "my_pipeline" which loads from my_arrow_source and saves to my_duckdb_destination
# the optional flag selects the my_duckdb_destination_dataset dataset
dlt pipeline my_pipeline add my_arrow_source my_duckdb_destination --dataset-name my_duckdb_destination_dataset
Run the pipeline
As in the first chapter, we can now run the pipeline:
dlt pipeline my_pipeline run
And inspect the dataset:
dlt dataset my_duckdb_destination_dataset row-counts
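The steps in this chapter can also be collected into a single script. The following is a minimal sketch that assumes the commands behave as described above. The run wrapper, the setup_project function and the DRY_RUN variable are hypothetical conveniences, not part of dlt+. By default the script only prints the commands; set DRY_RUN=0 to execute them in an empty project folder. The optional dataset flag on the pipeline command is left out here.

```shell
#!/bin/sh
set -e  # stop at the first failing command

# Hypothetical wrapper: prints the command unless DRY_RUN=0, in which case it runs it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# All steps of this chapter, in order.
setup_project() {
    # create an empty project named after the current folder
    run dlt project init
    # add the source and the destination (with an explicitly named dataset)
    run dlt source my_arrow_source add arrow
    run dlt destination my_duckdb_destination add duckdb --dataset-name my_duckdb_destination_dataset
    # add a pipeline connecting the two, then run it and inspect the result
    run dlt pipeline my_pipeline add my_arrow_source my_duckdb_destination
    run dlt pipeline my_pipeline run
    run dlt dataset my_duckdb_destination_dataset row-counts
}

setup_project
```

Printing the commands first makes it easy to review the entity names before anything is written to the project.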