Initializing a new project
Before starting, ensure you are familiar with the core concepts of dlt, as this will provide essential context for working with dlt projects.
Overview
The dlt project init command is a powerful way to create a structured starting point for a dlt project. A dlt project revolves around a YAML specification that defines the entities and configurations needed to build pipelines, sources, and destinations, as described in the core concepts.
Creating a new project
Generating your first project
Start by creating a new folder for your project. Then, navigate to the folder in your terminal.
mkdir tutorial && cd tutorial
Run the following command to initialize a new dlt project:
# Initialize a dlt project named "tutorial"; the name is derived from the folder name
dlt project init arrow duckdb
This command generates a project named tutorial with:
- One pipeline
- One Arrow source defined in sources/arrow.py
- One DuckDB destination
- One dataset on the DuckDB destination
The generated folder structure
After running the command, the following folder structure is created:
.
├── .dlt/            # your dlt settings, including profile settings
│   ├── config.toml
│   ├── dev.secrets.toml
│   └── secrets.toml
├── _storage/        # local storage for your project, excluded from git
├── destinations/    # your destinations, empty in this example
├── sources/         # your sources, contains the code for the arrow source
│   └── arrow.py
├── .gitignore
└── dlt.yml          # the main project manifest
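The exact contents of sources/arrow.py come from the template and may change between versions. As a rough sketch only, such a source could look like the following, assuming a single items resource that yields a PyArrow table; the column names and the row count of 100 are chosen to match the sample output shown later in this tutorial:

import random

import dlt
import pyarrow as pa


@dlt.source
def source():
    # a single resource named "items" yielding one Arrow table;
    # the values are made up for illustration
    @dlt.resource
    def items():
        yield pa.table(
            {
                "id": [100] * 100,
                "name": [random.choice(["jim", "alice", "jerry", "jenny"]) for _ in range(100)],
                "age": [random.randint(18, 70) for _ in range(100)],
            }
        )

    return items

Note how the type sources.arrow.source in dlt.yml below points at this source function by its module path.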
Understanding dlt.yml
The dlt.yml file is the central configuration for your dlt project. It defines the pipelines, sources, and destinations. In the generated project, the file looks like this:
# project settings
project:
  name: template
profiles:
  # profiles allow you to configure different settings for different environments
  dev: {}
# your sources are the data sources you want to load from
sources:
  arrow:
    type: sources.arrow.source
# your destinations are the databases where your data will be saved
destinations:
  duckdb:
    type: duckdb
# your datasets are the datasets on your destinations where your data will go
datasets: {}
# your pipelines orchestrate data loading actions
pipelines:
  my_pipeline:
    source: arrow
    destination: duckdb
    dataset_name: my_pipeline_dataset
If you do not want to start with a source, destination, and pipeline, you can simply run dlt project init tutorial. This will generate a project with empty sources, destinations, and pipelines.
The following project variables are substituted during loading:
- project_dir - the root directory of the project, i.e. the directory where the dlt.yml file is located
- tmp_dir - the directory for storing temporary files; can be configured in the project section as seen above, by default set to ${project_dir}_storage
- name - the name of the project, can be configured in the project section as seen above
- default_profile - the name of the default profile, can be configured in the project section as seen above
- current_profile - the name of the current profile, set automatically when a profile is used
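Conceptually, this substitution behaves like simple template expansion. The Python snippet below is an illustration only (dlt performs the expansion internally while loading dlt.yml) and uses a hypothetical project root:

from string import Template

# illustration only: dlt expands ${...} placeholders in dlt.yml values,
# conceptually similar to Python's string.Template substitution
default_tmp_dir = Template("${project_dir}_storage").substitute(
    project_dir="/home/user/tutorial/"  # hypothetical project root
)
print(default_tmp_dir)  # /home/user/tutorial/_storage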
Some details about the project structure above:
- The project section could be fully omitted in this case; it is generated to make the default settings explicit.
- The runtime section is analogous to the [runtime] section of config.toml and could also be omitted in this case.
- The profiles section is not doing much in this case. There are two implicit profiles, dev and tests, that are present in any project; we will learn about profiles in more detail later.
Understanding the basics of the project context
The dlt.yml file marks the root of a project. Projects can also be nested. If you run any dlt project CLI command, dlt will search for the project root in the filesystem tree, starting from the current working directory, and run all operations on the project it finds. So if your dlt.yml is in the tutorial folder, you can run dlt pipeline my_pipeline run from this folder or any of its subfolders, and it will run the pipeline on the tutorial project.
Running the pipeline
Once the project is initialized, you can run the pipeline using:
dlt pipeline my_pipeline run
This command:
- Locates the pipeline named my_pipeline in dlt.yml
- Executes it, populating the DuckDB destination, which is configured to store its database in ${tmp_dir}my_data.duckdb
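For orientation, the runner's behavior corresponds roughly to the following core dlt calls. This is a sketch only, assuming it is executed from the project root; the real runner additionally applies the settings and active profile from dlt.yml:

import dlt

# import the generated source; assumes the current working directory is the
# project root so that the sources package is importable
from sources.arrow import source

pipeline = dlt.pipeline(
    pipeline_name="my_pipeline",
    destination="duckdb",
    dataset_name="my_pipeline_dataset",
)
# load the source, roughly what `dlt pipeline my_pipeline run` triggers
info = pipeline.run(source())
print(info)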
Inspecting the results
Use the dlt dataset command to interact with the dataset stored in the DuckDB destination. For example:
Counting the loaded rows
To count rows in the dataset, run:
dlt dataset my_pipeline_dataset row-counts
This shows the number of rows in the items table produced by the arrow source. The internal dlt tables are shown as well.
            table_name  row_count
0                items        100
1         _dlt_version          1
2           _dlt_loads          2
3  _dlt_pipeline_state          1
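You can double-check these counts directly against the DuckDB database. The file path below is an assumption based on the ${tmp_dir} default described above; adjust it if your destination stores the database elsewhere:

import duckdb

# path assumed from the ${tmp_dir} default; dlt loads the items table into a
# schema named after the dataset
con = duckdb.connect("_storage/my_data.duckdb")
print(con.sql("SELECT count(*) AS row_count FROM my_pipeline_dataset.items"))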
Viewing the data
To view the first five rows of the items table:
dlt dataset my_pipeline_dataset head items
This displays the top entries in the items table, enabling quick validation of the pipeline's output. The output will look something like this:
Loading first 5 rows of table items.
    id   name  age       _dlt_load_id         _dlt_id
0  100    jim   56  1737465323.617184  /qaxfQ/rbD/KcQ
1  100  alice   39  1737465323.617184  H996PcWDbMuDbQ
2  100  jerry   64  1737465323.617184  R27cQDLTQQ+dxg
3  100  jenny   50  1737465323.617184  9eKG60Ok0fbTpA
4  100  jerry   51  1737465323.617184  Wj9m7VGQzzLi3w
To show more rows, use the --limit flag:
dlt dataset my_pipeline_dataset head items --limit 50
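The same data can also be read from Python with dlt's dataset access API. A sketch, assuming the pipeline has already run and that this is executed from the project directory so the pipeline's working state can be found:

import dlt

# attach to the pipeline by name and read from the dataset it loaded
pipeline = dlt.pipeline(pipeline_name="my_pipeline")
items = pipeline.dataset()["items"]
print(items.head().df())     # first five rows, like `head items`
print(items.limit(50).df())  # more rows, like `--limit 50`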