dbt Notes

I’ve recently created a learning log for this year (2025). The first topic I wanted to go over is dbt (data build tool). Here are some of my notes; I’ll add more materials and notes about dbt here as I go along.

To start, I’ve taken a Udemy course The Complete dbt (Data Build Tool) Bootcamp: Zero to Hero as an introduction to dbt. Here is the repo where I’ve been doing the hands-on work throughout the course: dbtlearn. I started the course on Jan 24 and finished on Feb 2 (sharing my certificate). I’m hoping these notes will be a good reference in the future.

dbt and Snowflake setup

Here’s what I did for the setup:

  • Created a Snowflake trial account. It’s good for 30 days with $400 credit.
  • Set up the Snowflake warehouse and permissions for dbt, and ingested Airbnb data hosted in AWS S3 into the raw tables.
  • On my local machine, installed Python 3.12.7 (I needed to downgrade from Python 3.13.0 as it’s not yet supported by dbt). I also installed virtualenv using Homebrew, but realized I could have just installed it with pip.
  • Installed dbt-snowflake in the virtual environment (always use a Python virtual environment!)
  • Created a dbt user directory: mkdir ~/.dbt
  • Initialized the dbt project (dbt init dbtlearn) and connected it to the Snowflake account (a sketch of the generated profile is shown after this list). I also learned to just delete the project directory when deleting a project (I made a configuration mistake! :p)
  • Updated dbt_project.yml to remove examples; deleted examples from the project directory
  • Installed dbt plugin “Power User for dbt core” in VS Code and set it up.
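
For reference, the Snowflake connection that dbt init creates lives in ~/.dbt/profiles.yml. Here’s a minimal sketch of what that profile roughly looks like for this project (the account, role, warehouse, and schema values are placeholders, not the exact ones from the course):

dbtlearn:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: <snowflake_account_identifier>
      user: dbt
      password: <password>
      role: TRANSFORM
      database: AIRBNB
      warehouse: COMPUTE_WH
      schema: DEV
      threads: 1

Running dbt debug from the project directory verifies that this profile can connect.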

Models and materialization

tables-graph

Here’s the final outcome in the data warehouse

tables-in-wh

The project YAML file has been modified to set these defaults; the + sign before materialized marks it as a dbt keyword and not a directory:

models:
  dbtlearn:
    +materialized: view
    dim:
      +materialized: table
    src:
      +materialized: ephemeral

The following configuration was added to the two dim models so they materialize as views instead of tables, overriding the default set in the project YAML.

{{
    config(
        materialized = 'view'
    )
}}

Using the dbt plugin was also convenient: when testing SQL, I only needed to run it within the editor and it returns the query results (no need to switch to Snowflake’s interface). It can also be used to build models through the GUI.

tables-graph

dbt models are SQL definitions that can be materialized (or not) into tables or views. They are stored as SQL files under dbtlearn/models/. They consist of select statements but can also contain other features such as semantic dependencies on other models, scripts, and macros.

dbtlearn/target/run/dbtlearn/models contains the final compiled models. Files here can be investigated when troubleshooting.

CTEs (common table expressions) are widely used and recommended as they are easy to maintain and make complex queries readable. They are similar to views but are not stored.
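
To illustrate, here’s a minimal sketch of a model built around a CTE (the column names and the cleansing rule are assumptions based on the Airbnb example, not the exact course code):

with src_listings as (
    select * from {{ ref('src_listings') }}
)
select
    listing_id,
    listing_name,
    -- old listings may have minimum_nights = 0; treat those as 1
    case when minimum_nights = 0 then 1 else minimum_nights end as minimum_nights
from src_listings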

Commands to build models include:

  • dbt run - materializes the models
  • dbt run --full-refresh - rebuilds all tables including incremental tables

There are four (4) types of materialization in dbt:

  • view - use for lightweight representations and models that aren’t reused often; don’t use when the model is read several times
  • table - use when the model is read repeatedly; don’t use for single-use models or incremental loads
  • incremental - use for fact and event tables where rows are appended; don’t use when historical records need to be updated
  • ephemeral - no materialization, e.g. when you just want an alias to a column; don’t use when the model is read several times

Here’s an example of a dbt model using an incremental load, and Jinja

{{
    config(
        materialized = 'incremental',
        on_schema_change = 'fail'
    )
}}
with src_reviews as (
    select * from {{ ref('src_reviews') }}
)

select * from src_reviews
where review_text is not null

{% if is_incremental() %}
    and review_date > (select max(review_date) from {{this}})
{% endif %}

When switching from a materialized model to an ephemeral model, delete the tables manually in the data warehouse as dbt won’t delete the tables or views itself.

Seeds

  • local files (e.g. CSV) that get uploaded to the warehouse using dbt
  • contained in this path dbtlearn/seeds e.g. dbtlearn/seeds/seed_full_moon_dates.csv
  • in the project YAML file, the paths of seeds are defined as
seed-paths: ["seeds"]
  • the dbt seed command loads the seeds into Snowflake, resulting in tables (Snowflake can infer the column schema from the file)
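
Once loaded, a seed can be referenced from a model with ref() like any other model. Here’s a hypothetical sketch of joining reviews to the full-moon seed (the column names and the one-day offset are assumptions, not the exact course code):

with full_moon_dates as (
    select * from {{ ref('seed_full_moon_dates') }}
),
reviews as (
    select * from {{ ref('fct_reviews') }}
)
select
    r.*,
    -- flag reviews written the night after a full moon
    case when fm.full_moon_date is null then 'not full moon' else 'full moon' end as is_full_moon
from reviews r
left join full_moon_dates fm
    on to_date(r.review_date) = dateadd(day, 1, fm.full_moon_date)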

Source

  • an abstraction layer that sits on top of the input (raw) tables, e.g. the models defined in dbtlearn/models/src/src_*.sql
  • contains extra features such as checking for data freshness
  • to define sources, create a separate YAML file under the models directory, e.g. dbtlearn/models/sources.yml

e.g.

version: 2

sources:
    - name: airbnb
      schema: raw
      tables:
        - name: listings
          identifier: raw_listings

        - name: hosts
          identifier: raw_hosts

        - name: reviews
          identifier: raw_reviews
          loaded_at_field: date
          freshness:
            warn_after: {count: 1, period: hour}
            error_after: {count: 24, period: hour}
  • in models/src/src_listings.sql, instead of directly referencing the Snowflake table as AIRBNB.RAW.RAW_LISTINGS, it can now be abstracted to
    • {{ source('airbnb', 'listings') }} using the names listed in the source YAML file

e.g.

with raw_listings as (
select * from {{ source('airbnb', 'listings') }}
)

-- continue code here
  • dbt compile checks template tags and connections; dbt goes through models, YAML files, tests, etc. and checks that all references are correct

Source freshness

  • in the source YAML file, these lines under reviews define freshness
            loaded_at_field: date
            freshness:
              warn_after: {count: 1, period: hour}
              error_after: {count: 24, period: hour}
    
  • this means
    • give a warning if there is no data fresher than one hour
    • return an error if there is no new data within 24 hours
  • dbt source freshness is the command to use to check the freshness

Snapshots

  • Snapshots in dbt implement Type 2 SCD (slowly changing dimensions)
  • The path is defined in the project YAML file, and snapshots consist of SQL files
    snapshot-paths: ["snapshots"]
    
  • example code dbtlearn/snapshots/scd_raw_listings.sql
{% snapshot scd_raw_listings %}

{{
    config(
        target_schema = 'dev',
        unique_key='id',
        strategy='timestamp',
        updated_at='updated_at',
        invalidate_hard_deletes = True
    )
}}

select * from {{ source('airbnb', 'listings') }}

{% endsnapshot %}
  • config line defines how and where to create the snapshot; in this example it is created as a new table in Snowflake AIRBNB.DEV.SCD_RAW_LISTINGS
  • invalidate_hard_deletes = True means that if a record is deleted from the source, dbt will set its dbt_valid_to to the snapshot date; if False, the delete will not be captured
  • the table is materialized with additional columns
  • dbt snapshot takes the snapshots
snapshot-table
  • as records are updated in the raw table, the SCD table captures the changes as new rows with the dbt_valid_to fields updated
snapshot-table

Tests

  • There are two kinds of tests in dbt
    • singular
      • SQL queries stored in the tests folder
      • the expected result set is empty
    • generic
      • built-in tests from the core installation of dbt
        • unique
        • not_null
        • accepted_values
        • relationships - takes a column value and makes sure it is a valid reference to records in another table
      • custom tests and tests imported from dbt packages
  • unique and not_null constraints are built into traditional warehouse databases, but may not be present in newer platforms like data lakes or lakehouses
  • dbt test runs the tests

Singular tests

  • SQL in tests folder e.g. dbtlearn/tests/dim_listings_minimum_nights.sql
  • example
select
    *
from
    {{ ref('dim_listings_cleansed') }}
where minimum_nights < 1 
limit 10
  • dbt test --help brings up the docs
  • dbt test --select dim_listings_cleansed - runs the tests associated with the dim_listings_cleansed model only

Generic tests

  • define schema.yml under models folder
    • name does not matter; can be one file or several files
    • can be used for extra configurations, tests, documentation
version: 2

models:
  - name: dim_listings_cleansed
    columns:

      - name: listing_id
        tests:
          - unique
          - not_null

      - name: host_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_hosts_cleaned')
              field: host_id

      - name: room_type
        tests:
          - accepted_values:
              values: ['Entire home/apt',
                        'Private room',
                        'Shared room',
                        'Hotel room']

Custom generic tests

  • macros with special signature
  • example dbtlearn/macros/positive_value.sql
    {% test positive_value(model, column_name) %}
    select
      *
    from
      {{ model }}
    where
      {{ column_name }} < 1
    {% endtest %}
    
  • add these in schema.yml under model
version: 2
models:
  - name: dim_listings_cleansed
    columns:

      - <other tests here>
	
      - name: minimum_nights
        tests:
          - positive_value

Macros

  • Jinja templates created in the macros folder
  • can be used in model definitions and tests
    • test is a special macro for implementing generic tests
  • example: dbtlearn/macros/no_nulls_in_columns.sql
    • the - in the for-loop tag (-%}) means strip the surrounding whitespace
{% macro no_nulls_in_columns(model) %}
    select * from {{ model }} where
    {% for col in adapter.get_columns_in_relation(model) -%}
        {{ col.column}} is null or
    {% endfor %}
    FALSE
{% endmacro %}
  • use in dbtlearn/tests/no_nulls_in_dim_listings.sql
    {{ no_nulls_in_columns(ref('dim_listings_cleansed')) }}
    

Third-party packages

  • extend dbt using dbt packages; allows access to a plethora of macros and tests
  • go to hub.getdbt.com
  • example dbt-utils > generate_surrogate_key
    • “cross-database way”: if the field values are the same in two or more instances of the database (even different types of databases), the generated hash will be the same

to install:

  • create a file in the dbt project root directory called dbtlearn/packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: 1.3.0
  • run dbt deps to install
    • might not work behind firewall
    • workaround: go to the GitHub repository of the package and copy it manually into the dbt packages folder dbtlearn/dbt_packages
  • example of usage: dbtlearn/models/fct/fct_reviews.sql
{{
    config(
        materialized = 'incremental',
        on_schema_change = 'fail'
    )
}}
with src_reviews as (
    select * from {{ ref('src_reviews') }}
)

select 
{{ dbt_utils.generate_surrogate_key(['listing_id', 'review_date', 'reviewer_name', 'review_text'])}} as review_id,
* from src_reviews
where review_text is not null

{% if is_incremental() %}
    and review_date > (select max(review_date) from {{this}})
{% endif %}
  • run dbt run --select fct_reviews --full-refresh - specify a full refresh so that the run doesn’t error out on the schema change (on_schema_change is set to 'fail')
  • review_id has been added to the table in Snowflake
surrogate-key

Documentation

  • Should be kept as close to the actual code as possible
  • Defined in two ways
    • in YAML files (e.g. schema.yml)
    • in stand alone markdown files
  • dbt compiles this documentation into an HTML site that can be hosted on your own server
  • otherwise, dbt ships with a lightweight documentation web server
  • overview.md is a special file that can be used to customize landing pages
  • dbt can also be configured to store assets like downloadables and images in a special folder that can be referenced in a markdown file

Basic documentation

  • update schema.yml to add descriptions
version: 2

models:
  - name: dim_listings_cleansed
    description: Cleansed table which contains Airbnb listings.
    columns:

      - name: listing_id
        description: Primary key for the listing
        tests:
          - unique
          - not_null
  • run dbt docs generate to generate docs at target folder
    • catalog will be written to dbtlearn/target/catalog.json
    • html in dbtlearn/target/index.html
  • run dbt docs serve to render the documentation in the built-in dbt webserver
dbt-docs

Markdown-based docs, custom overview page, and assets

  • create a markdown file e.g. in the models directory dbtlearn/models/docs.md
{% docs dim_listing_cleansed__minimum_nights %}
Minimum number of nights required to rent this property.

Keep in mind that old listings might have `minimum_nights` set to 0 in the source tables. Our cleansing algorithm updates this to `1`.

{% enddocs %}
  • add to description in schema.yml
      - name: minimum_nights
        description: '{{ doc("dim_listing_cleansed__minimum_nights") }}'
        tests:
          - positive_value
  • can also create an overview doc in dbtlearn/models/overview.md, e.g.
{% docs __overview__ %}
# Airbnb pipeline

Hey, welcome to our Airbnb pipeline documentation!

Here is the schema of our input data:
![input schema](https://dbtlearn.s3.us-east-2.amazonaws.com/input_schema.png)

{% enddocs %}
  • to reference assets locally, add a folder named assets in the project root
    • make sure to define this path in the project YAML file (dbt_project.yml)
asset-paths: ["assets"]
  • replace the url path in the markdown file e.g.
{% docs __overview__ %}
# Airbnb pipeline

Hey, welcome to our Airbnb pipeline documentation!

Here is the schema of our input data:
![input schema](assets/input_schema.png)

{% enddocs %}
  • running dbt docs generate will copy the assets into the target folder, e.g. target/assets/input_schema.png

Lineage graph (data flow DAG)

  • the lineage graph is accessible in the docs UI
dbt-lineage-graph
  • can set filter in the UI e.g.
    • --select +src_hosts+
      • the + preceding src_hosts - include everything src_hosts depends on (upstream)
      • the + after src_hosts - include everything that builds on src_hosts (downstream)
    • --exclude dim_listings_w_hosts+
      • hide dim_listings_w_hosts and everything that depends on it
  • these filters can also be used in other dbt commands (see dbt run --help), e.g.
    • e.g. dbt run --select src_hosts+
      • src_hosts and every model that builds from src_hosts will execute
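
The same selectors work directly on the command line; a couple of illustrative invocations:

# src_hosts plus everything upstream and downstream of it
dbt run --select +src_hosts+

# everything except dim_listings_w_hosts and the models that depend on it
dbt run --exclude dim_listings_w_hosts+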

Analyses

  • analyses let you run a query using dbt syntax and references without necessarily creating a model
  • they are version controlled and executed against the data warehouse
  • e.g. dbtlearn/analyses/full_moon_no_sleep.sql
with mart_fullmoon_reviews as (
    select * from {{ ref('mart_fullmoon_reviews') }}
)

select
    is_full_moon,
    review_sentiment,
    count(*) as reviews
from
    mart_fullmoon_reviews
group by
    is_full_moon,
    review_sentiment
order by
    is_full_moon,
    review_sentiment
  • run dbt compile
  • the compiled SQL will be written to the target folder, e.g. dbtlearn/target/compiled/dbtlearn/analyses/full_moon_no_sleep.sql, which can be run in the data warehouse

Hooks

  • SQL statements that are executed at predefined times
  • can be defined at the project, models directory, or individual model level
  • there are four types of hooks (a run-level example is sketched at the end of this section)
    • on_run_start: executed at the start of dbt {run, seed, snapshot}
    • on_run_end: executed at the end of dbt {run, seed, snapshot}
    • pre-hook: executed before a model/seed/snapshot is built
    • post-hook: executed after a model/seed/snapshot is built

e.g. add a REPORTER role in Snowflake; initially this role does not have access to the dbt views and tables

  • in the project YAML, add the hook under models
    • this will run for all models; {{ this }} resolves to the current model
    +post-hook:
      - "GRANT SELECT ON {{ this }} TO ROLE REPORTER"
  • run dbt run to execute the models
  • then go to Snowflake to check that the reporter role has select permission on all tables and views from dbt
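
Run-level hooks can also be declared at the top level of the project YAML file; here’s a minimal sketch (the SQL statements are illustrative, not from the course):

on-run-start:
  - "create schema if not exists {{ target.schema }}"

on-run-end:
  - "grant usage on schema {{ target.schema }} to role reporter"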

Exposure

  • configurations that can point to external resources like reports and dashboards
  • can be integrated and compiled into the documentation

  • e.g. create a yaml file dbtlearn/models/dashboards.yml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
version: 2

exposures:
  - name: executive_dashboard
    label: Executive Dashboard
    type: dashboard
    maturity: low
    url: <url to dashboard here>
    description: Executive Dashboard about Airbnb listings and hosts
      

    depends_on:
      - ref('dim_listings_w_hosts')
      - ref('mart_fullmoon_reviews')

    owner:
      name: M
      email: email@example.com
  • run dbt docs generate then dbt docs serve
  • the exposures also appear in the lineage graph

dbt-expectations

Great Expectations

  • data testing framework
  • provides a bunch of functions that can be applied to data

dbt-expectations

  • not a port of Great Expectations in dbt but inspired by it
  • see Available tests
  • to install, add the package to packages.yml and run dbt deps
packages:
  - package: calogica/dbt_expectations
    version: 0.10.1

examples

  • comparing row counts between models
    • add to schema.yml
    • run dbt test --select dim_listings_w_hosts to only run test for this model
  - name: dim_listings_w_hosts
    tests:
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: source('airbnb', 'listings')
  • looking for outliers in your data
    columns:
      - name: price
        tests:
        - dbt_expectations.expect_column_quantile_values_to_be_between:
            quantile: .99
            min_value: 50
            max_value: 500
  • implementing test warnings
    • note that config can be set on any test, not just this one
        - dbt_expectations.expect_column_max_to_be_between:
            max_value: 5000
            config:
              severity: warn
  • validating column types
    • uses the types of the backend (in this case Snowflake)
        - dbt_expectations.expect_column_values_to_be_of_type:
            column_type: number
  • monitoring categorical variables in the source data
    • add to sources.yml file
    • to run a test for a source, run dbt test --select source:airbnb.listings
      • also runs tests that affect the source table e.g.
        • dbt_expectations_source_expect_column_distinct_count_to_equal_airbnb_listings_room_type__4
        • dbt_expectations_expect_table_row_count_to_equal_other_table_dim_listings_w_hosts_source_airbnb_listings_
version: 2

sources:
    - name: airbnb
      schema: raw
      tables:
        - name: listings
          identifier: raw_listings
          columns:
            - name: room_type
              tests:
                - dbt_expectations.expect_column_distinct_count_to_equal:
                    value: 4

Debugging

Debugging dbt tests

  • dbt --debug test --select source:airbnb.listings
    • in debug mode you will see all the SQL that dbt runs against the data warehouse in the terminal, which can get overwhelming
    • or check the compiled SQL in the target folder target/compiled/dbtlearn/models/sources.yml/dbt_expectations_source_expect_a60b59a84fbc4577a11df360c50013bb.sql
      • this can be executed in Snowflake for further debugging

Debugging and working with regular expressions

  • e.g. regex for price $90.00
    • $ and . need to be escaped for the regex, then escaped for YAML, then escaped again so the backslashes actually render in Snowflake; this happens when there are many interconnected technologies
    • use the compiled SQL command in targets directory to debug the regex in Snowflake
sources:
    - name: airbnb
      schema: raw
      tables:
        - name: listings
          identifier: raw_listings
          columns:
            - name: price
              tests: 
                - dbt_expectations.expect_column_values_to_match_regex:
                    regex: "^\\\\$[0-9][0-9\\\\.]+$"

Logging

  • can log messages to dbt’s standard log file, or the terminal, or disable a log message temporarily

Logging to the dbt log file

  • create a macro dbtlearn/macros/logging.sql
  • run the macro dbt run-operation learn_logging
  • this will be logged in the logging file dbtlearn/logs/dbt.log as [debug]
  • can add this Jinja command anywhere e.g. in the middle of a SQL command
{% macro learn_logging() %}
    {{ log("This is a logging message from the macro") }}
{% endmacro %}
dbt-logging-debug

Logging to screen

  • add a second parameter info=True
  • message will appear on the screen, and also in the dbt log file as [info]
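
The updated macro is just the earlier one with the extra parameter:

{% macro learn_logging() %}
    {{ log("This is a logging message from the macro", info=True) }}
{% endmacro %}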
21:08:00.947064 [info ] [MainThread]: This is a logging message from the macro
dbt-log-info

Disabling log messages

  • there are two layers of execution in dbt
    • when a macro is executed, the Jinja is rendered first
    • then the SQL produced from the Jinja is executed
  • so to comment out a log call, use a Jinja comment tag {# ... #} instead of the SQL comment tag --
    • -- {{ log("This is a logging message from the macro", info=True) }} - the SQL comment does not stop the Jinja from rendering, so this still logs
    • {# log("This is a logging message from the macro", info=True) #} - the Jinja comment disables the log
{% macro learn_logging() %}
   {# log("This is a logging message from the macro", info=True) #}
{% endmacro %}

Variables

There are two kinds

  • Jinja variables
    • define and use them in Jinja
  • dbt variables
    • dbt-specific variables, which can be passed to dbt through the command line or the project YAML

Jinja variables

  • can be defined using set
  • e.g. create a macro in dbtlearn/macros/variables.sql
{% macro learn_variables() %}

    {% set your_name_jinja = "FooBar" %}
    {{ log("Hello " ~ your_name_jinja, info=True) }}

{% endmacro %}

dbt variables

  • aka Project variables
  • variables that dbt manages for you
  • pass their values from the command line or project YAML
{% macro learn_variables() %}

    {{ log("Hello dbt user " ~ var("user_name") ~ "!", info=True)}}

{% endmacro %}
  • on the command line, specify the username: dbt run-operation learn_variables --vars "{user_name: foobar}"

Setting default values

  • a default can be passed as the second argument of var(), but it would need to be repeated wherever the variable is used
{% macro learn_variables() %}
    {{ log("Hello dbt user " ~ var("user_name", "NO USERNAME IS SET") ~ "!", info=True)}}

{% endmacro %}
  • another way is to define the default value in project YAML file
    • this project-level value is used (overriding the default in the macro) if no user_name is passed on the command line
vars:
  user_name: default_user_name_for_this_project

Using date ranges to make incremental models production-ready

  • it is standard practice to parameterize date ranges for backfills
  • check variable existence with {% if var("start_date", False) and var("end_date", False) %}
  • update dbtlearn/models/fct/fct_reviews.sql below
{{
    config(
        materialized = 'incremental',
        on_schema_change = 'fail'
    )
}}
with src_reviews as (
    select * from {{ ref('src_reviews') }}
)

select 
{{ dbt_utils.generate_surrogate_key(['listing_id', 'review_date', 'reviewer_name', 'review_text'])}} as review_id,
* from src_reviews
WHERE review_text is not null
{% if is_incremental() %}
  {% if var("start_date", False) and var("end_date", False) %}
    {{ log('Loading ' ~ this ~ ' incrementally (start_date: ' ~ var("start_date") ~ ', end_date: ' ~ var("end_date") ~ ')', info=True) }}
    AND review_date >= '{{ var("start_date") }}'
    AND review_date < '{{ var("end_date") }}'
  {% else %}
    AND review_date > (select max(review_date) from {{ this }})
    {{ log('Loading ' ~ this ~ ' incrementally (all missing dates)', info=True)}}
  {% endif %}
{% endif %}
  • on a full refresh, or on an incremental load without start_date and end_date parameters, proceed with the naive implementation, i.e. load from src to the fct model the reviews whose text is not null (and, when incremental, whose review_date is after the current max review_date)
  • on an incremental load with start_date and end_date set, add the extra condition that review_date falls between start_date and end_date

dbt run --select fct_reviews

12:53:32  Loading AIRBNB.DEV.fct_reviews incrementally (all missing dates)

dbt run --select fct_reviews --vars '{start_date: "2024-02-15 00:00:00", end_date: "2024-03-15 23:59:59"}'

2:54:46  Loading AIRBNB.DEV.fct_reviews incrementally (start_date: 2024-02-15 00:00:00, end_date: 2024-03-15 23:59:59)
  • review compiled code dbtlearn/target/compiled/dbtlearn/models/fct/fct_reviews.sql and it would have the date ranges
-- code above
WHERE review_text is not null
    AND review_date >= '2024-02-15 00:00:00'
    AND review_date < '2024-03-15 23:59:59'
  

Orchestrating dbt with dagster

Integrate with other orchestration tools

  • Airflow
    • large community; many answers out there
    • installation is a bit difficult
    • does not really integrate well with dbt other than building the models
    • manages the whole dbt workflow as a unit
  • Prefect
    • more modern ETL tool
    • easy to install; very Pythonic
    • simple integration with dbt
    • doesn’t have full-fledged integration, although better than Airflow
  • Azure Data Factory
    • point and click
    • good product in itself but doesn’t really have tight integration with dbt
  • dbt Cloud
    • proprietary offering from dbt Labs
  • dagster
    • new generation scheduling tool

dagster

  • very tight dbt integration; understands dbt project well
  • data concept of dagster (asset) is similar to dbt (model)
  • great UI
  • easy to debug - will see exactly what dagster executed when you execute the workflow

dagster installation

  • create a requirements.txt and pip install -r requirements.txt into a virtual environment
dbt-snowflake
dagster-dbt
dagster-webserver
  • create a dagster project from existing dbt project

dagster-dbt project scaffold --project-name dbt_dagster_project --dbt-project-dir=dbtlearn

  • before launching dagster, make sure that dbt connections are ok and that the dependencies in dbt have been installed
cd dbtlearn
dbt debug
dbt deps
  • add a schedule to the dagster project in dbt_dagster_project/dbt_dagster_project/schedules.py by removing the comment markers around the schedule definition (sketched below)
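
After uncommenting, schedules.py looks roughly like this (the exact asset variable name generated by the scaffold may differ, so treat this as a sketch):

from dagster_dbt import build_schedule_from_dbt_selection

from .assets import dbtlearn_dbt_assets

schedules = [
    build_schedule_from_dbt_selection(
        [dbtlearn_dbt_assets],
        # job name, cron expression, and dbt selection below are illustrative defaults
        job_name="materialize_dbt_models",
        cron_schedule="0 0 * * *",
        dbt_select="fqn:*",
    ),
]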

  • start the dagster project

cd 'dbt_dagster_project'                                    
dagster dev
  • once server is running, go to http://127.0.0.1:3000/

  • manage dbt on dagster without touching dbt in production directly
  • dagster can also include data integration e.g. airbyte with minimal setup
  • can also connect a python based data source (from API)
dagster

Here are some resources about dagster

Advanced Power User for dbt core

Here are some advanced features of the Power User for dbt core extension.

To get the API key, sign up to Altimate.ai. The key is free with the free account sign-up. The UI will show the API key and instance name, which need to be added to the extension in the settings file in VS Code.

With the AI integration, the features now include:

  • Generate documentation
  • Generate dbt model from source definition or SQL
  • Column level lineage
dbt-column-lineage
  • Generate dbt tests
  • Check the health of the dbt project
  • Query explanations
  • dbt project governance using the open source python package datapilot-cli
  • Query translation between SQL dialects

Introducing dbt to the company

As for introducing dbt to the company, I find these tips from the interview interesting:

  • Approach it as an experiment rather than a business case. Experimentation means you’re going to try it out and see whether it’s a good fit or not. This also puts less pressure on the team and gives them time to learn the tool and find out if they like it – rather than introducing it as a business case wherein it’s presented as the solution.
  • Involve the business in workshops to give them opportunities to ask questions, and see if the solution is able to give answers and/or spark discussions. It helps to have a data steward who can bridge the engineering team and the business domain.
  • Act in stages. Stage 1 is all about the business perspective - it needs to address a problem; if successful, go to Stage 2, the technical phase - how can dbt fit in the current stack, is additional infrastructure needed, are there added security measures or compliance issues, etc.
  • Work on a POC, e.g. migrate a data mart to dbt and evaluate the value from that; 3 months is a good timeline. It helps if the team can work with an external consultant proficient in dbt.

Characteristics of dbt that make it easy to recommend

  • Python ecosystem; lightweight, easy to put into production (containerized) and integrate into CI/CD
  • Lineage and documentation as first-class citizens in dbt make it easy to own the pipelines (ownership involves being able to answer questions about where the data came from, how it’s transformed, and where it was published)
  • The modular approach breaks down large stored procedures into interconnected models, which make good points of discussion (about the business logic) and help with maintenance
  • dbt abstracts the technicalities like parallelization, dependency of models, optimization of execution, so developers can focus on the logic and model building

dbt Certifications

There are currently two exams: one for analytics engineers, and one for cloud administrators. Details here.

Some tips from the interview:

  • It was useful to do the Udemy course as an introduction to dbt.
  • It helps if you’ve practiced dbt for a while before taking the exam, although not impossible to pass without it.
  • Do a lot of practice questions.
  • Questions include multiple choice, scenario-based, git, and dbt flags.
  • Most questions are about dbt core not dbt cloud.

Resources

Footnotes

This post is licensed under CC BY 4.0 by the author.