Syntax overview
dbt's node selection syntax makes it possible to run only specific resources in a given invocation of dbt. This selection syntax is used for the following subcommands:
command | argument(s) |
---|---|
run | --select, --exclude, --selector, --defer |
test | --select, --exclude, --selector, --defer |
seed | --select, --exclude, --selector |
snapshot | --select, --exclude, --selector |
ls (list) | --select, --exclude, --selector, --resource-type |
compile | --select, --exclude, --selector, --inline |
freshness | --select, --exclude, --selector |
build | --select, --exclude, --selector, --resource-type, --defer |
docs generate | --select, --exclude, --selector |
We use the terms "nodes" and "resources" interchangeably. These encompass all the models, tests, sources, seeds, snapshots, exposures, and analyses in your project. They are the objects that make up dbt's DAG (directed acyclic graph).
The --select and --selector arguments sound similar, but they are different. To understand the difference, see Differences between --select and --selector.
Specifying resources
By default, dbt run executes all of the models in the dependency graph, dbt seed creates all seeds, and dbt snapshot performs every snapshot. The --select flag is used to specify a subset of nodes to execute.
To follow POSIX standards and avoid ambiguity, we recommend that CLI users quote the arguments passed to the --select or --exclude option, whether that is a single argument or several space-delimited or comma-delimited arguments. Unquoted arguments might not work reliably on all operating systems, terminals, and user interfaces. For example, dbt run --select "my_dbt_project_name" runs all models in your project.
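To see what quoting changes, here is a minimal Python sketch that uses the standard library's shlex module to mimic how a POSIX shell splits a command line into arguments. The model names model_a and model_b are hypothetical, and the sketch only illustrates shell tokenization, not how dbt parses its arguments.

```python
import shlex

# Hypothetical model names, used only for illustration.
quoted = 'dbt run --select "model_a model_b"'
unquoted = 'dbt run --select model_a model_b'

# shlex.split applies POSIX shell quoting rules to a command string.
quoted_args = shlex.split(quoted)
unquoted_args = shlex.split(unquoted)

print(quoted_args)    # the whole selection arrives as a single argument
print(unquoted_args)  # the selection is split into separate arguments
```

With quotes, the selection reaches the program as one argument no matter which shell is used. Without quotes, the shell decides how the selection is split, and it may also expand characters such as * before the program ever sees them.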
How does selection work?
- dbt gathers all the resources that are matched by one or more of the --select criteria, in the order of selection methods (e.g. tag:), then graph operators (e.g. +), then finally set operators (unions, intersections, exclusions).
- The selected resources may be models, sources, seeds, snapshots, tests. (Tests can also be selected "indirectly" via their parents; see test selection examples for details.)
- dbt now has a list of still-selected resources of varying types. As a final step, it tosses away any resource that does not match the resource type of the current task. (Only seeds are kept for dbt seed, only models for dbt run, only tests for dbt test, and so on.)
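The steps above can be sketched with a toy model in Python. This is a simplified illustration under stated assumptions, not dbt's implementation: the project, node names, and helper functions are all hypothetical, only the tag: method, a trailing + operator, and space-delimited unions are modeled, and intersections and exclusions are left out.

```python
# Toy project: node name -> (resource type, tags, direct parents).
# Every name here is hypothetical.
NODES = {
    "raw_orders":      ("seed",  set(),       []),
    "stg_orders":      ("model", {"nightly"}, ["raw_orders"]),
    "fct_orders":      ("model", {"nightly"}, ["stg_orders"]),
    "not_null_orders": ("test",  set(),       ["fct_orders"]),
}

def by_tag(tag):
    """Selection method: nodes that carry the given tag."""
    return {name for name, (_, tags, _) in NODES.items() if tag in tags}

def with_descendants(selected):
    """Graph operator (trailing +): add every downstream node."""
    result = set(selected)
    changed = True
    while changed:
        changed = False
        for name, (_, _, parents) in NODES.items():
            if name not in result and any(p in result for p in parents):
                result.add(name)
                changed = True
    return result

def resolve(criterion):
    """Apply the selection method first, then the graph operator."""
    expand = criterion.endswith("+")
    base = criterion.rstrip("+")
    if base.startswith("tag:"):
        matched = by_tag(base[len("tag:"):])
    else:
        matched = {base} if base in NODES else set()
    return with_descendants(matched) if expand else matched

def select(selection, task_type):
    """Union the space-delimited criteria, then filter by resource type."""
    selected = set()
    for criterion in selection.split():
        selected |= resolve(criterion)
    # Final step: drop nodes whose type does not match the current task.
    return {name for name in selected if NODES[name][0] == task_type}

print(select("raw_orders+", "model"))
print(select("raw_orders+", "seed"))
```

In this toy project the same selection string yields different nodes depending on the task type, which mirrors the final filtering step described above.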
Shorthand
Select resources to build (run, test, seed, snapshot) or check freshness: --select, -s
Examples
By default, dbt run will execute all of the models in the dependency graph. During development (and deployment), it is useful to specify only a subset of models to run. Use the --select flag with dbt run to select a subset of models to run. Note that the following arguments (--select, --exclude, and --selector) also apply to other dbt tasks, such as test and build.
- Examples of select flag
- Examples of subsets of nodes
The --select flag accepts one or more arguments. Each argument can be one of:
- a package name
- a model name
- a fully-qualified path to a directory of models
- a selection method (path:, tag:, config:, test_type:, test_name:)
Examples:
dbt run --select "my_dbt_project_name" # runs all models in your project
dbt run --select "my_dbt_model" # runs a specific model
dbt run --select "path/to/my/models" # runs all models in a specific directory
dbt run --select "my_package.some_model" # run a specific model in a specific package
dbt run --select "tag:nightly" # run models with the "nightly" tag
dbt run --select "path/to/models" # run models contained in path/to/models
dbt run --select "path/to/my_model.sql" # run a specific model by its path
dbt supports a shorthand language for defining subsets of nodes. This language uses the following characters:
Examples:
# multiple arguments can be provided to --select
dbt run --select "my_first_model my_second_model"
# select my_model and all of its children
dbt run --select "my_model+"
# select my_model, its children, and the parents of its children
dbt run --select "@my_model"
# these arguments can be projects, models, directory paths, tags, or sources
dbt run --select "tag:nightly my_model finance.base.*"
# use methods and intersections for more complex selectors
dbt run --select "path:marts/finance,tag:nightly,config.materialized:table"
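To make the difference between my_model+ and @my_model concrete, here is a small Python sketch over a hypothetical four-node DAG. It follows the descriptions in the comments above (children for +, children plus the parents of those children for @) and is a toy model rather than dbt's own graph code. All node and function names are assumptions made for illustration.

```python
# Hypothetical DAG: each node maps to its direct parents.
PARENTS = {
    "stg_customers": [],
    "stg_orders": [],
    "my_model": ["stg_orders"],
    "customer_orders": ["my_model", "stg_customers"],
}

def descendants(node):
    """All nodes reachable by following child edges from node."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for child, parents in PARENTS.items():
            if current in parents and child not in found:
                found.add(child)
                stack.append(child)
    return found

def ancestors(node):
    """All nodes reachable by following parent edges from node."""
    found, stack = set(), [node]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def plus_after(node):
    """Models "my_model+": the node and everything downstream of it."""
    return {node} | descendants(node)

def at_operator(node):
    """Models "@my_model": the node, its descendants, and their ancestors."""
    selected = plus_after(node)
    for child in descendants(node):
        selected |= ancestors(child)
    return selected

print(sorted(plus_after("my_model")))
print(sorted(at_operator("my_model")))
```

In this graph, the @ form also pulls in stg_customers, because customer_orders cannot be built without it, while the + form leaves it out.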
As your selection logic gets more complex and becomes unwieldy to type out as command-line arguments, consider using a yaml selector. You can use a predefined definition with the --selector flag.
Note that when you're using --selector, most other flags (namely --select and --exclude) will be ignored.
To understand the difference between --select and --selector arguments, see this section for more details.
Troubleshoot with the ls command
Constructing and debugging your selection syntax can be challenging. To get a "preview" of what will be selected, we recommend using the list command. This command, when combined with your selection syntax, will output a list of the nodes that meet that selection criteria. The dbt ls command supports all types of selection syntax arguments, for example:
dbt ls --select "path/to/my/models" # Lists all models in a specific directory.
dbt ls --select "source_status:fresher+" # Shows sources updated since the last dbt source freshness run.
dbt ls --select "state:modified+" # Displays nodes modified in comparison to a previous state.
dbt ls --select "result:<status>+ state:modified+" --state ./<dbt-artifact-path> # Lists nodes that match certain result statuses or are modified.
State selection
One of the greatest underlying assumptions about dbt is that its operations should be stateless and idempotent. That is, it doesn't matter how many times a model has been run before, or if it has ever been run before. It doesn't matter if you run it once or a thousand times. Given the same raw data, you can expect the same transformed result. A given run of dbt doesn't need to "know" about any other run; it just needs to know about the code in the project and the objects in your database as they exist right now.
That said, dbt does store "state" — a detailed, point-in-time view of project resources (also referred to as nodes), database objects, and invocation results — in the form of its artifacts. If you choose, dbt can use these artifacts to inform certain operations. Crucially, the operations themselves are still stateless and idempotent: given the same manifest and the same raw data, dbt will produce the same transformed result.
dbt can leverage artifacts from a prior invocation as long as their file path is passed to the --state flag. This is a prerequisite for:
- The state selector, whereby dbt can identify resources that are new or modified by comparing code in the current project against the state manifest.
- Deferring to another environment, whereby dbt can identify upstream, unselected resources that don't exist in your current environment and instead "defer" their references to the environment provided by the state manifest.
- The dbt clone command, whereby dbt can clone nodes based on their location in the manifest provided to the --state flag.
Together, the state selector and deferral enable "slim CI". We expect to add more features in future releases that can leverage artifacts passed to the --state flag.
Establishing state
State and defer can be set by environment variables as well as CLI flags:
- --state or DBT_STATE: file path
- --defer or DBT_DEFER: boolean
- --defer-state or DBT_DEFER_STATE: file path to use for deferral only (optional)
If --defer-state is not specified, deferral will use the artifacts supplied by --state. This enables more granular control in cases where you want to compare against logical state from one environment or past point in time, and defer to applied state from a different environment or point in time.
If both the flag and env var are provided, the flag takes precedence.
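The precedence and fallback rules above can be sketched in a few lines of Python. The function resolve_state and its parameters are hypothetical names chosen for this illustration. The sketch models only the rules stated in this section (flag over environment variable, and --defer-state falling back to --state) and is not dbt's actual configuration code.

```python
import os

def resolve_state(flag_state=None, flag_defer_state=None, env=None):
    """Return (state_path, defer_state_path) for a hypothetical invocation."""
    env = os.environ if env is None else env
    # A CLI flag, when provided, takes precedence over the environment variable.
    state = flag_state if flag_state is not None else env.get("DBT_STATE")
    defer_state = (
        flag_defer_state
        if flag_defer_state is not None
        else env.get("DBT_DEFER_STATE")
    )
    # Without an explicit defer state, deferral uses the --state artifacts.
    if defer_state is None:
        defer_state = state
    return state, defer_state

# The flag wins over DBT_STATE, and deferral falls back to the same path.
print(resolve_state(flag_state="cli/artifacts", env={"DBT_STATE": "env/artifacts"}))
```

Passing env explicitly keeps the example independent of whatever variables are set in the calling shell.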
Notes:
- The --state artifacts must be of schema versions that are compatible with the currently running dbt version.
- These are powerful, complex features. Read about known caveats and limitations to state comparison.
In dbt v1.5, we deprecated the original syntax for state (DBT_ARTIFACT_STATE_PATH) and defer (DBT_DEFER_TO_STATE). dbt still supports the old syntax for backward compatibility, but we will remove it in a future release that has not yet been determined.
The "result" status
Another element of job state is the result of a prior dbt invocation. After executing a dbt run, for example, dbt creates the run_results.json artifact which contains execution times and success / error status for dbt models. You can read more about run_results.json on the 'run results' page.
The following dbt commands produce run_results.json artifacts whose results can be referenced in subsequent dbt invocations:
- dbt run
- dbt test
- dbt build (new in dbt version v0.21.0)
- dbt seed
After issuing one of the above commands, you can reference the results by adding a selector to a subsequent command as follows:
# You can also set the DBT_STATE environment variable instead of the --state flag.
dbt run --select "result:<status>" --defer --state path/to/prod/artifacts
The available options depend on the resource (node) type:
result:\<status> | model | seed | snapshot | test |
---|---|---|---|---|
result:error | ✅ | ✅ | ✅ | ✅ |
result:success | ✅ | ✅ | ✅ | |
result:skipped | ✅ | ✅ | ✅ | |
result:fail | | | | ✅ |
result:warn | | | | ✅ |
result:pass | | | | ✅ |
Combining state and result selectors
The state and result selectors can also be combined in a single invocation of dbt to capture errors from a previous run OR any new or modified models.
dbt run --select "result:<status>+ state:modified+" --defer --state ./<dbt-artifact-path>
The "source_status" status
Another element of job state is the source_status of a prior dbt invocation. After executing dbt source freshness, for example, dbt creates the sources.json artifact which contains execution times and max_loaded_at dates for dbt sources. You can read more about sources.json on the 'sources' page.
The dbt source freshness command produces a sources.json artifact whose results can be referenced in subsequent dbt invocations.
When a job is selected, dbt Cloud will surface the artifacts from that job's most recent successful run. dbt will then use those artifacts to determine the set of fresh sources. In your job commands, you can signal dbt to run and test only on the fresher sources and their children by including the source_status:fresher+ argument. This requires both the previous and current states to have the sources.json artifact available. Put plainly, dbt source freshness must have been run for both job states.
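As a rough illustration of why both states are needed, the Python sketch below compares max_loaded_at values from two freshness runs and keeps the sources whose data is newer in the current run. The dictionaries are a simplified stand-in for sources.json, and the source ids, timestamps, and the function fresher_sources are hypothetical. dbt's actual comparison may differ in its details, for example in how it treats sources that appear in only one of the two states, which this sketch ignores.

```python
from datetime import datetime

def fresher_sources(previous, current):
    """Return source ids whose max_loaded_at is later in current than in previous.

    Both arguments map a source id to an ISO 8601 max_loaded_at string.
    Sources missing from either state are skipped in this simplified model.
    """
    fresher = set()
    for source_id, loaded_at in current.items():
        before = previous.get(source_id)
        if before is None:
            continue
        if datetime.fromisoformat(loaded_at) > datetime.fromisoformat(before):
            fresher.add(source_id)
    return fresher

# Hypothetical results from a previous and a current freshness run.
previous_run = {
    "source.demo.raw.orders": "2024-01-01T00:00:00+00:00",
    "source.demo.raw.customers": "2024-01-01T00:00:00+00:00",
}
current_run = {
    "source.demo.raw.orders": "2024-01-02T00:00:00+00:00",
    "source.demo.raw.customers": "2024-01-01T00:00:00+00:00",
}

print(fresher_sources(previous_run, current_run))
```

Only the orders source has newer data here, so in this model it is the only source whose children would be picked up by a fresher+ selection.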
After issuing the dbt source freshness command, you can reference the source freshness results by adding a selector to a subsequent command:
# You can also set the DBT_STATE environment variable instead of the --state flag.
dbt source freshness # must be run again to compare current to previous state
dbt build --select "source_status:fresher+" --state path/to/prod/artifacts
For more example commands, refer to Pro-tips for workflows.