Connect to dbt Core
This article explains what dbt is, how to install dbt Core, and how to connect. A hosted version of dbt, called dbt Cloud, is also available. For more information, see Connect to dbt Cloud.
What is dbt?
dbt (data build tool) is a development environment for transforming data by writing select statements. dbt turns these select statements into tables and views. dbt compiles your code into raw SQL and then runs that code on the specified database in Databricks. dbt supports collaborative coding patterns and best practices, including version control, documentation, and modularity.
dbt does not extract or load data. dbt focuses on the transformation step only, using a “transform after load” architecture. dbt assumes that you already have a copy of your data in your database.
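As a sketch of what this looks like in practice, a dbt model is just a ``SELECT`` statement saved as a ``.sql`` file in your project; when you run dbt, it wraps the statement in the DDL needed to materialize it as a table or view. The model name, source table, and columns below are hypothetical.

.. code-block:: sql

   -- models/recent_orders.sql (hypothetical model and source table)
   -- dbt materializes this SELECT as a view or table named recent_orders.
   select
     order_id,
     customer_id,
     order_date
   from my_catalog.my_schema.raw_orders
   where order_date >= date_sub(current_date(), 30)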
dbt Core enables you to write dbt code in the IDE of your choice on your local development machine and then run dbt from the command line. dbt Core includes the dbt Command Line Interface (CLI). The dbt CLI is free to use and open source.
dbt Core (and dbt Cloud) can use hosted git repositories. For more information, see Creating a dbt project and Using an existing project on the dbt website.
Installation requirements
Before you install dbt Core, you must install the following on your local development machine:
You also need a personal access token to authenticate.
Note
As a security best practice, when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.
Step 1: Install the dbt Databricks adapter
We recommend using a Python virtual environment because it isolates package versions and code dependencies to that specific environment, regardless of the package versions and code dependencies in other environments. This helps reduce unexpected package version mismatches and code dependency collisions.
Databricks recommends version 1.8.0 or greater of the dbt-databricks package.
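A minimal sketch of this setup, assuming Python 3 and ``pip`` are available on your PATH (the environment name ``dbt-env`` is just an example):

```shell
# Create and activate a Python virtual environment (the name "dbt-env"
# is an example; choose any name you like).
python3 -m venv dbt-env
source dbt-env/bin/activate

# Install the dbt Databricks adapter; this pulls in dbt Core as a dependency.
pip install "dbt-databricks>=1.8.0"
```

On Windows, activate the environment with ``dbt-env\Scripts\activate`` instead of ``source``.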
.. important:: If your local development machine uses any of the following operating systems, you must complete additional steps first: CentOS, macOS, Ubuntu, Debian, and Windows. See the “Does my operating system have prerequisites” section of Use pip to install dbt on the dbt Labs website.
Step 2: Create a dbt project and specify and test connection settings
Create a dbt project (a collection of related directories and files required to use dbt). You then configure your connection profiles, which contain connection settings to a Databricks compute, a SQL warehouse, or both. To increase security, dbt projects and profiles are stored in separate locations by default.
1. With the virtual environment still activated, run the ``dbt init`` command with a name for your project. This procedure creates a project named ``my_dbt_demo``:

   .. code-block:: bash

      dbt init my_dbt_demo

2. When you are prompted to choose a ``databricks`` or ``spark`` database, enter the number that corresponds to ``databricks``.

3. When prompted for a ``host`` value, do the following:

   - For a compute, enter the Server Hostname value from the Advanced Options, JDBC/ODBC tab for your Databricks compute.
   - For a SQL warehouse, enter the Server Hostname value from the Connection Details tab for your SQL warehouse.

4. When prompted for an ``http_path`` value, do the following:

   - For a compute, enter the HTTP Path value from the Advanced Options, JDBC/ODBC tab for your Databricks compute.
   - For a SQL warehouse, enter the HTTP Path value from the Connection Details tab for your SQL warehouse.

5. To choose an authentication type, enter the number that corresponds with ``use access token``, and then enter the value of your Databricks personal access token.

   Note

   As a security best practice, when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.

6. When prompted for the ``desired Unity Catalog option`` value, enter the number that corresponds with ``use Unity Catalog`` or ``not use Unity Catalog``.

7. If you chose to use Unity Catalog, enter the desired value for ``catalog`` when prompted.

8. Enter the desired values for ``schema`` and ``threads`` when prompted.

9. dbt writes your entries to a ``profiles.yml`` file. The location of this file is listed in the output of the ``dbt init`` command. You can also list this location later by running the ``dbt debug --config-dir`` command. You can open this file now to examine and verify its contents.

10. Confirm that the connection details are correct by running the ``dbt debug`` command.

    Important

    Before you begin, verify that your compute or SQL warehouse is running.

    You should see output similar to the following:

    .. code-block:: bash

       dbt debug
       ...
       Configuration:
         profiles.yml file [OK found and valid]
         dbt_project.yml file [OK found and valid]

       Required dependencies:
        - git [OK found]

       Connection:
         ...
       Connection test: OK connection ok
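For reference, the generated ``profiles.yml`` for a project named ``my_dbt_demo`` might look like the following sketch. The host, HTTP path, token, catalog, and schema values shown are placeholders; replace them with the values you entered at the prompts.

.. code-block:: yaml

   my_dbt_demo:
     target: dev
     outputs:
       dev:
         type: databricks
         catalog: main            # omit if you chose not to use Unity Catalog
         schema: my_schema        # placeholder
         host: adb-1234567890123456.7.azuredatabricks.net  # placeholder
         http_path: /sql/1.0/warehouses/abcdef0123456789   # placeholder
         token: dapi...           # your personal access token
         threads: 4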
Next steps
Create, run, and test dbt Core models locally. See the dbt Core tutorial.
Run dbt Core projects as Databricks job tasks. See Use dbt transformations in a Databricks job.