Git integration with Databricks Repos
Databricks Repos is a visual Git client in Databricks. It supports common Git operations such as cloning a repository, committing and pushing, pulling, branch management, and visual comparison of diffs when committing.
Within Repos you can develop code in notebooks or other files and follow data science and engineering code development best practices using Git for version control, collaboration, and CI/CD.
What can you do with Databricks Repos?
Databricks Repos provides source control for data and AI projects by integrating with Git providers.
In Databricks Repos, you can use Git functionality to:
Clone, push to, and pull from a remote Git repository.
Create and manage branches for development work.
Create notebooks, and edit notebooks and other files.
Visually compare differences upon commit.
For step-by-step instructions, see Clone a Git repo & other common Git operations. Databricks Repos also has an API that you can integrate with your CI/CD pipeline. For example, you can programmatically update a Databricks repo so that it always has the most recent version of the code. For information about best practices for code development using Databricks Repos, see CI/CD workflows with Git integration and Databricks Repos.
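As a rough illustration of the programmatic update mentioned above, the following Python sketch builds a call to the Repos REST API that checks out the latest commit of a branch. It assumes the `PATCH /api/2.0/repos/{repo_id}` endpoint with a `branch` field and bearer-token authentication; the host, token, repo ID, and helper names are placeholders introduced for this example, not values from this page.

```python
import json
import urllib.request


def build_update_request(host, token, repo_id, branch):
    """Build (but do not send) a PATCH request that moves a repo to the head of `branch`.

    Assumes the Repos API endpoint PATCH /api/2.0/repos/{repo_id}.
    All arguments are placeholders supplied by the caller.
    """
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def update_repo(host, token, repo_id, branch):
    """Send the request and return the decoded JSON response."""
    request = build_update_request(host, token, repo_id, branch)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A CI/CD job could call `update_repo(...)` after a merge so that the workspace copy tracks the target branch. Building the request separately from sending it keeps the sketch easy to inspect without network access.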
For the following tasks, work in your Git provider:
Create a pull request.
Resolve merge conflicts.
Merge or delete branches.
Rebase a branch.
Supported Git providers
Databricks supports the following Git providers:
GitHub
Bitbucket Cloud
GitLab
Azure DevOps
AWS CodeCommit
GitHub AE
See Get a Git access token & connect a remote repo to Databricks.
Databricks Repos also supports Bitbucket Server, GitHub Enterprise Server, and GitLab self-managed integration, provided the server is accessible from the internet. To integrate with a private Git server instance that is not accessible from the internet, contact your Databricks representative.
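When a repo is created through the Repos API rather than the UI, the Git provider is typically passed as a short identifier. The sketch below builds a request body for `POST /api/2.0/repos`, assuming that endpoint takes `url`, `provider`, and `path` fields. The identifier strings in the mapping are assumptions that should be checked against the API reference, and GitHub AE is omitted because its identifier is not stated on this page. The function name and example values are hypothetical.

```python
# Assumed mapping from provider names on this page to Repos API identifiers.
# Verify these strings against the API reference before relying on them.
PROVIDER_IDS = {
    "GitHub": "gitHub",
    "Bitbucket Cloud": "bitbucketCloud",
    "GitLab": "gitLab",
    "Azure DevOps": "azureDevOpsServices",
    "AWS CodeCommit": "awsCodeCommit",
    "GitHub Enterprise Server": "gitHubEnterprise",
    "Bitbucket Server": "bitbucketServer",
    "GitLab self-managed": "gitLabEnterpriseEdition",
}


def create_repo_payload(git_url, provider_name, workspace_path):
    """Return a request body for creating a repo (hypothetical helper).

    Raises ValueError if the provider name is not in the assumed mapping.
    """
    if provider_name not in PROVIDER_IDS:
        raise ValueError(f"Unknown Git provider: {provider_name!r}")
    return {
        "url": git_url,
        "provider": PROVIDER_IDS[provider_name],
        "path": workspace_path,
    }
```

For example, `create_repo_payload("https://github.com/example-org/example-repo.git", "GitHub", "/Repos/someone@example.com/example-repo")` yields a dictionary that can be JSON-encoded and sent as the request body.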
Support for arbitrary files in Databricks Repos is available in Databricks Runtime 8.4 and above. See What are workspace files?.