User-defined functions (UDFs) in Unity Catalog
Preview
This feature is in Public Preview.
Databricks provides a SQL-native syntax to register custom functions to schemas governed by Unity Catalog. Python UDFs registered as functions in Unity Catalog differ in scope and support from PySpark UDFs scoped to a notebook or SparkSession. See User-defined scalar functions - Python.
For the complete SQL language reference, see CREATE FUNCTION (SQL and Python).
For information about how Unity Catalog manages permissions on functions, see CREATE FUNCTION.
Requirements
Databricks Runtime 14.1 or above.
To use Python code in UDFs that are registered in Unity Catalog, you must use a pro SQL warehouse or a cluster running Databricks Runtime 14.1 or above.
To resolve views that were created using a UDF registered to Unity Catalog, you must use Databricks Runtime 14.1 or above. You cannot use a SQL warehouse.
Custom SQL functions in Unity Catalog
When you create a SQL function using compute configured for Unity Catalog, the function is registered to the currently active schema by default. The following example demonstrates the syntax you might use to declare a target catalog and schema for a new function:
CREATE FUNCTION target_catalog.target_schema.roll_dice()
RETURNS INT
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
COMMENT 'Roll a single 6 sided die'
RETURN (rand() * 6)::INT + 1;
All users with sufficient privileges on the function can then use the function in compute environments configured for Unity Catalog, as in the following example:
SELECT target_catalog.target_schema.roll_dice()
Note
You can use UDFs using LANGUAGE SQL
to return tables or scalar values.
Register a Python UDF to Unity Catalog
In Databricks Runtime 14.1 and above, you can use the SQL CREATE FUNCTION
statement to register scalar Python UDFs to Unity Catalog.
Important
If you want to use a SQL warehouse to register Python UDFs in Unity Catalog, it must be a Pro or Serverless SQL warehouse.
Python UDFs are designed to provide the full expressiveness of Python directly within SQL functions, allowing for customized operations such as advanced transformations, data masking, and hashing.
Python UDFs execute in a secure, isolated environment and do not have access to file systems or internal services.
Python UDFs running in shared access mode allow TCP/UDP network traffic over ports 80, 443, and 53.
See Which UDFs are most efficient?.
Note
Syntax and semantics for Python UDFs in Unity Catalog differ from Python UDFs registered to the SparkSession. See User-defined scalar functions - Python.
Python UDFs for Unity Catalog use statements set off by double dollar signs ($$
), as in the following code example:
CREATE FUNCTION target_catalog.target_schema.greet(s STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
return f"Hello, {s}"
$$
The following example demonstrates using this function to return greeting statements for all names stored in the first_name
column of a table named students
:
SELECT target_catalog.target_schema.greet(first_name)
FROM students;
You can define any number of Python functions within a Python UDF, but must return a scalar value.
Python functions must handle NULL
values independently, and all type mappings must follow Databricks SQL language mappings.
You can import standard Python libraries included by Databricks, but you cannot include custom libraries or external dependencies.
If no catalog or schema is specified, Python UDFs are registered to the current active schema.
The following example imports a library and uses multiple functions within a Python UDF:
CREATE FUNCTION roll_dice(num_dice INTEGER, num_sides INTEGER)
RETURNS INTEGER
LANGUAGE PYTHON
AS $$
import numpy as np
def roll_die(num_sides):
return np.random.randint(num_sides) + 1
def sum_dice(num_dice,num_sides):
return sum([roll_die(num_sides) for x in range(num_dice)])
return sum_dice(num_dice, num_sides)
$$