Testing for Databricks Connect for Python
Note
This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.
This article describes how to run tests by using pytest for Databricks Connect for Databricks Runtime 13.3 LTS and above. For more information about Databricks Connect, see Databricks Connect for Python.
This information assumes that you have already installed Databricks Connect for Python. See Install Databricks Connect for Python.
You can run pytest on local code that does not need a connection to a cluster in a remote Databricks workspace. For example, you might use pytest to test your functions that accept and return PySpark DataFrame objects in local memory. To get started with pytest and run it locally, see Get Started in the pytest documentation.
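If a function accepts its Spark session as a parameter instead of creating one itself, a test can pass in a stand-in object and run without any cluster. The following is a minimal sketch of that approach and is not part of this article's example: count_trips is a hypothetical function, and unittest.mock.MagicMock from the Python standard library stands in for the session. It only checks how the function uses the session; it does not exercise real Spark behavior.

```python
from unittest.mock import MagicMock


def count_trips(spark) -> int:
    """Hypothetical helper: count the rows in samples.nyctaxi.trips.

    The session is passed in rather than created here, so a test can
    supply a stand-in object instead of a real Databricks Connect session.
    """
    return spark.read.table("samples.nyctaxi.trips").count()


def test_count_trips_without_cluster():
    # MagicMock creates the attribute chain spark.read.table(...).count()
    # on demand; configure the final call to return a known value.
    fake_spark = MagicMock()
    fake_spark.read.table.return_value.count.return_value = 3

    assert count_trips(fake_spark) == 3
    # Confirm the function asked for the expected table.
    fake_spark.read.table.assert_called_once_with("samples.nyctaxi.trips")
```

Saved in a file whose name starts with test_, this test is collected by pytest like any other test function.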
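The example in this article, however, uses Databricks Connect, so pytest runs the tests locally while the Spark operations run on your remote Databricks compute, which requires a configured connection to your workspace. For example, given the following file named nyctaxi_functions.py, which contains a get_spark function that returns a SparkSession instance and a get_nyctaxi_trips function that returns a DataFrame representing the trips table in the samples catalog's nyctaxi schema: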
nyctaxi_functions.py:
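from databricks.connect import DatabricksSession
from pyspark.sql import DataFrame, SparkSession

def get_spark() -> SparkSession:
    # Create (or reuse) a session connected to the remote Databricks
    # compute configured for Databricks Connect.
    spark = DatabricksSession.builder.getOrCreate()
    return spark

def get_nyctaxi_trips() -> DataFrame:
    # Read the trips table from the samples catalog's nyctaxi schema.
    spark = get_spark()
    df = spark.read.table("samples.nyctaxi.trips")
    return df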
And given the following file named main.py, which calls get_nyctaxi_trips (and, through it, get_spark):
main.py:
from nyctaxi_functions import *

# Load the trips table and print its first five rows.
df = get_nyctaxi_trips()
df.show(5)
The following file named test_nyctaxi_functions.py tests whether the get_spark function returns a Spark Connect SparkSession instance (pyspark.sql.connect.session.SparkSession) and whether the get_nyctaxi_trips function returns a DataFrame that contains at least one row of data:
test_nyctaxi_functions.py:
import pyspark.sql.connect.session

from nyctaxi_functions import *

def test_get_spark():
    # Databricks Connect sessions are Spark Connect sessions.
    spark = get_spark()
    assert isinstance(spark, pyspark.sql.connect.session.SparkSession)

def test_get_nyctaxi_trips():
    # The sample table should contain at least one row.
    df = get_nyctaxi_trips()
    assert df.count() > 0
To run these tests, run the pytest command from the project's root directory. The output should be similar to the following:
$ pytest
=================== test session starts ====================
platform darwin -- Python 3.11.7, pytest-8.1.1, pluggy-1.4.0
rootdir: <project-rootdir>
collected 2 items
test_nyctaxi_functions.py .. [100%]
======================== 2 passed ==========================
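Both tests above contact the remote workspace, so they fail on a machine where Databricks Connect is not configured. If you would rather skip them there, one option is a small check like the following sketch. It assumes the connection is configured through the DATABRICKS_HOST and DATABRICKS_CLUSTER_ID environment variables; it does not detect a configuration profile in .databrickscfg, and missing_databricks_env is a hypothetical helper name.

```python
import os
from typing import List, Mapping, Optional

# Assumption: the connection is configured only through these environment
# variables. A .databrickscfg profile is NOT detected by this check.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_CLUSTER_ID")


def missing_databricks_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: return the required variables that are unset or empty."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

In a test module, you could then mark each test with @pytest.mark.skipif(bool(missing_databricks_env()), reason="Databricks Connect is not configured") so that pytest reports it as skipped instead of failed.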