Testing for Databricks Connect for Scala
Note: This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.
This article describes how to run tests by using ScalaTest for Databricks Connect for Databricks Runtime 13.3 LTS and above. For more information about Databricks Connect, see Databricks Connect for Scala.
This information assumes that you have already installed Databricks Connect for Scala. See Install Databricks Connect for Scala.
You can run ScalaTest on local code that does not need a connection to a cluster in a remote Databricks workspace. For example, you might use ScalaTest to test your functions that accept and return `DataFrame` objects in local memory. To get started with ScalaTest and run it locally, see Getting started in the ScalaTest documentation.
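As a minimal sketch of a purely local test, the following spec exercises a helper function with no Spark or cluster connection at all. The `FareUtils` object and its `dollarsToCents` function are hypothetical examples, not part of the Databricks samples:

```scala
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

// Hypothetical helper: pure logic that needs no SparkSession.
object FareUtils {
  // Converts a fare amount in dollars to whole cents, rounding to the nearest cent.
  def dollarsToCents(dollars: Double): Long = math.round(dollars * 100)
}

// This spec runs entirely in local memory; no Databricks workspace is contacted.
class FareUtilsTest extends AnyFlatSpec with Matchers {
  "dollarsToCents" should "convert a dollar amount to cents" in {
    FareUtils.dollarsToCents(12.34) shouldBe 1234L
  }
}
```

Because tests like this have no remote dependency, they run quickly and can serve as the bulk of a test suite, reserving cluster-backed tests for code that genuinely reads remote tables.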
For example, given the following file `src/main/scala/NYCTaxiFunctions.scala` containing a `getSpark` function that returns a `SparkSession` instance and a `getTaxis` function that returns a `DataFrame` representing the `trips` table in the `samples` catalog's `nyctaxi` schema:

`NYCTaxiFunctions.scala`:
```scala
package org.example.application

import com.databricks.connect.DatabricksSession
import org.apache.spark.sql.{DataFrame, SparkSession}

class NYCTaxiFunctions {
  def getSpark: SparkSession = {
    DatabricksSession.builder().getOrCreate()
  }

  def getTaxis: DataFrame = {
    val spark = getSpark
    spark.read.table("samples.nyctaxi.trips")
  }
}
```
And given the following file `src/main/scala/Main.scala` that calls these `getSpark` and `getTaxis` functions:

`Main.scala`:
```scala
package org.example.application

object Main {
  def main(args: Array[String]): Unit = {
    val nycTaxiFunctions = new NYCTaxiFunctions()
    val df = nycTaxiFunctions.getTaxis
    df.show(5)
  }
}
```
The following file `src/test/scala/NYCTaxiFunctionsTest.scala` tests whether the `getSpark` function returns a `SparkSession` instance and whether the `getTaxis` function returns a `DataFrame` that contains at least one row of data:

`NYCTaxiFunctionsTest.scala`:
```scala
package org.example.application

import org.apache.spark.sql.SparkSession
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class SparkSessionTypeTest extends AnyFlatSpec with Matchers {
  "The session" should "be of type SparkSession" in {
    val nycTaxiFunctions = new NYCTaxiFunctions()
    val spark = nycTaxiFunctions.getSpark
    spark shouldBe a [SparkSession]
  }
}

class GetTaxisRowCountTest extends AnyFlatSpec with Matchers {
  "The DataFrame" should "have at least one row" in {
    val nycTaxiFunctions = new NYCTaxiFunctions()
    val df = nycTaxiFunctions.getTaxis
    df.count() should be > (0L)
  }
}
```
To run these tests, see the ScalaTest quick start or your IDE's documentation. For example, for IntelliJ IDEA, see Test Scala applications in the IntelliJ IDEA documentation.
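If you build with sbt, a dependency declaration along these lines makes ScalaTest available to your test sources; the version number below is an example only, so check the ScalaTest site for the current release:

```scala
// build.sbt fragment (sketch): the version shown is an example, not a pinned requirement.
libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.19" % Test
```

With this in place, `sbt test` compiles and runs every spec under `src/test/scala`.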