Testing for Databricks Connect for Scala

Note

This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.

This article describes how to use ScalaTest to run tests for code that uses Databricks Connect for Databricks Runtime 13.3 LTS and above. For more information about Databricks Connect, see Databricks Connect for Scala.

This information assumes that you have already installed Databricks Connect for Scala. See Install Databricks Connect for Scala.
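If you manage your project with sbt, the build must declare dependencies on both the Databricks Connect client and ScalaTest. The following build.sbt fragment is a minimal sketch; the version numbers are illustrative, so choose a Databricks Connect version that matches your cluster's Databricks Runtime:

build.sbt:

scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  // Versions shown are illustrative; match databricks-connect to your cluster's runtime.
  "com.databricks" % "databricks-connect" % "14.3.1",
  "org.scalatest" %% "scalatest" % "3.2.19" % Test
)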

You can run ScalaTest on local code that does not need a connection to a cluster in a remote Databricks workspace. For example, you might use ScalaTest to test your functions that accept and return DataFrame objects in local memory. To get started with ScalaTest and run it locally, see Getting started in the ScalaTest documentation.
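As a simple local starting point, a test of pure Scala logic needs no Spark connection at all. The following sketch tests a hypothetical TableNames helper, which is not part of the example files below and is included only for illustration:

package org.example.application

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

// Hypothetical helper, used only to illustrate a cluster-free test.
object TableNames {
  def fullyQualified(catalog: String, schema: String, table: String): String =
    s"$catalog.$schema.$table"
}

class TableNamesTest extends AnyFlatSpec with Matchers {
  "fullyQualified" should "join catalog, schema, and table with dots" in {
    TableNames.fullyQualified("samples", "nyctaxi", "trips") shouldBe "samples.nyctaxi.trips"
  }
}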

For example, given the following file src/main/scala/NYCTaxiFunctions.scala containing a getSpark function that returns a SparkSession instance and a getTaxis function that returns a DataFrame representing the trips table in the samples catalog’s nyctaxi schema:

NYCTaxiFunctions.scala:

package org.example.application

import com.databricks.connect.DatabricksSession
import org.apache.spark.sql.{DataFrame, SparkSession}

class NYCTaxiFunctions {
  // Returns a Spark session connected to your Databricks workspace.
  def getSpark: SparkSession = {
    DatabricksSession.builder().getOrCreate()
  }

  // Returns the trips table from the samples catalog's nyctaxi schema.
  def getTaxis: DataFrame = {
    val spark = getSpark
    spark.read.table("samples.nyctaxi.trips")
  }
}
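The parameterless getOrCreate() call resolves connection details from your environment, for example from a Databricks configuration profile or DATABRICKS_* environment variables. If you want to set the connection explicitly in code instead, the builder also accepts connection settings. The following is a minimal sketch of a method you could add to NYCTaxiFunctions, assuming the host, token, and clusterId builder methods from the Databricks Connect for Scala client; the placeholder values are yours to fill in:

def getSparkExplicit: SparkSession = {
  // Placeholder values; supply your own workspace URL, access token, and cluster ID.
  DatabricksSession.builder()
    .host("https://<workspace-instance-name>")
    .token("<personal-access-token>")
    .clusterId("<cluster-id>")
    .getOrCreate()
}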

And given the following file src/main/scala/Main.scala that calls these getSpark and getTaxis functions:

Main.scala:

package org.example.application

object Main {
  def main(args: Array[String]): Unit = {
    val nycTaxiFunctions = new NYCTaxiFunctions()
    val df = nycTaxiFunctions.getTaxis

    // Print the first 5 rows of the trips table.
    df.show(5)
  }
}
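Assuming an sbt-based project, you can run this entry point from the project root with:

sbt run

If your Databricks Connect configuration is valid, the first five rows of the trips table print to the console.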

The following file src/test/scala/NYCTaxiFunctionsTest.scala tests whether the getSpark function returns a SparkSession instance and whether the getTaxis function returns a DataFrame that contains at least one row of data:

NYCTaxiFunctionsTest.scala:

package org.example.application

import org.apache.spark.sql.SparkSession
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class SparkSessionTypeTest extends AnyFlatSpec with Matchers {
  "The session" should "be of type SparkSession" in {
    val nycTaxiFunctions = new NYCTaxiFunctions()
    val spark = nycTaxiFunctions.getSpark
    spark shouldBe a [SparkSession]
  }
}

class GetTaxisRowCountTest extends AnyFlatSpec with Matchers {
  "The DataFrame" should "have at least one row" in {
    val nycTaxiFunctions = new NYCTaxiFunctions()
    val df = nycTaxiFunctions.getTaxis
    df.count() should be > 0L
  }
}

To run these tests, see ScalaTest quick start or your IDE’s documentation. For example, for IntelliJ IDEA, see Test Scala applications in the IntelliJ IDEA documentation.
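Alternatively, if your project is built with sbt, you can run the full test suite from the command line:

sbt test

Note that both SparkSessionTypeTest and GetTaxisRowCountTest create a remote session through Databricks Connect, so they require a valid connection configuration to pass.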