Start a Ray cluster on Databricks

Databricks simplifies the process of starting a Ray cluster by handling cluster and job configuration the same way it does for any Apache Spark job. This is possible because the Ray cluster is started on top of the managed Apache Spark cluster.

Run Ray on a local machine

import ray

ray.init()

Run Ray on Databricks

from ray.util.spark import setup_ray_cluster
import ray

# For example, if the cluster has 4 worker nodes with 8 CPUs each
setup_ray_cluster(num_worker_nodes=4, num_cpus_per_worker=8)

# ray.init connects to the Ray cluster created above; pass any custom configuration here
ray.init(ignore_reinit_error=True)

This approach works at any cluster scale, from a few to hundreds of nodes. Ray clusters on Databricks also support autoscaling.
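A minimal autoscaling sketch follows, assuming a Ray version whose setup_ray_cluster accepts min_worker_nodes and max_worker_nodes (newer releases; check the API reference for your version, as parameter names have changed over time):

from ray.util.spark import setup_ray_cluster
import ray

# Assumed autoscaling parameters: the cluster scales between 2 and 8
# Ray worker nodes based on load (names vary by Ray version)
setup_ray_cluster(min_worker_nodes=2, max_worker_nodes=8)

ray.init(ignore_reinit_error=True)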

After creating the Ray cluster, you can run any Ray application code in a Databricks notebook.

Important

Databricks recommends installing any libraries your application needs with %pip install <your-library-dependency> so that they are available to both the Ray cluster and your application. Specifying dependencies in the ray.init() call instead installs them in a location that is inaccessible to the Apache Spark worker nodes, which leads to version incompatibilities and import errors.
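As a concrete illustration (lightgbm here is only a stand-in for your own dependency), run the install in a notebook cell before calling setup_ray_cluster:

%pip install lightgbm

If the library or one of its dependencies was already imported, you can follow this with dbutils.library.restartPython() in the next cell so the fresh install takes effect.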

For example, you can run a simple Ray application in a Databricks notebook as follows:

import ray
import random
import time
from fractions import Fraction

ray.init()

@ray.remote
def pi4_sample(sample_count):
    """pi4_sample runs sample_count experiments, and returns the
    fraction of time it was inside the circle.
    """
    in_count = 0
    for i in range(sample_count):
        x = random.random()
        y = random.random()
        if x*x + y*y <= 1:
            in_count += 1
    return Fraction(in_count, sample_count)

SAMPLE_COUNT = 1000 * 1000
start = time.time()
# Launch one remote task and block on its result with ray.get
future = pi4_sample.remote(sample_count=SAMPLE_COUNT)
pi4 = ray.get(future)
end = time.time()
dur = end - start
print(f'Running {SAMPLE_COUNT} tests took {dur} seconds')

pi = pi4 * 4
print(float(pi))
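
A single remote task uses only one core, so you can tighten the estimate by launching many tasks and letting Ray run them in parallel across the cluster. The batch of 100 tasks below is an illustrative choice, reusing pi4_sample and SAMPLE_COUNT from the example above:

# Launch 100 independent sampling tasks; Ray schedules them in
# parallel across the cluster's workers
futures = [pi4_sample.remote(sample_count=SAMPLE_COUNT) for _ in range(100)]

# Average the per-task estimates of pi/4, then scale up to pi
pi4_estimates = ray.get(futures)
pi = 4 * sum(pi4_estimates) / len(pi4_estimates)
print(float(pi))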

Shut down a Ray cluster

Ray clusters automatically shut down under the following circumstances:

  • You detach your interactive notebook from your Databricks cluster.

  • Your Databricks job is completed.

  • Your Databricks cluster is restarted or terminated.

  • There’s no activity for the specified idle time.

To shut down a Ray cluster running on Databricks, call the ray.util.spark.shutdown_ray_cluster API.

from ray.util.spark import shutdown_ray_cluster
import ray

# Stop the Ray-on-Spark cluster, then disconnect this Python process
shutdown_ray_cluster()
ray.shutdown()