Hail is a library built on Apache Spark for analyzing large genomic datasets.


  • From Hail 0.2.65 onwards, use Apache Spark version 3.1 (Databricks Runtime 8.x or 9.x).
  • Install Hail on Databricks Runtime, not Databricks Runtime for Genomics (deprecated).
  • Hail is not supported with Glow, except when exporting from Hail to Glow.

Create a cluster

Install Hail via Docker with Databricks Container Services.

For container images that set up a Hail environment, see the ProjectGlow Docker Hub page. Use the image projectglow/databricks-hail:<hail_version>, replacing <hail_version> with an available Hail version.
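If you create the cluster through the Databricks Clusters API rather than the UI, the image name goes in the docker_image field of the request body. The following is a minimal sketch of building that body. The helper name hail_cluster_spec is hypothetical, and the runtime key, node type, and worker count are assumptions you must supply for your own workspace. The Hail tag is deliberately left as a parameter, so check the Docker Hub page for tags that actually exist.

```python
import json


def hail_cluster_spec(hail_version, spark_version, node_type_id, num_workers=2):
    """Build a request body for the Databricks Clusters API (clusters/create).

    Hypothetical helper: every argument is supplied by the caller, because
    valid values depend on your cloud provider and workspace.
    """
    # Refuse an empty tag or the literal <hail_version> placeholder.
    if not hail_version or "<" in hail_version:
        raise ValueError("replace the placeholder with an available Hail image tag")
    return {
        "cluster_name": f"hail-{hail_version}",
        "spark_version": spark_version,   # a Databricks Runtime 8.x or 9.x key
        "node_type_id": node_type_id,     # cloud-specific instance type
        "num_workers": num_workers,
        # Databricks Container Services pulls this image onto every node.
        "docker_image": {"url": f"projectglow/databricks-hail:{hail_version}"},
    }


def to_json(spec):
    """Serialize the spec for use as the body of a clusters/create request."""
    return json.dumps(spec, indent=2)
```

Databricks Container Services must be enabled in the workspace before a cluster with a docker_image field can start.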

Use Hail in a notebook

For the most part, Hail in Databricks works as described in the Hail documentation. However, the Databricks environment requires a few modifications.

Initialize Hail

When initializing Hail, pass in the pre-created SparkContext and mark the initialization as idempotent. This setting enables multiple Databricks notebooks to use the same Hail context.


Enable skip_logging_configuration to save logs to the rolling driver log4j output. This setting is supported only in Hail 0.2.39 and above.

import hail as hl

# `sc` is the SparkContext that Databricks pre-creates in every notebook.
hl.init(sc, idempotent=True, quiet=True, skip_logging_configuration=True)
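Because skip_logging_configuration is supported only in Hail 0.2.39 and above, a notebook shared across clusters with different Hail versions may want to pass that argument conditionally. The following is a minimal sketch of one way to do so. The helper names parse_version and hail_init_kwargs are hypothetical, and the sketch assumes the version string has the form "0.2.65", optionally followed by a hyphen and a build hash.

```python
def parse_version(version):
    """Turn '0.2.65' or '0.2.65-<build hash>' into a comparable tuple such as (0, 2, 65)."""
    core = version.split("-")[0]          # drop an optional build-hash suffix
    return tuple(int(part) for part in core.split("."))


def hail_init_kwargs(hail_version):
    """Keyword arguments for hl.init that suit the given Hail version."""
    kwargs = {"idempotent": True, "quiet": True}
    # skip_logging_configuration is supported only in Hail 0.2.39 and above.
    if parse_version(hail_version) >= (0, 2, 39):
        kwargs["skip_logging_configuration"] = True
    return kwargs


# Usage in a Databricks notebook (assumes hl.version() returns such a string):
#   import hail as hl
#   hl.init(sc, **hail_init_kwargs(hl.version()))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "0.2.9" sorting after "0.2.39".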

Display Bokeh plots

Hail uses the Bokeh library to create plots. The show function built into Bokeh does not work in Databricks. To display a Bokeh plot generated by Hail, you can run a command like:

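from bokeh.embed import file_html
from bokeh.resources import CDN

# `mt` is a Hail MatrixTable with a DP entry field, loaded earlier in the notebook.
plot = hl.plot.histogram(mt.DP, range=(0, 30), bins=30, title='DP Histogram', legend='DP')
html = file_html(plot, CDN, "Chart")
# Render the standalone HTML with the Databricks notebook built-in displayHTML.
displayHTML(html)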

See the Bokeh documentation for more information.