Glow

Glow is an open source project created in collaboration between Databricks and the Regeneron Genetics Center. For information on features in Glow, see the Glow documentation.

Install Glow on a Databricks cluster via Docker with Databricks Container Services.

You can find containers on the ProjectGlow Docker Hub page. These containers set up environments with Glow and other libraries that were included in Databricks Runtime for Genomics (deprecated). Use projectglow/databricks-glow:<databricks_runtime_version>, replacing the tag with an available Databricks Runtime version.
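As an illustration of how the image tag fits into a cluster definition, the sketch below builds a request body for the Databricks Clusters API with a `docker_image.url` pointing at the Glow container. It is a minimal sketch that assumes you create clusters through that API. The function name `glow_container_cluster_spec` and the cluster name are hypothetical, and every argument is a placeholder you must replace with values available in your workspace. Authenticating and sending the request are out of scope.

```python
import json


def glow_container_cluster_spec(runtime_tag, spark_version, node_type_id, num_workers):
    """Build a hypothetical Clusters API request body that runs the Glow container.

    runtime_tag:   Databricks Runtime version used as the Docker image tag (placeholder).
    spark_version: runtime key accepted by the Clusters API (placeholder).
    node_type_id:  VM type for the driver and workers (placeholder).
    """
    return {
        "cluster_name": "glow",  # arbitrary, illustrative name
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Databricks Container Services starts the cluster from this image.
        "docker_image": {"url": f"projectglow/databricks-glow:{runtime_tag}"},
    }


if __name__ == "__main__":
    spec = glow_container_cluster_spec(
        runtime_tag="<databricks_runtime_version>",
        spark_version="<spark_version_key>",
        node_type_id="<node_type_id>",
        num_workers=2,
    )
    print(json.dumps(spec, indent=2))
```

Keeping the image tag separate from `spark_version` makes it explicit that you choose both, and the two should refer to the same Databricks Runtime release.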

Alternatively, install both of the following cluster libraries:

  • Maven: io.projectglow:glow-spark3_2.12:<version>
  • PyPI: glow.py==<version>
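If you script cluster setup instead of using the UI, both libraries can go into a single request to the Databricks Libraries API install endpoint (POST /api/2.0/libraries/install). The sketch below only builds that request body. It assumes you use this endpoint, and the function name `glow_libraries_payload` is hypothetical. The cluster ID and Glow version are placeholders, and sending the request with authentication is left out.

```python
import json

# Glow's Maven artifact for Spark 3 / Scala 2.12, as listed above.
GLOW_MAVEN_ARTIFACT = "io.projectglow:glow-spark3_2.12"


def glow_libraries_payload(cluster_id, version):
    """Return a hypothetical request body installing Glow's Maven and PyPI libraries.

    The same version string is used for both libraries so the Scala (JVM) side
    and the Python bindings match.
    """
    return {
        "cluster_id": cluster_id,
        "libraries": [
            {"maven": {"coordinates": f"{GLOW_MAVEN_ARTIFACT}:{version}"}},
            {"pypi": {"package": f"glow.py=={version}"}},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(glow_libraries_payload("<cluster_id>", "<version>"), indent=2))
```

Building both entries from one `version` argument avoids installing mismatched Maven and PyPI versions of Glow.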

Important

  • Please install the latest version of Glow on Databricks Runtime, not Databricks Runtime for Genomics (deprecated), which has Glow v0.6 installed by default.
  • Please install the Glow PyPI package as a cluster library, not as a notebook-scoped library using the %pip magic command.

Tip

  • Use compute optimized virtual machines to read variant data from cloud object stores.
  • Use Delta cache-accelerated virtual machines to query variant data.
  • Use memory optimized virtual machines for genetic association studies.
    • Clusters of small machines have a better price-performance ratio than clusters of large machines.
  • The Glow Pipe Transformer supports parallelization of deep learning tools that run on GPUs.