GraphFrames is a package for Apache Spark that provides DataFrame-based graphs. It offers high-level APIs in Java, Python, and Scala. It aims to provide both the functionality of GraphX and extended functionality that takes advantage of Spark DataFrames, including motif finding, DataFrame-based serialization, and highly expressive graph queries.

This article includes two example notebooks: a Scala tutorial notebook and a Python user guide. For additional examples using GraphFrames with Scala, see GraphFrames user guide - Scala.

Databricks recommends using a cluster running Databricks Runtime for Machine Learning, as it includes an optimized installation of GraphFrames.

If you are not using a cluster running Databricks Runtime ML, download the JAR file from the GraphFrames library, upload it to a volume, and install it onto your cluster.

GraphFrames tutorial (Scala)

The following notebook shows you how to use GraphFrames to perform graph analysis using Scala.

Graph Analysis with GraphFrames (Scala)

GraphFrames user guide (Python)

The following notebook includes Python code examples of how to use GraphFrames.

GraphFrames Python notebook