Limitations with Databricks Connect for Python

Note

This article covers Databricks Connect for Databricks Runtime 13.0 and above.

This article lists limitations with Databricks Connect for Python. Databricks Connect enables you to connect popular IDEs, notebook servers, and custom applications to Databricks clusters. See "What is Databricks Connect?". For the Scala version of this article, see "Limitations with Databricks Connect for Scala".

Not available on Databricks Connect for Databricks Runtime 13.3 LTS and below:

  • Streaming foreachBatch

  • Creating DataFrames larger than 128 MB

  • Queries that run longer than 3600 seconds (one hour)
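
The following is a minimal sketch of a pre-check for the 128 MB limit listed above, assuming the limit concerns DataFrames built from local data (for example with spark.createDataFrame()). How Databricks Connect measures the size is not stated in this article, so the JSON-encoded size used here is only a rough proxy. The helper names estimate_rows_bytes and fits_local_limit are hypothetical and not part of any Databricks API.

```python
import json

# Limit taken from the list above. The way the size is actually measured is
# not documented here, so treat this check as an approximation (assumption).
MAX_LOCAL_DATAFRAME_BYTES = 128 * 1024 * 1024


def estimate_rows_bytes(rows):
    """Roughly estimate the size of local rows from their JSON encoding."""
    # default=str lets values such as dates be encoded as plain strings.
    return sum(len(json.dumps(row, default=str).encode("utf-8")) for row in rows)


def fits_local_limit(rows, limit_bytes=MAX_LOCAL_DATAFRAME_BYTES):
    """Return True if the estimated size of the rows is within the limit."""
    return estimate_rows_bytes(rows) <= limit_bytes
```

If the check fails, one option is to write the data to storage that the cluster can reach and load it with spark.read rather than sending it from the local machine.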

Not available on Databricks Connect for Databricks Runtime 13.0:

  • UDFs

  • Pandas UDFs

  • Pandas on Spark

  • Streaming (without foreachBatch)

  • Databricks Utilities: fs, ls and secrets

  • OAuth

  • applyInPandas() and cogroup() with single-user clusters

Not available:

  • Dataset API

  • Dataset typed APIs (such as reduce() and flatMap())

  • Databricks Utilities: credentials, library, notebook workflow, widgets

  • SparkContext

  • RDDs

  • MLflow model inference: pyfunc.spark_udf() API

  • Mosaic geospatial

  • CREATE TABLE <table-name> AS SELECT (instead, use spark.sql("SELECT ...").write.saveAsTable("table"))

  • applyInPandas() and cogroup() with shared clusters

  • Changing the log4j log level through SparkContext

  • Distributed ML training

  • Synchronizing the local development environment with the remote cluster
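
The workaround for CREATE TABLE <table-name> AS SELECT in the list above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper name create_table_from_select, the query, and the table names my_source_table and my_new_table are hypothetical, and main() assumes that the databricks-connect package is installed and that connection settings are already configured.

```python
def create_table_from_select(spark, select_sql, table_name):
    """Run a SELECT and save its result as a table.

    This replaces CREATE TABLE <table-name> AS SELECT, which is listed
    as not available.
    """
    spark.sql(select_sql).write.saveAsTable(table_name)


def main():
    # Imported here so that the helper above can be used and tested
    # without databricks-connect installed.
    from databricks.connect import DatabricksSession

    # Assumes connection details come from the local configuration.
    spark = DatabricksSession.builder.getOrCreate()

    # Hypothetical query and table names, for illustration only.
    create_table_from_select(
        spark,
        "SELECT id, name FROM my_source_table",
        "my_new_table",
    )
```

Passing the session in as a parameter keeps the helper independent of how the session is created, so the same function works with any object that exposes sql().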