Databricks runtime maintenance updates

This page lists maintenance updates issued for Databricks Runtime releases. To add a maintenance update to an existing cluster, restart the cluster.

Note

This list of maintenance updates may include references to features that are not available on Google Cloud.

Databricks Runtime releases

For the original release notes, follow the link below the subheading.

Databricks Runtime 10.0

See Databricks Runtime 10.0 and Databricks Runtime 10.0 Photon.

  • Nov 9, 2021
    • Introduced additional configuration flags to enable fine-grained control of ANSI behaviors.
  • Nov 4, 2021
    • Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException
    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.
    • The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.
  • Nov 30, 2021
    • Fixed an issue with timestamp parsing where a timezone string without a colon was considered invalid.
    • Fixed an out of memory issue with query result caching under certain conditions.
    • Fixed an issue with USE DATABASE when a user switches the current catalog to a non-default catalog.
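
The Nov 30, 2021 timestamp fix listed above concerns timezone offsets written without a colon. The sketch below only illustrates the two spellings. It uses Python's standard library rather than Spark, the sample timestamps and the parse helper are hypothetical, and it does not reproduce Spark's parser.

```python
from datetime import datetime, timedelta

# Hypothetical sample values: the same instant with two offset spellings.
WITH_COLON = "2021-11-30T10:00:00+05:30"
WITHOUT_COLON = "2021-11-30T10:00:00+0530"

FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse(ts: str) -> datetime:
    """Parse a timestamp; %z accepts offsets with or without a colon (Python 3.7+)."""
    return datetime.strptime(ts, FORMAT)


a = parse(WITH_COLON)
b = parse(WITHOUT_COLON)
print(a == b)          # both strings denote the same instant
print(b.utcoffset())   # offset carried by the colon-less spelling
```

A parser that rejects the second spelling treats valid input as invalid, which is the kind of behavior the maintenance update corrects.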

Databricks Runtime 9.1 LTS

See Databricks Runtime 9.1 LTS and Databricks Runtime 9.1 LTS Photon.

  • Nov 4, 2021
    • Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException
    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.
    • The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.
  • Oct 20, 2021
    • Upgraded BigQuery connector from 0.18.1 to 0.22.2. This adds support for BigNumeric type.

Databricks Runtime 9.0

See Databricks Runtime 9.0 and Databricks Runtime 9.0 Photon.

  • Nov 4, 2021
    • Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException
    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.
    • The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.
  • Sep 22, 2021
    • Fixed a bug when casting a Spark array that contains null to string.
  • Sep 15, 2021
    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
  • Sep 8, 2021
    • Added support for schema name (databaseName.schemaName.tableName format) as the target table name for Azure Synapse Connector.
    • Added geometry and geography JDBC types support for Spark SQL.
    • [SPARK-33527][SQL] Extended the function of decode to be consistent with mainstream databases.
    • [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang.
  • Aug 25, 2021
    • SQL Server driver library was upgraded to 9.2.1.jre8.
    • Snowflake connector was upgraded to 2.9.0.
    • Fixed broken link to best trial notebook on AutoML experiment page.

Databricks Runtime 8.4

See Databricks Runtime 8.4 and Databricks Runtime 8.4 Photon.

  • Nov 4, 2021
    • Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException
    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.
    • The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.
  • Sep 22, 2021
    • Spark JDBC driver was upgraded to 2.6.19.1030
    • [SPARK-36734][SQL] Upgrade ORC to 1.5.13
  • Sep 15, 2021
    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
    • Operating system security updates.
  • Sep 8, 2021
    • [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang.
  • Aug 25, 2021
    • SQL Server driver library was upgraded to 9.2.1.jre8.
    • Snowflake connector was upgraded to 2.9.0.
    • Fixes a bug in credential passthrough caused by the new Parquet prefetch optimization, where user’s passthrough credential might not be found during file access.
  • Aug 11, 2021
    • Fixes a RocksDB incompatibility problem with older Databricks Runtime 8.4 versions. This fixes forward compatibility for Auto Loader, COPY INTO, and stateful streaming applications.
    • Fixes a bug in Auto Loader with S3 paths when using Auto Loader without a path option.
    • Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.
    • Fixes a bug when using Auto Loader to read CSV files with mismatching headers. Previously, if column names did not match, the column was filled in with nulls. Now, if a schema is provided, Auto Loader assumes the schema is the same and saves column mismatches only if rescued data columns are enabled.
    • Adds a new option called externalDataSource into the Azure Synapse connector to remove the CONTROL permission requirement on the database for PolyBase reading.
  • Jul 29, 2021
    • [SPARK-36034][BUILD] Rebase datetime in pushed down filters to Parquet
    • [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option

Databricks Runtime 8.3

See Databricks Runtime 8.3 and Databricks Runtime 8.3 Photon.

  • Nov 4, 2021
    • Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException
    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.
  • Sep 22, 2021
    • Spark JDBC driver was upgraded to 2.6.19.1030
  • Sep 15, 2021
    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
    • Operating system security updates.
  • Sep 8, 2021
    • [SPARK-35700][SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.
    • [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang.
  • Aug 25, 2021
    • SQL Server driver library was upgraded to 9.2.1.jre8.
    • Snowflake connector was upgraded to 2.9.0.
    • Fixes a bug in credential passthrough caused by the new Parquet prefetch optimization, where user’s passthrough credential might not be found during file access.
  • Aug 11, 2021
    • Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.
    • Fixes a bug when using Auto Loader to read CSV files with mismatching headers. Previously, if column names did not match, the column was filled in with nulls. Now, if a schema is provided, Auto Loader assumes the schema is the same and saves column mismatches only if rescued data columns are enabled.
  • Jul 29, 2021
    • Upgrade Databricks Snowflake Spark connector to 2.9.0-spark-3.1
    • [SPARK-36034][BUILD] Rebase datetime in pushed down filters to Parquet
    • [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option
  • Jul 14, 2021
    • Fixed an issue when using column names with dots in Azure Synapse connector.
    • Introduced database.schema.table format for Synapse Connector.
    • Added support to provide databaseName.schemaName.tableName format as the target table instead of only schemaName.tableName or tableName.
  • Jun 15, 2021
    • Fixed a NoSuchElementException bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses
    • Adds SQL CREATE GROUP, DROP GROUP, ALTER GROUP, SHOW GROUPS, and SHOW USERS commands. For details, see Security statements and Show statements.
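
The Jul 14, 2021 entry above names three accepted shapes for the target table: tableName, schemaName.tableName, and databaseName.schemaName.tableName. As a minimal sketch of how the three shapes relate, the hypothetical helper below splits a dotted name into its parts. It is not the connector's parser, and it assumes that no part contains a dot or needs quoting.

```python
from typing import NamedTuple, Optional


class TargetTable(NamedTuple):
    database: Optional[str]
    schema: Optional[str]
    table: str


def split_target_table(name: str) -> TargetTable:
    """Split a dotted target table name into (database, schema, table)."""
    parts = name.split(".")
    if len(parts) > 3 or any(part == "" for part in parts):
        raise ValueError(f"unsupported table name: {name!r}")
    # Left-pad with None so that the table name is always the last field.
    padded = [None] * (3 - len(parts)) + parts
    return TargetTable(*padded)


print(split_target_table("databaseName.schemaName.tableName"))
print(split_target_table("schemaName.tableName"))
print(split_target_table("tableName"))
```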

Databricks Runtime 8.2 (Unsupported)

See Databricks Runtime 8.2 (Unsupported).

  • Sep 22, 2021
    • Operating system security updates.
  • Sep 15, 2021
    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
  • Sep 8, 2021
    • [SPARK-35700][SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.
    • [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang.
  • Aug 25, 2021
    • Snowflake connector was upgraded to 2.9.0.
  • Aug 11, 2021
    • Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.
    • [SPARK-36034][SQL] Rebase datetime in pushed down filters to parquet.
  • Jul 29, 2021
    • Upgrade Databricks Snowflake Spark connector to 2.9.0-spark-3.1
    • [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option
  • Jul 14, 2021
    • Fixed an issue when using column names with dots in Azure Synapse connector.
    • Introduced database.schema.table format for Synapse Connector.
    • Added support to provide databaseName.schemaName.tableName format as the target table instead of only schemaName.tableName or tableName.
    • Fixed a bug that prevents users from time traveling to older available versions with Delta tables.
  • Jun 15, 2021
    • Fixes a NoSuchElementException bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses
  • May 26, 2021
    • Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).
    • Delta cache is enabled by default on all GCP instances except those in the -highcpu- family. For -highcpu- instances, the cache is preconfigured but disabled by default. It can be enabled using the Spark config spark.databricks.io.cache.enabled true.
  • Apr 30, 2021
    • Operating system security updates.
    • [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
    • [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
    • Fixed an OOM issue when Auto Loader reports Structured Streaming progress metrics.
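
The May 26, 2021 entry above gives the Delta cache setting as a key followed by a value, separated by a space. As a small sketch under that assumption, the hypothetical helper below renders a dictionary of settings into lines of that shape. Only the key spark.databricks.io.cache.enabled comes from the release note; the function name and the choice to lowercase booleans are illustrative.

```python
def render_spark_conf(settings: dict) -> str:
    """Render settings as 'key value' lines, one setting per line."""
    lines = []
    for key, value in settings.items():
        # Spark configs spell booleans in lowercase ("true"/"false").
        if isinstance(value, bool):
            value = str(value).lower()
        lines.append(f"{key} {value}")
    return "\n".join(lines)


conf_text = render_spark_conf({"spark.databricks.io.cache.enabled": True})
print(conf_text)
```

The rendered line matches the form quoted in the entry and would typically go in the cluster's Spark config. The cluster must be restarted for the change to apply.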

Databricks Runtime 8.1 (Unsupported)

See Databricks Runtime 8.1 (Unsupported).

  • Sep 22, 2021
    • Operating system security updates.
  • Sep 15, 2021
    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
  • Sep 8, 2021
    • [SPARK-35700][SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.
    • [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang.
  • Aug 25, 2021
    • Snowflake connector was upgraded to 2.9.0.
  • Aug 11, 2021
    • Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.
    • [SPARK-36034][SQL] Rebase datetime in pushed down filters to parquet.
  • Jul 29, 2021
    • Upgrade Databricks Snowflake Spark connector to 2.9.0-spark-3.1
    • [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option
  • Jul 14, 2021
    • Fixed an issue when using column names with dots in Azure Synapse connector.
    • Fixed a bug that prevents users from time traveling to older available versions with Delta tables.
  • Jun 15, 2021
    • Fixes a NoSuchElementException bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses
  • May 26, 2021
    • Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).
    • Delta cache is enabled by default on all GCP instances except those in the -highcpu- family. For -highcpu- instances, the cache is preconfigured but disabled by default. It can be enabled using the Spark config spark.databricks.io.cache.enabled true.
  • Apr 30, 2021
    • Operating system security updates.
    • [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
    • Fixed an OOM issue when Auto Loader reports Structured Streaming progress metrics.
  • Apr 27, 2021
    • [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
    • [SPARK-34856][SQL] ANSI mode: Allow casting complex types as string type
    • [SPARK-35014] Fix the PhysicalAggregation pattern to not rewrite foldable expressions
    • [SPARK-34769][SQL] AnsiTypeCoercion: return narrowest convertible type among TypeCollection
    • [SPARK-34614][SQL] ANSI mode: Casting String to Boolean will throw exception on parse error
    • [SPARK-33794][SQL] ANSI mode: Fix NextDay expression to throw runtime IllegalArgumentException when receiving invalid input under ANSI mode

Databricks Runtime 8.0 (Unsupported)

See Databricks Runtime 8.0 (Unsupported).

  • Sep 15, 2021
    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
  • Aug 25, 2021
    • Snowflake connector was upgraded to 2.9.0.
  • Aug 11, 2021
    • Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.
    • [SPARK-36034][SQL] Rebase datetime in pushed down filters to parquet.
  • Jul 29, 2021
    • [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option
  • Jul 14, 2021
    • Fixed an issue when using column names with dots in Azure Synapse connector.
    • Fixed a bug that prevents users from time traveling to older available versions with Delta tables.
  • May 26, 2021
    • Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).
    • Delta cache is enabled by default on all GCP instances except those in the -highcpu- family. For -highcpu- instances, the cache is preconfigured but disabled by default. It can be enabled using the Spark config spark.databricks.io.cache.enabled true.
    • Enable Maven library installation.
  • Apr 30, 2021
    • Operating system security updates.
    • [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
    • [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
  • Mar 24, 2021
    • [SPARK-34681][SQL] Fix bug for full outer shuffled hash join when building left side with non-equal condition
    • [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks
    • [SPARK-34613][SQL] Fix view does not capture disable hint config
  • Mar 9, 2021
    • [SPARK-34543][SQL] Respect the spark.sql.caseSensitive config while resolving partition spec in v1 SET LOCATION
    • [SPARK-34392][SQL] Support ZoneOffset +h:mm in DateTimeUtils.getZoneId
    • [UI] Fix the href link of Spark DAG Visualization
    • [SPARK-34436][SQL] DPP support LIKE ANY/ALL expression

Databricks Runtime 7.6 (Unsupported)

See Databricks Runtime 7.6 (Unsupported).

  • Aug 11, 2021
    • Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.
    • [SPARK-36034][SQL] Rebase datetime in pushed down filters to parquet.
  • Jul 29, 2021
    • [SPARK-32998][BUILD] Add ability to override default remote repos with internal repos only
  • Jul 14, 2021
    • Fixed a bug that prevents users from time traveling to older available versions with Delta tables.
  • May 26, 2021
    • Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).
    • Delta cache is enabled by default on all GCP instances except those in the -highcpu- family. For -highcpu- instances, the cache is preconfigured but disabled by default. It can be enabled using the Spark config spark.databricks.io.cache.enabled true.
    • Enable Maven library installation.
  • Apr 30, 2021
    • Operating system security updates.
    • [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
    • [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
  • Mar 24, 2021
    • [SPARK-34768][SQL] Respect the default input buffer size in Univocity
    • [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks
  • Mar 9, 2021
    • (Azure only) Fixed an Auto Loader bug that can cause NullPointerException when using Databricks Runtime 7.6 to run an old Auto Loader stream created in Databricks Runtime 7.2
    • [UI] Fix the href link of Spark DAG Visualization
    • Unknown leaf-node SparkPlan is not handled correctly in SizeInBytesOnlyStatsSparkPlanVisitor
    • Restore the output schema of SHOW DATABASES
    • [Delta][8.0, 7.6] Fixed calculation bug in file size auto-tuning logic
    • Disable staleness check for Delta table files in Delta cache
    • [SQL] Use correct dynamic pruning build key when range join hint is present
    • Disable char type support in non-SQL code path
    • Avoid NPE in DataFrameReader.schema
    • Fix NPE when EventGridClient response has no entity
    • Fix a read closed stream bug in Azure Auto Loader
    • [SQL] Do not generate shuffle partition number advice when AOS is enabled
  • Feb 24, 2021
    • Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.
    • Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file’s decimal precision and scale are different from the Spark schema.
    • Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.
    • Introduced a new configuration spark.databricks.hive.metastore.init.reloadFunctions.enabled. This configuration controls the built-in Hive initialization. When set to true, Databricks reloads all functions from all databases that users have into FunctionRegistry. This is the default behavior in Hive Metastore. When set to false, Databricks disables this process for optimization.
    • [SPARK-34212] Fixed issues related to reading decimal data from Parquet files.
    • [SPARK-34260][SQL] Fix UnresolvedException when creating temp view twice.

Databricks Runtime 7.5 (Unsupported)

See Databricks Runtime 7.5 (Unsupported).

  • May 26, 2021
    • Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).
    • Delta cache is enabled by default on all GCP instances except those in the -highcpu- family. For -highcpu- instances, the cache is preconfigured but disabled by default. It can be enabled using the Spark config spark.databricks.io.cache.enabled true.
    • Enable Maven library installation.
  • Apr 30, 2021
    • Operating system security updates.
    • [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
    • [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
  • Mar 24, 2021
    • [SPARK-34768][SQL] Respect the default input buffer size in Univocity
    • [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks
  • Mar 9, 2021
    • (Azure only) Fixed an Auto Loader bug that can cause NullPointerException when using Databricks Runtime 7.5 to run an old Auto Loader stream created in Databricks Runtime 7.2.
    • [UI] Fix the href link of Spark DAG Visualization
    • Unknown leaf-node SparkPlan is not handled correctly in SizeInBytesOnlyStatsSparkPlanVisitor
    • Restore the output schema of SHOW DATABASES
    • Disable staleness check for Delta table files in Delta cache
    • [SQL] Use correct dynamic pruning build key when range join hint is present
    • Disable char type support in non-SQL code path
    • Avoid NPE in DataFrameReader.schema
    • Fix NPE when EventGridClient response has no entity
    • Fix a read closed stream bug in Azure Auto Loader
  • Feb 24, 2021
    • Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.
    • Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file’s decimal precision and scale are different from the Spark schema.
    • Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.
    • Introduced a new configuration spark.databricks.hive.metastore.init.reloadFunctions.enabled. This configuration controls the built-in Hive initialization. When set to true, Databricks reloads all functions from all databases that users have into FunctionRegistry. This is the default behavior in Hive Metastore. When set to false, Databricks disables this process for optimization.
    • [SPARK-34212] Fixed issues related to reading decimal data from Parquet files.
    • [SPARK-34260][SQL] Fix UnresolvedException when creating temp view twice.
  • Feb 4, 2021
    • Fixed a regression that prevents the incremental execution of a query that sets a global limit such as SELECT * FROM table LIMIT nrows. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.
    • Introduced write time checks to the Hive client to prevent the corruption of metadata in the Hive metastore for Delta tables.
    • Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.
  • Jan 20, 2021
    • Fixed a regression in the Jan 12, 2021 maintenance release that can cause an incorrect AnalysisException saying that a column is ambiguous in a self join. This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions:
      • These two DataFrames have common columns, but the output of the self join does not have common columns. For example, df.join(df.select($"col" as "new_col"), cond)
      • The derived DataFrame excludes some columns via select, groupBy, or window.
      • The join condition or the following transformation after the joined DataFrame refers to the non-common columns. For example, df.join(df.drop("a"), df("a") === 1)
  • Jan 12, 2021
    • Upgrade Azure Storage SDK from 2.3.8 to 2.3.9.
    • [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value
    • [SPARK-33480][SQL] updates the error message of char/varchar table insertion length check

Databricks Runtime 7.3 LTS

See Databricks Runtime 7.3 LTS.

  • Nov 4, 2021
    • Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException
    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.
  • Sep 15, 2021
    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.
    • Operating system security updates.
  • Sep 8, 2021
    • [SPARK-35700][SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.
    • [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang.
  • Aug 25, 2021
    • Snowflake connector was upgraded to 2.9.0.
  • Jul 29, 2021
    • [SPARK-36034][BUILD] Rebase datetime in pushed down filters to Parquet
    • [SPARK-34508][BUILD] Skip HiveExternalCatalogVersionsSuite if network is down
  • Jul 14, 2021
    • Introduced database.schema.table format for Azure Synapse connector.
    • Added support to provide databaseName.schemaName.tableName format as the target table instead of only schemaName.tableName or tableName.
    • Fixed a bug that prevents users from time traveling to older available versions with Delta tables.
  • Jun 15, 2021
    • Fixes a NoSuchElementException bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses
    • Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).
    • Delta cache is enabled by default on all GCP instances except those in the -highcpu- family. For -highcpu- instances, the cache is preconfigured but disabled by default. It can be enabled using the Spark config spark.databricks.io.cache.enabled true.
  • Apr 30, 2021
    • Operating system security updates.
    • [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit
    • [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state
    • [SPARK-35045][SQL] Add an internal option to control input buffer in univocity
  • Mar 24, 2021
    • [SPARK-34768][SQL] Respect the default input buffer size in Univocity
    • [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks
    • [SPARK-33118][SQL] CREATE TEMPORARY TABLE fails with location
  • Mar 9, 2021
    • The updated Azure Blob File System driver for Azure Data Lake Storage Gen2 is now enabled by default. It brings multiple stability improvements.
    • Fix path separator on Windows for databricks-connect get-jar-dir
    • [UI] Fix the href link of Spark DAG Visualization
    • [DBCONNECT] Add support for FlatMapCoGroupsInPandas in Databricks Connect 7.3
    • Restore the output schema of SHOW DATABASES
    • [SQL] Use correct dynamic pruning build key when range join hint is present
    • Disable staleness check for Delta table files in Delta cache
    • [SQL] Do not generate shuffle partition number advice when AOS is enabled
  • Feb 24, 2021
    • Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.
    • Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file’s decimal precision and scale are different from the Spark schema.
    • Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.
    • Introduced a new configuration spark.databricks.hive.metastore.init.reloadFunctions.enabled. This configuration controls the built-in Hive initialization. When set to true, Databricks reloads all functions from all databases that users have into FunctionRegistry. This is the default behavior in Hive Metastore. When set to false, Databricks disables this process for optimization.
    • [SPARK-34212] Fixed issues related to reading decimal data from Parquet files.
    • [SPARK-33579][UI] Fix executor blank page behind proxy.
    • [SPARK-20044][UI] Support Spark UI behind front-end reverse proxy using a path prefix.
    • [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.
  • Feb 4, 2021
    • Fixed a regression that prevents the incremental execution of a query that sets a global limit such as SELECT * FROM table LIMIT nrows. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.
    • Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.
  • Jan 20, 2021
    • Fixed a regression in the Jan 12, 2021 maintenance release that can cause an incorrect AnalysisException saying that a column is ambiguous in a self join. This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions:
      • These two DataFrames have common columns, but the output of the self join does not have common columns. For example, df.join(df.select($"col" as "new_col"), cond)
      • The derived DataFrame excludes some columns via select, groupBy, or window.
      • The join condition or the following transformation after the joined DataFrame refers to the non-common columns. For example, df.join(df.drop("a"), df("a") === 1)
  • Jan 12, 2021
    • Operating system security updates.
    • [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value
    • [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar
    • [SPARK-33592][ML][PYTHON] Pyspark ML Validator params in estimatorParamMaps may be lost after saving and reloading
    • [SPARK-33071][SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin
  • Dec 8, 2020
    • [SPARK-33587][CORE] Kill the executor on nested fatal errors
    • [SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column
    • [SPARK-33316][SQL] Support user provided nullable Avro schema for non-nullable catalyst schema in Avro writing
    • Fixed an issue where Spark jobs launched using Databricks Connect could hang indefinitely with Executor$TaskRunner.$anonfun$copySessionState in the executor stack trace.
    • Operating system security updates.
  • Nov 20, 2020
    • [SPARK-33404][SQL][3.0] Fix incorrect results in date_trunc expression
    • [SPARK-33339][PYTHON] Pyspark application will hang due to non Exception error
    • [SPARK-33183][SQL][HOTFIX] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts
    • [SPARK-33371][PYTHON][3.0] Update setup.py and tests for Python 3.9
    • [SPARK-33391][SQL] element_at with CreateArray not respect one based index.
    • [SPARK-33306][SQL] Timezone is needed when cast date to string
    • [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream
  • Nov 5, 2020
    • Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser().
    • Fix an infinite loop bug when Avro reader reads the MAGIC bytes.
    • Add support for the USAGE privilege.
    • Performance improvements for privilege checking in table access control.
  • Oct 13, 2020
    • Operating system security updates.
    • You can read and write from DBFS using the FUSE mount at /dbfs/ on a High Concurrency cluster with credential passthrough enabled. Regular mounts are supported, but mounts that need passthrough credentials are not supported yet.
    • [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
    • [SPARK-32585][SQL] Support scala enumeration in ScalaReflection
    • Fixed listing directories in FUSE mount that contain file names with invalid XML characters
    • FUSE mount no longer uses ListMultipartUploads
  • Sep 29, 2020
    • [SPARK-32718][SQL] Remove unnecessary keywords for interval units
    • [SPARK-32635][SQL] Fix foldable propagation
    • Add a new config spark.shuffle.io.decoder.consolidateThreshold. Set the config value to Long.MAX_VALUE to skip the consolidation of netty FrameBuffers, which prevents java.lang.IndexOutOfBoundsException in corner cases.
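
The Sep 29, 2020 entry above asks for the config value to be set to Long.MAX_VALUE, which is the largest 64-bit signed integer on the JVM. As a hedged sketch, the snippet below computes that number and formats a key-value config line. The key comes from the release note, but writing the value as a plain decimal number is an assumption that should be checked against the cluster configuration documentation.

```python
# Java's Long.MAX_VALUE is the largest 64-bit signed integer: 2^63 - 1.
JAVA_LONG_MAX = 2**63 - 1

# Config key taken from the release note above.
CONF_KEY = "spark.shuffle.io.decoder.consolidateThreshold"

# Assumption: the value is written as a plain decimal number after the key.
conf_line = f"{CONF_KEY} {JAVA_LONG_MAX}"
print(conf_line)
```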