Maintenance updates for Databricks Runtime (archived)

This archived page lists maintenance updates issued for Databricks Runtime releases that are no longer supported. To add a maintenance update to an existing cluster, restart the cluster.

To migrate to a supported Databricks Runtime version, see the Databricks Runtime migration guide.

Important

This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. See Databricks Runtime release notes versions and compatibility.

Note

This list of maintenance updates may include references to features that are not available on Google Cloud.

Databricks Runtime releases

For the maintenance updates on supported Databricks Runtime versions, see Databricks Runtime maintenance updates.

Databricks Runtime 14.0

See Databricks Runtime 14.0 (unsupported).

  • February 8, 2024

    • [SPARK-46396] Timestamp inference should not throw exception.

    • [SPARK-46794] Remove subqueries from LogicalRDD constraints.

    • [SPARK-45182] Ignore task completion from old stage after retrying parent-indeterminate stage as determined by checksum.

    • [SPARK-46933] Add query execution time metric to connectors which use JDBCRDD.

    • [SPARK-45957] Avoid generating execution plan for non-executable commands.

    • [SPARK-46861] Avoid Deadlock in DAGScheduler.

    • [SPARK-46930] Add support for a custom prefix for Union type fields in Avro.

    • [SPARK-46941] Can’t insert window group limit node for top-k computation if contains SizeBasedWindowFunction.

    • [SPARK-45582] Ensure that store instance is not used after calling commit within output mode streaming aggregation.

    • Operating system security updates.

  • January 31, 2024

    • [SPARK-46541] Fix the ambiguous column reference in self join.

    • [SPARK-46676] dropDuplicatesWithinWatermark should not fail on canonicalization of the plan.

    • [SPARK-46769] Refine timestamp related schema inference.

    • [SPARK-45498] Followup: Ignore task completion from old stage attempts.

    • Revert [SPARK-46769] Refine timestamp related schema inference.

    • [SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of TaskInfo.accumulables().

    • [SPARK-46633] Fix Avro reader to handle zero-length blocks.

    • [SPARK-46677] Fix dataframe["*"] resolution.

    • [SPARK-46684] Fix CoGroup.applyInPandas/Arrow to pass arguments properly.

    • [SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.

    • [SPARK-46610] Create table should throw exception when no value for a key in options.

    • Operating system security updates.

  • January 17, 2024

    • The shuffle node of the explain plan returned by a Photon query is updated to add the causedBroadcastJoinBuildOOM=true flag when an out-of-memory error occurs during a shuffle that is part of a broadcast join.

    • To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.

    • [SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when spark.sql.legacy.keepCommandOutputSchema set to true.

    • [SPARK-46250] Deflake test_parity_listener.

    • [SPARK-45814] Make ArrowConverters.createEmptyArrowBatch call close() to avoid memory leak.

    • [SPARK-46173] Skipping trimAll call during date parsing.

    • [SPARK-46484] Make resolveOperators helper functions keep the plan id.

    • [SPARK-46466] Vectorized parquet reader should never do rebase for timestamp ntz.

    • [SPARK-46056] Fix Parquet vectorized read NPE with byteArrayDecimalType default value.

    • [SPARK-46058] Add separate flag for privateKeyPassword.

    • [SPARK-46478] Revert SPARK-43049 to use oracle varchar(255) for string.

    • [SPARK-46132] Support key password for JKS keys for RPC SSL.

    • [SPARK-46417] Do not fail when calling hive.getTable and throwException is false.

    • [SPARK-46261] DataFrame.withColumnsRenamed should keep the dict/map ordering.

    • [SPARK-46370] Fix bug when querying from table after changing column defaults.

    • [SPARK-46609] Avoid exponential explosion in PartitioningPreservingUnaryExecNode.

    • [SPARK-46600] Move shared code between SqlConf and SqlApiConf to SqlApiConfHelper.

    • [SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.

    • [SPARK-46337] Make CTESubstitution retain the PLAN_ID_TAG.

    • [SPARK-46602] Propagate allowExisting in view creation when the view/table does not exist.

    • [SPARK-46260] DataFrame.withColumnsRenamed should respect the dict ordering.

    • [SPARK-46145] spark.catalog.listTables does not throw exception when the table or view is not found.

  • December 14, 2023

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were handled incorrectly and interpreted as wildcards.

    • [SPARK-46255] Support complex type -> string conversion.

    • [SPARK-46028] Make Column.__getitem__ accept input column.

    • [SPARK-45920] group by ordinal should be idempotent.

    • [SPARK-45433] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat.

    • [SPARK-45509] Fix df column reference behavior for Spark Connect.

    • Operating system security updates.

  • November 29, 2023

    • Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.

    • When ingesting CSV data using Auto Loader or Streaming Tables, large CSV files are now splittable and can be processed in parallel during both schema inference and data processing.

    • Spark-snowflake connector is upgraded to 2.12.0.

    • [SPARK-45859] Made UDF objects in ml.functions lazy.

    • Revert [SPARK-45592].

    • [SPARK-45892] Refactor optimizer plan validation to decouple validateSchemaOutput and validateExprIdUniqueness.

    • [SPARK-45592] Fixed correctness issue in AQE with InMemoryTableScanExec.

    • [SPARK-45620] APIs related to Python UDF now use camelCase.

    • [SPARK-44784] Made SBT testing hermetic.

    • [SPARK-45770] Fixed column resolution with DataFrameDropColumns for Dataframe.drop.

    • [SPARK-45544] Integrated SSL support into TransportContext.

    • [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.

    • Operating system security updates.

  • November 10, 2023

    • Changed data feed queries on Unity Catalog Streaming Tables and Materialized Views to display error messages.

    • [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.

    • [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.

    • [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.

    • [SPARK-45541] Added SSLFactory.

    • [SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.

    • [SPARK-45429] Added helper classes for SSL RPC communication.

    • [SPARK-44219] Added extra per-rule validations for optimization rewrites.

    • [SPARK-45543] Fixed an issue where InferWindowGroupLimit generated an error if the other window functions did not have the same window frame as the rank-like functions.

    • Operating system security updates.

  • October 23, 2023

    • [SPARK-45426] Added support for ReloadingX509TrustManager.

    • [SPARK-45396] Added doc entry for PySpark.ml.connect module, and added Evaluator to __all__ at ml.connect.

    • [SPARK-45256] Fixed an issue where DurationWriter failed when writing more values than initial capacity.

    • [SPARK-45279] Attached plan_id to all logical plans.

    • [SPARK-45250] Added support for stage-level task resource profile for yarn clusters when dynamic allocation is turned off.

    • [SPARK-45182] Added support for rolling back shuffle map stage so all stage tasks can be retried when the stage output is indeterminate.

    • [SPARK-45419] Avoid reusing rocksdb sst files in a different rocksdb instance by removing file version map entries of larger versions.

    • [SPARK-45386] Fixed an issue where StorageLevel.NONE would incorrectly return 0.

    • Operating system security updates.

  • October 13, 2023

    • Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.

    • The array_insert function is 1-based for positive and negative indexes, while before, it was 0-based for negative indexes. It now inserts a new element at the end of input arrays for the index -1. To restore the previous behavior, set spark.sql.legacy.negativeIndexInArrayInsert to true.

    • Databricks no longer ignores corrupt files during CSV schema inference with Auto Loader when ignoreCorruptFiles is enabled.

    • [SPARK-45227] Fixed a subtle thread-safety issue with CoarseGrainedExecutorBackend.

    • [SPARK-44658] ShuffleStatus.getMapStatus should return None instead of Some(null).

    • [SPARK-44910] Encoders.bean does not support superclasses with generic type arguments.

    • [SPARK-45346] Parquet schema inference respects case-sensitive flags when merging schema.

    • Revert [SPARK-42946].

    • [SPARK-42205] Updated the JSON protocol to remove Accumulables logging in a task or stage start events.

    • [SPARK-45360] Spark session builder supports initialization from SPARK_REMOTE.

    • [SPARK-45316] Add new parameters ignoreCorruptFiles/ignoreMissingFiles to HadoopRDD and NewHadoopRDD.

    • [SPARK-44909] Skip running the torch distributor log streaming server when it is not available.

    • [SPARK-45084] StateOperatorProgress now uses accurate shuffle partition number.

    • [SPARK-45371] Fixed shading issues in the Spark Connect Scala Client.

    • [SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.

    • [SPARK-44840] Make array_insert() 1-based for negative indexes.

    • [SPARK-44551] Edited comments to sync with OSS.

    • [SPARK-45078] The ArrayInsert function now makes explicit casting when the element type does not equal the derived component type.

    • [SPARK-45339] PySpark now logs retry errors.

    • [SPARK-45057] Avoid acquiring read lock when keepReadLock is false.

    • [SPARK-44908] Fixed cross-validator foldCol param functionality.

    • Operating system security updates.
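
The array_insert indexing change in the October 13, 2023 update can be illustrated with a small plain-Python model. This is only a sketch of the indexing rule as described above, not Spark's implementation: the function name and the legacy_negative_index parameter are hypothetical stand-ins (the parameter mimics setting spark.sql.legacy.negativeIndexInArrayInsert to true), and index 0 and out-of-range indexes are deliberately not modeled.

```python
def array_insert_model(arr, pos, value, legacy_negative_index=False):
    """Plain-Python model of the array_insert index rule (not Spark code).

    pos is 1-based for positive indexes. For negative indexes, the new
    behavior is also 1-based (-1 appends at the end); the legacy behavior
    was 0-based (-1 inserted before the last element).
    """
    n = len(arr)
    if pos > 0:
        idx = pos - 1
    elif pos < 0:
        idx = n + pos + (0 if legacy_negative_index else 1)
    else:
        raise ValueError("index 0 is not modeled in this sketch")
    if not 0 <= idx <= n:
        raise ValueError("out-of-range indexes are not modeled in this sketch")
    # Build a new list so the input is left unchanged.
    return arr[:idx] + [value] + arr[idx:]


new_result = array_insert_model([5, 3, 2, 1], -1, 4)
legacy_result = array_insert_model([5, 3, 2, 1], -1, 4, legacy_negative_index=True)
print(new_result)     # [5, 3, 2, 1, 4]
print(legacy_result)  # [5, 3, 2, 4, 1]
```

In this model the only difference between the two modes is a one-position shift for negative indexes, which is why queries that relied on the old behavior may need the legacy setting.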

Databricks Runtime 13.1

See Databricks Runtime 13.1 (unsupported).

  • November 29, 2023

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.

    • [SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.

    • [SPARK-43802] Fixed an issue where codegen for unhex and unbase64 expressions would fail.

    • [SPARK-43718] Fixed nullability for keys in USING joins.

    • Operating system security updates.

  • November 14, 2023

    • Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.

    • Changed data feed queries on Unity Catalog Streaming Tables and Materialized Views to display error messages.

    • [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.

    • [SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.

    • [SPARK-45543] Fixed an issue where InferWindowGroupLimit caused an issue if the other window functions didn’t have the same window frame as the rank-like functions.

    • Operating system security updates.

  • October 24, 2023

    • [SPARK-43799] Added descriptor binary option to PySpark Protobuf API.

    • Revert [SPARK-42946].

    • [SPARK-45346] Parquet schema inference now respects case-sensitive flag when merging a schema.

    • Operating system security updates.

  • October 13, 2023

    • Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.

    • No longer ignoring corrupt files when ignoreCorruptFiles is enabled during CSV schema inference with Auto Loader.

    • [SPARK-44658] ShuffleStatus.getMapStatus returns None instead of Some(null).

    • [SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.

    • [SPARK-42205] Updated the JSON protocol to remove Accumulables logging in a task or stage start events.

    • Operating system security updates.

  • September 12, 2023

    • [SPARK-44718] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value.

    • [SPARK-44878] Turned off strict limit for RocksDB write manager to avoid insertion exception on cache complete.

    • Miscellaneous fixes.

  • August 30, 2023

    • [SPARK-44871] Fixed percentile_disc behavior.

    • [SPARK-44714] Ease restriction of LCA resolution regarding queries.

    • [SPARK-44245] PySpark.sql.dataframe sample() doc tests are now illustrative-only.

    • [SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.

    • Operating system security updates.

  • August 15, 2023

    • [SPARK-44485] Optimized TreeNode.generateTreeString.

    • [SPARK-44643] Fixed Row.__repr__ when the row is empty.

    • [SPARK-44504] Maintenance task now cleans up loaded providers on stop error.

    • [SPARK-44479] Fixed protobuf conversion from an empty struct type.

    • [SPARK-44464] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as the first column value.

    • Miscellaneous fixes.

  • July 27, 2023

    • Fixed an issue where dbutils.fs.ls() returned INVALID_PARAMETER_VALUE.LOCATION_OVERLAP when called for a storage location path that clashed with another external or managed storage location.

    • [SPARK-44199] CacheManager no longer refreshes the fileIndex unnecessarily.

    • [SPARK-44448] Fixed wrong results bug from DenseRankLimitIterator and InferWindowGroupLimit.

    • Operating system security updates.

  • July 24, 2023

    • Revert [SPARK-42323].

    • [SPARK-41848] Fixed task over-schedule issue with TaskResourceProfile.

    • [SPARK-44136] Fixed an issue where StateManager would get materialized in an executor instead of the driver in FlatMapGroupsWithStateExec.

    • [SPARK-44337] Fixed an issue where any field set to Any.getDefaultInstance caused parse errors.

    • Operating system security updates.

  • June 27, 2023

    • Operating system security updates.

  • June 15, 2023

    • Photonized approx_count_distinct.

    • JSON parser in failOnUnknownFields mode now drops the record in DROPMALFORMED mode and fails directly in FAILFAST mode.

    • Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.

    • The PubSubRecord attributes field is stored as JSON instead of the string from a Scala map for more straightforward serialization and deserialization.

    • The EXPLAIN EXTENDED command now returns the result cache eligibility of the query.

    • Improve the performance of incremental updates with SHALLOW CLONE Iceberg and Parquet.

    • [SPARK-43032] Python SQM bug fix.

    • [SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.

    • [SPARK-43340] Handle missing stack-trace field in eventlogs.

    • [SPARK-43527] Fixed catalog.listCatalogs in PySpark.

    • [SPARK-43541] Propagate all Project tags in resolving of expressions and missing columns.

    • [SPARK-43300] NonFateSharingCache wrapper for Guava Cache.

    • [SPARK-43378] Properly close stream objects in deserializeFromChunkedBuffer.

    • [SPARK-42852] Revert NamedLambdaVariable related changes from EquivalentExpressions.

    • [SPARK-43779] ParseToDate now loads EvalMode in the main thread.

    • [SPARK-43413] Fix IN subquery ListQuery nullability.

    • [SPARK-43889] Add check for column name for __dir__() to filter out error-prone column names.

    • [SPARK-43043] Improved the performance of MapOutputTracker.updateMapOutput

    • [SPARK-43522] Fixed creating struct column name with index of array.

    • [SPARK-43457] Augment user agent with OS, Python and Spark versions.

    • [SPARK-43286] Updated aes_encrypt CBC mode to generate random IVs.

    • [SPARK-42851] Guard EquivalentExpressions.addExpr() with supportedExpression().

    • Revert [SPARK-43183].

    • Operating system security updates.
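
The JSON parser change in the June 15, 2023 update (unknown fields are dropped in DROPMALFORMED mode and fail immediately in FAILFAST mode) can be sketched in plain Python. This is a simplified model under stated assumptions, not the Spark JSON reader: parse_records and known_fields are hypothetical names, records are single-line JSON objects, and other parser modes are not modeled.

```python
import json


def parse_records(lines, known_fields, mode):
    """Model of failOnUnknownFields handling for two parser modes."""
    if mode not in ("DROPMALFORMED", "FAILFAST"):
        raise ValueError(f"mode not modeled in this sketch: {mode}")
    rows = []
    for line in lines:
        record = json.loads(line)
        unknown = set(record) - set(known_fields)
        if unknown:
            if mode == "FAILFAST":
                # Fail on the first record that carries an unknown field.
                raise ValueError(f"unknown fields: {sorted(unknown)}")
            # DROPMALFORMED: skip the record and keep going.
            continue
        rows.append(record)
    return rows


lines = ['{"id": 1}', '{"id": 2, "extra": true}']
kept = parse_records(lines, {"id"}, "DROPMALFORMED")
print(kept)  # [{'id': 1}]
```

Calling parse_records(lines, {"id"}, "FAILFAST") on the same input raises ValueError at the second record instead of returning a partial result.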

Databricks Runtime 12.2 LTS

See Databricks Runtime 12.2 LTS.

  • November 29, 2023

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.

    • [SPARK-42205] Removed logging accumulables in Stage and Task start events.

    • [SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.

    • [SPARK-43718] Fixed nullability for keys in USING joins.

    • [SPARK-45544] Integrated SSL support into TransportContext.

    • [SPARK-43973] Structured Streaming UI now displays failed queries correctly.

    • [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.

    • [SPARK-45859] Made UDF objects in ml.functions lazy.

    • Operating system security updates.

  • November 14, 2023

    • Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.

    • [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.

    • [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.

    • [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.

    • [SPARK-45541] Added SSLFactory.

    • [SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.

    • [SPARK-45429] Added helper classes for SSL RPC communication.

    • Operating system security updates.

  • October 24, 2023

    • [SPARK-45426] Added support for ReloadingX509TrustManager.

    • Miscellaneous fixes.

  • October 13, 2023

    • Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.

    • [SPARK-42553] Ensure at least one time unit after interval.

    • [SPARK-45346] Parquet schema inference respects case sensitive flag when merging schema.

    • [SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.

    • [SPARK-45084] StateOperatorProgress to use an accurate, adequate shuffle partition number.

  • September 12, 2023

    • [SPARK-44873] Added support for alter view with nested columns in the Hive client.

    • [SPARK-44718] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value.

    • [SPARK-43799] Added descriptor binary option to PySpark Protobuf API.

    • Miscellaneous fixes.

  • August 30, 2023

  • August 15, 2023

    • [SPARK-44504] Maintenance task cleans up loaded providers on stop error.

    • [SPARK-44464] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as the first column value.

    • Operating system security updates.

  • July 29, 2023

    • Fixed an issue where dbutils.fs.ls() returned INVALID_PARAMETER_VALUE.LOCATION_OVERLAP when called for a storage location path that clashed with another external or managed storage location.

    • [SPARK-44199] CacheManager no longer refreshes the fileIndex unnecessarily.

    • Operating system security updates.

  • July 24, 2023

    • [SPARK-44337] Fixed an issue where any field set to Any.getDefaultInstance caused parse errors.

    • [SPARK-44136] Fixed an issue where StateManager would get materialized in an executor instead of the driver in FlatMapGroupsWithStateExec.

    • Operating system security updates.

  • June 23, 2023

    • Operating system security updates.

  • June 15, 2023

    • Photonized approx_count_distinct.

    • Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.

    • [SPARK-43779] ParseToDate now loads EvalMode in the main thread.

    • [SPARK-43156][SPARK-43098] Extended scalar subquery count error test with decorrelateInnerQuery turned off.

    • Operating system security updates.

  • June 2, 2023

    • The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.

    • Improve the performance of incremental updates with SHALLOW CLONE Iceberg and Parquet.

    • Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.

    • [SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.

    • [SPARK-43413][11.3-13.0] Fixed IN subquery ListQuery nullability.

    • [SPARK-43522] Fixed creating struct column name with index of array.

    • [SPARK-43541] Propagate all Project tags in resolving of expressions and missing columns.

    • [SPARK-43527] Fixed catalog.listCatalogs in PySpark.

    • [SPARK-43123] Internal field metadata no longer leaks to catalogs.

    • [SPARK-43340] Fixed missing stack trace field in eventlogs.

    • [SPARK-42444] DataFrame.drop now handles duplicated columns correctly.

    • [SPARK-42937] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.

    • [SPARK-43286] Updated aes_encrypt CBC mode to generate random IVs.

    • [SPARK-43378] Properly close stream objects in deserializeFromChunkedBuffer.

  • May 17, 2023

    • Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.

    • If an Avro file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that have different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend using the rescuedDataColumn option.

    • Auto Loader now does the following.

      • Correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided, but the Avro file suggests one of the other two types.

      • Prevents reading interval types as date or timestamp types to avoid getting corrupt dates.

      • Prevents reading Decimal types with lower precision.

    • [SPARK-43172] Exposes host and token from Spark connect client.

    • [SPARK-43293] __qualified_access_only is ignored in normal columns.

    • [SPARK-43098] Fixed correctness COUNT bug when scalar subquery has a GROUP BY clause.

    • [SPARK-43085] Support for column DEFAULT assignment for multi-part table names.

    • [SPARK-43190] ListQuery.childOutput is now consistent with secondary output.

    • [SPARK-43192] Removed user agent charset validation.

    • Operating system security updates.

  • April 25, 2023

    • If a Parquet file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that had different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the rescuedDataColumn option.

    • Auto Loader now correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided, but the Parquet file suggests one of the other two types. When the rescued data column was previously enabled, the data type mismatch would cause columns to be saved even though they were readable.

    • [SPARK-43009] Parameterized sql() with Any constants

    • [SPARK-42406] Terminate Protobuf recursive fields by dropping the field

    • [SPARK-43038] Support the CBC mode by aes_encrypt()/aes_decrypt()

    • [SPARK-42971] Change to print workdir if appDirs is null when worker handle WorkDirCleanup event

    • [SPARK-43018] Fix bug for INSERT commands with timestamp literals

    • Operating system security updates.

  • April 11, 2023

    • Support legacy data source formats in the SYNC command.

    • Fixes an issue in the %autoreload behavior in notebooks outside of a repo.

    • Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.

    • [SPARK-42928] Makes resolvePersistentFunction synchronized.

    • [SPARK-42936] Fixes an LCA issue when the clause can be resolved directly by its child aggregate.

    • [SPARK-42967] Fixes SparkListenerTaskStart.stageAttemptId when a task starts after the stage is canceled.

    • Operating system security updates.

  • March 29, 2023

    • Databricks SQL now supports specifying default values for columns of Delta Lake tables, either at table creation time or afterward. Subsequent INSERT, UPDATE, DELETE, and MERGE commands can refer to any column’s default value using the explicit DEFAULT keyword. In addition, if any INSERT assignment has an explicit list of fewer columns than the target table, corresponding column default values are substituted for the remaining columns (or NULL if no default is specified).

      For example:

      CREATE TABLE t (first INT, second DATE DEFAULT CURRENT_DATE()) USING delta;
      INSERT INTO t VALUES (0, DEFAULT);
      INSERT INTO t VALUES (1, DEFAULT);
      SELECT first, second FROM t;
      > 0, 2023-03-28
        1, 2023-03-28
      
    • Auto Loader now initiates at least one synchronous RocksDB log cleanup for Trigger.AvailableNow streams to check that the checkpoint can get regularly cleaned up for fast-running Auto Loader streams. This can cause some streams to take longer before they shut down, but it will save you storage costs and improve the Auto Loader experience in future runs.

    • You can now modify a Delta table to add support to table features using DeltaTable.addFeatureSupport(feature_name).

    • [SPARK-42794] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structured Streaming

    • [SPARK-42521] Add NULLs for INSERTs with user-specified lists of fewer columns than the target table

    • [SPARK-42702][SPARK-42623] Support parameterized query in subquery and CTE

    • [SPARK-42668] Catch exception while trying to close the compressed stream in HDFSStateStoreProvider stop

    • [SPARK-42403] JsonProtocol should handle null JSON strings
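
The column-default rule described in the March 29, 2023 update (an INSERT that lists fewer columns than the target table gets the column default for each remaining column, or NULL if no default is specified) can be modeled in plain Python. This is a sketch only: resolve_insert_row and its parameters are hypothetical names, defaults are treated as constant values rather than SQL expressions evaluated at insert time, and None stands in for NULL.

```python
def resolve_insert_row(table_columns, column_defaults, insert_columns, values):
    """Return the full row for an INSERT with an explicit, shorter column list."""
    if len(insert_columns) != len(values):
        raise ValueError("column list and value list differ in length")
    provided = dict(zip(insert_columns, values))
    # Missing columns fall back to their default, or None (NULL) if none is set.
    return tuple(
        provided[col] if col in provided else column_defaults.get(col)
        for col in table_columns
    )


row = resolve_insert_row(
    table_columns=["first", "second", "third"],
    column_defaults={"second": "2023-03-28"},
    insert_columns=["first"],
    values=[0],
)
print(row)  # (0, '2023-03-28', None)
```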

  • March 8, 2023

    • The error message “Failure to initialize configuration” has been improved to provide more context for the customer.

    • There is a terminology change for adding features to a Delta table using the table property. The preferred syntax is now 'delta.feature.featureName'='supported' instead of 'delta.feature.featureName'='enabled'. For backward compatibility, using 'delta.feature.featureName'='enabled' still works and will continue to work.

    • Starting from this release, it is possible to create/replace a table with an additional table property delta.ignoreProtocolDefaults to ignore protocol-related Spark configs, which includes default reader and writer versions and table features supported by default.

    • [SPARK-42070] Change the default value of the argument of the Mask function from -1 to NULL

    • [SPARK-41793] Incorrect result for window frames defined by a range clause on large decimals

    • [SPARK-42484] UnsafeRowUtils better error message

    • [SPARK-42516] Always capture the session time zone config while creating views

    • [SPARK-42635] Fix the TimestampAdd expression.

    • [SPARK-42622] Turned off substitution in values

    • [SPARK-42534] Fix DB2Dialect Limit clause

    • [SPARK-42121] Add built-in table-valued functions posexplode, posexplode_outer, json_tuple and stack

    • [SPARK-42045] ANSI SQL mode: Round/Bround should return an error on tiny/small/significant integer overflow

    • Operating system security updates.

Databricks Runtime 11.3 LTS

See Databricks Runtime 11.3 LTS.

  • November 29, 2023

    • Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.

    • [SPARK-43973] Structured Streaming UI now displays failed queries correctly.

    • [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.

    • [SPARK-45544] Integrated SSL support into TransportContext.

    • [SPARK-45859] Made UDF objects in ml.functions lazy.

    • [SPARK-43718] Fixed nullability for keys in USING joins.

    • [SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.

    • Operating system security updates.

  • November 14, 2023

    • Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.

    • [SPARK-42205] Removed logging accumulables in Stage and Task start events.

    • [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.

    • Revert [SPARK-33861].

    • [SPARK-45541] Added SSLFactory.

    • [SPARK-45429] Added helper classes for SSL RPC communication.

    • [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.

    • [SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.

    • [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.

    • Operating system security updates.

  • October 24, 2023

    • [SPARK-45426] Added support for ReloadingX509TrustManager.

    • Miscellaneous fixes.

  • October 13, 2023

    • Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.

    • [SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.

    • [SPARK-45084] StateOperatorProgress to use an accurate, adequate shuffle partition number.

    • [SPARK-45346] Parquet schema inference now respects case-sensitive flag when merging a schema.

    • Operating system security updates.

  • September 10, 2023

    • Miscellaneous fixes.

  • August 30, 2023

    • [SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.

    • [SPARK-44871][11.3-13.0] Fixed percentile_disc behavior.

    • Operating system security updates.

  • August 15, 2023

    • [SPARK-44485] Optimized TreeNode.generateTreeString.

    • [SPARK-44504] Maintenance task cleans up loaded providers on stop error.

    • [SPARK-44464] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as the first column value.

    • Operating system security updates.

  • July 27, 2023

    • Fixed an issue where dbutils.fs.ls() returned INVALID_PARAMETER_VALUE.LOCATION_OVERLAP when called for a storage location path that clashed with another external or managed storage location.

    • [SPARK-44199] CacheManager no longer refreshes the fileIndex unnecessarily.

    • Operating system security updates.

  • July 24, 2023

    • [SPARK-44136] Fixed an issue where StateManager would get materialized in an executor instead of the driver in FlatMapGroupsWithStateExec.

    • Operating system security updates.

  • June 23, 2023

    • Operating system security updates.

  • June 15, 2023

    • Photonized approx_count_distinct.

    • Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.

    • [SPARK-43779] ParseToDate now loads EvalMode in the main thread.

    • [SPARK-40862] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery

    • [SPARK-43156][SPARK-43098] Extended scalar subquery count bug test with decorrelateInnerQuery turned off.

    • [SPARK-43098] Fix correctness COUNT bug when scalar subquery has a group by clause

    • Operating system security updates.

  • June 2, 2023

    • The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.

    • Improve the performance of incremental updates with SHALLOW CLONE Iceberg and Parquet.

    • Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.

    • [SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.

    • [SPARK-43527] Fixed catalog.listCatalogs in PySpark.

    • [SPARK-43413][11.3-13.0] Fixed IN subquery ListQuery nullability.

    • [SPARK-43340] Fixed missing stack trace field in eventlogs.
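
As a rough illustration of the June 2, 2023 JSON parser note above, the following plain-Python sketch models how the two modes differ when a record contains a field that is not in the schema. It does not use Spark: parse_records, UnknownFieldError, and the sample schema are hypothetical names chosen for this example, and the real parser may differ in details.

```python
import json


class UnknownFieldError(ValueError):
    """Raised by this sketch when a record has a field outside the schema."""


def parse_records(lines, schema_fields, mode):
    """Model of failOnUnknownFields handling (hypothetical helper, not Spark code).

    DROPMALFORMED: records with unknown fields are skipped.
    FAILFAST: the first record with an unknown field raises an error.
    """
    if mode not in ("DROPMALFORMED", "FAILFAST"):
        raise ValueError(f"unsupported mode in this sketch: {mode}")
    parsed = []
    for line in lines:
        record = json.loads(line)
        unknown = set(record) - set(schema_fields)
        if unknown:
            if mode == "FAILFAST":
                raise UnknownFieldError(f"unknown fields: {sorted(unknown)}")
            continue  # DROPMALFORMED: drop the whole record
        parsed.append(record)
    return parsed


lines = ['{"id": 1, "name": "a"}', '{"id": 2, "extra": true}']
kept = parse_records(lines, {"id", "name"}, "DROPMALFORMED")
```

In this model the second record is dropped under DROPMALFORMED, while the same input under FAILFAST stops at the first unknown field.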

Databricks Runtime 10.4 LTS

See Databricks Runtime 10.4 LTS.

  • November 29, 2023

    • [SPARK-45544] Integrated SSL support into TransportContext.

    • [SPARK-45859] Made UDF objects in ml.functions lazy.

    • [SPARK-43718] Fixed nullability for keys in USING joins.

    • [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.

    • [SPARK-42205] Removed logging accumulables in Stage and Task start events.

    • [SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.

    • Operating system security updates.

  • November 14, 2023

  • October 24, 2023

    • [SPARK-45426] Added support for ReloadingX509TrustManager.

    • Operating system security updates.

  • October 13, 2023

    • [SPARK-45084] StateOperatorProgress to use an accurate, adequate shuffle partition number.

    • [SPARK-45178] Fall back to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.

    • Operating system security updates.

  • September 10, 2023

    • Miscellaneous fixes.

  • August 30, 2023

    • [SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.

    • Operating system security updates.

  • August 15, 2023

    • [SPARK-44504] Maintenance task cleans up loaded providers on stop error.

    • [SPARK-43973] Structured Streaming UI now displays failed queries correctly.

    • Operating system security updates.

  • June 23, 2023

    • Operating system security updates.

  • June 15, 2023

    • Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.

    • [SPARK-43098] Fix correctness COUNT bug when scalar subquery has a group by clause

    • [SPARK-40862] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery

    • [SPARK-43156][SPARK-43098] Extended scalar subquery count bug test with decorrelateInnerQuery turned off.

    • Operating system security updates.

  • June 2, 2023

    • The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.

    • Fixed an issue in JSON rescued data parsing to prevent UnknownFieldException.

    • Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.

    • [SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.

    • [SPARK-43413] Fixed IN subquery ListQuery nullability.

    • Operating system security updates.

  • May 17, 2023

    • Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.

    • [SPARK-41520] Split AND_OR tree pattern to separate AND and OR.

    • [SPARK-43190] ListQuery.childOutput is now consistent with secondary output.

    • Operating system security updates.

  • April 25, 2023

    • [SPARK-42928] Make resolvePersistentFunction synchronized.

    • Operating system security updates.

  • April 11, 2023

    • Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.

    • [SPARK-42937] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.

    • [SPARK-42967] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is canceled.

  • March 29, 2023

    • [SPARK-42668] Catch exception while trying to close the compressed stream in HDFSStateStoreProvider stop

    • [SPARK-42635] Fix the …

    • Operating system security updates.

  • March 14, 2023

    • [SPARK-41162] Fix anti- and semi-join for self-join with aggregations

    • [SPARK-33206] Fix shuffle index cache weight calculation for small index files

    • [SPARK-42484] Improved the UnsafeRowUtils error message

    • Miscellaneous fixes.

  • February 28, 2023

    • Support generated column for yyyy-MM-dd date_format. This change supports partition pruning for yyyy-MM-dd as a date_format in generated columns.

    • Users can now read and write specific Delta tables requiring Reader version 3 and Writer version 7, using Databricks Runtime 9.1 LTS or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.

    • Operating system security updates.

  • February 16, 2023

    • [SPARK-30220] Enable using Exists/In subqueries outside of the Filter node

    • Operating system security updates.

  • January 31, 2023

    • Table types of JDBC tables are now EXTERNAL by default.

  • January 18, 2023

    • Azure Synapse connector returns a more descriptive error message when a column name contains invalid characters such as white space or semicolons. In such cases, the following message will be returned: Azure Synapse Analytics failed to run the JDBC query produced by the connector. Check column names do not include not valid characters such as ';' or white space.

    • [SPARK-38277] Clear write batch after RocksDB state store’s commit

    • [SPARK-41199] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

    • [SPARK-41198] Fix metrics in streaming query having CTE and DSv1 streaming source

    • [SPARK-41339] Close and recreate RocksDB write batch instead of just clearing

    • [SPARK-41732] Apply tree-pattern based pruning for the rule SessionWindowing

    • Operating system security updates.

  • November 29, 2022

    • Users can configure how leading and trailing white space is handled when writing data using the Redshift connector. The following options have been added to control white space handling:

      • csvignoreleadingwhitespace, when set to true, removes leading white space from values during writes when tempformat is set to CSV or CSV GZIP. White space is retained when the config is set to false. By default, the value is true.

      • csvignoretrailingwhitespace, when set to true, removes trailing white space from values during writes when tempformat is set to CSV or CSV GZIP. White space is retained when the config is set to false. By default, the value is true.

    • Fixed an issue with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.

    • Operating system security updates.
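
The two Redshift options in the November 29, 2022 update can be sketched as follows. The option names and defaults come from the note above; redshift_whitespace_options and model_written_value are hypothetical helpers, and the stripping logic is only a plain-Python model of the documented effect (it assumes "white space" means whatever str.lstrip and str.rstrip remove), not the connector's code.

```python
def redshift_whitespace_options(ignore_leading=True, ignore_trailing=True):
    """Build the option map; both options default to true per the note above."""
    return {
        "tempformat": "CSV",
        "csvignoreleadingwhitespace": str(ignore_leading).lower(),
        "csvignoretrailingwhitespace": str(ignore_trailing).lower(),
    }


def model_written_value(value, options):
    """Hypothetical model of the effect of the options on one string value."""
    if options["csvignoreleadingwhitespace"] == "true":
        value = value.lstrip()
    if options["csvignoretrailingwhitespace"] == "true":
        value = value.rstrip()
    return value


defaults = redshift_whitespace_options()
keep_all = redshift_whitespace_options(ignore_leading=False, ignore_trailing=False)
```

A dictionary like keep_all could be passed to a DataFrame writer with .options(**keep_all), alongside the connection options the connector requires, which are omitted here.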

  • November 15, 2022

    • Upgraded Apache commons-text to 1.10.0.

    • [SPARK-40646] JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set spark.sql.json.enablePartialResults to true. The flag is turned off by default to preserve the original behavior.

    • [SPARK-40292] Fix column names in arrays_zip function when arrays are referenced from nested structs

    • Operating system security updates.
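
A minimal sketch of opting in to the [SPARK-40646] behavior from the November 15, 2022 update, assuming an existing SparkSession is passed in as spark. The configuration key is taken from the note above; the helper names and the sample record are hypothetical, and no particular output is claimed for the query.

```python
PARTIAL_RESULTS_CONF = "spark.sql.json.enablePartialResults"


def enable_partial_json_results(spark):
    """Opt in to partial JSON results for the given session."""
    spark.conf.set(PARTIAL_RESULTS_CONF, "true")


def parse_with_struct_schema(spark):
    """Parse a record whose nested field `x` does not match the declared INT type."""
    return spark.sql(
        """SELECT from_json('{"a": {"x": "not-an-int"}, "b": 2}',
                            'a STRUCT<x: INT>, b INT') AS parsed"""
    )
```

Comparing the result of parse_with_struct_schema before and after calling enable_partial_json_results is one way to see whether the flag changes how much of a record is recovered on a given runtime.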

  • November 1, 2022

    • Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was turned off on that table, data in that column would incorrectly fill with NULL values when running MERGE.

    • Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled

    • [SPARK-40697] Add read-side char padding to cover external data files

    • [SPARK-40596] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo

    • Operating system security updates.

  • October 18, 2022

    • Operating system security updates.

  • October 5, 2022

    • [SPARK-40468] Fix column pruning in CSV when _corrupt_record is selected.

    • Operating system security updates.

  • September 22, 2022

    • Users can set spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable the built-in listing for Auto Loader on ADLS Gen2. Built-in listing was previously turned off due to performance issues, but turning it off might have led to increased storage costs for customers.

    • [SPARK-40315] Add hashCode() for Literal of ArrayBasedMapData

    • [SPARK-40213] Support ASCII value conversion for Latin-1 characters

    • [SPARK-40380] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan

    • [SPARK-38404] Improve CTE resolution when a nested CTE references an outer CTE

    • [SPARK-40089] Fix sorting for some Decimal types

    • [SPARK-39887] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique

  • September 6, 2022

    • [SPARK-40235] Use interruptible lock instead of synchronized in Executor.updateDependencies()

    • [SPARK-40218] GROUPING SETS should preserve the grouping columns

    • [SPARK-39976] ArrayIntersect should handle null in left expression correctly

    • [SPARK-40053] Add assume to dynamic cancel cases which require Python runtime environment

    • [SPARK-35542] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it

    • [SPARK-40079] Add Imputer inputCols validation for empty input case

  • August 24, 2022

    • [SPARK-39983] Do not cache unserialized broadcast relations on the driver

    • [SPARK-39775] Disable validate default values when parsing Avro schemas

    • [SPARK-39962] Apply projection when group attributes are empty

    • [SPARK-37643] When charVarcharAsString is true, predicate queries on the char data type should skip the rpadding rule

    • Operating system security updates.

  • August 9, 2022

    • [SPARK-39847] Fix race condition in RocksDBLoader.loadLibrary() if the caller thread is interrupted

    • [SPARK-39731] Fix issue in CSV and JSON data sources when parsing dates in “yyyyMMdd” format with CORRECTED time parser policy

    • Operating system security updates.

  • July 27, 2022

    • [SPARK-39625] Add Dataset.as(StructType)

    • [SPARK-39689] Support 2-chars lineSep in CSV data source

    • [SPARK-39104] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe

    • [SPARK-39570] Inline table should allow expressions with alias

    • [SPARK-39702] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel

    • [SPARK-39575] Add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer

    • [SPARK-39476] Disable Unwrap cast optimize when casting from Long to Float/Double or from Integer to Float

    • [SPARK-38868] Don’t propagate exceptions from filter predicate when optimizing outer joins

    • Operating system security updates.

  • July 20, 2022

    • Make Delta MERGE operation results consistent when the source is non-deterministic.

    • [SPARK-39355] Single column uses quoted to construct UnresolvedAttribute

    • [SPARK-39548] CreateView Command with a window clause query hits a wrong window definition not found issue

    • [SPARK-39419] Fix ArraySort to throw an exception when the comparator returns null

    • Turned off Auto Loader’s use of built-in cloud APIs for directory listing on Azure.

    • Operating system security updates.

  • July 5, 2022

    • [SPARK-39376] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN

    • Operating system security updates.

  • June 15, 2022

    • [SPARK-39283] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

    • [SPARK-39285] Spark should not check field names when reading files

    • [SPARK-34096] Improve performance for nth_value ignore nulls over offset window

    • [SPARK-36718] Fix the isExtractOnly check in CollapseProject

  • June 2, 2022

    • [SPARK-39093] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral

    • [SPARK-38990] Avoid NullPointerException when evaluating date_trunc/trunc format as a bound reference

    • Operating system security updates.

  • May 18, 2022

    • Fixed a potential built-in memory leak in Auto Loader.

    • [SPARK-38918] Nested column pruning should filter out attributes that do not belong to the current relation

    • [SPARK-37593] Reduce default page size by LONG_ARRAY_OFFSET if G1GC and ON_HEAP are used

    • [SPARK-39084] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion

    • [SPARK-32268] Add ColumnPruning in injectBloomFilter

    • [SPARK-38974] Filter registered functions with a given database name in list functions

    • [SPARK-38931] Create root dfs directory for RocksDBFileManager with an unknown number of keys on 1st checkpoint

    • Operating system security updates.

  • April 19, 2022

    • Upgraded Java AWS SDK from version 1.11.655 to 1.12.1899.

    • Fixed an issue with notebook-scoped libraries not working in batch streaming jobs.

    • [SPARK-38616] Keep track of SQL query text in Catalyst TreeNode

    • Operating system security updates.

  • April 6, 2022

    • The following Spark SQL functions are now available with this release:

      • timestampadd() and dateadd(): Add a time duration in a specified unit to a timestamp expression.

      • timestampdiff() and datediff(): Calculate the time difference between two timestamp expressions in a specified unit.

    • Parquet-MR has been upgraded to 1.12.2.

    • Improved support for comprehensive schemas in Parquet files.

    • [SPARK-38631] Uses Java-based implementation for un-tarring at Utils.unpack

    • [SPARK-38509][SPARK-38481] Cherry-pick three timestampadd/diff changes.

    • [SPARK-38523] Fix referring to the corrupt record column from CSV

    • [SPARK-38237] Allow ClusteredDistribution to require full clustering keys

    • [SPARK-38437] Lenient serialization of datetime from datasource

    • [SPARK-38180] Allow safe up-cast expressions in correlated equality predicates

    • [SPARK-38155] Disallow distinct aggregate in lateral subqueries with unsupported predicates

    • Operating system security updates.
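
The timestamp functions listed in the April 6, 2022 update can be tried with queries like the ones below. This is a sketch that assumes an existing SparkSession is passed in as spark and that the functions take the unit as their first argument; the sample timestamps and the run_timestamp_examples helper are illustrative only.

```python
# Sample queries for the four functions; units and literals are arbitrary examples.
TIMESTAMP_FUNCTION_QUERIES = {
    "timestampadd": "SELECT timestampadd(HOUR, 2, TIMESTAMP'2022-04-06 10:00:00')",
    "dateadd": "SELECT dateadd(DAY, 7, TIMESTAMP'2022-04-06 10:00:00')",
    "timestampdiff": (
        "SELECT timestampdiff(MINUTE, "
        "TIMESTAMP'2022-04-06 10:00:00', TIMESTAMP'2022-04-06 11:30:00')"
    ),
    "datediff": (
        "SELECT datediff(DAY, "
        "TIMESTAMP'2022-04-01 00:00:00', TIMESTAMP'2022-04-06 00:00:00')"
    ),
}


def run_timestamp_examples(spark):
    """Run each sample query and return the resulting DataFrames by function name."""
    return {name: spark.sql(query) for name, query in TIMESTAMP_FUNCTION_QUERIES.items()}
```

Calling .show() on each returned DataFrame prints the computed value for that function.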

Databricks Runtime 9.1 LTS

See Databricks Runtime 9.1 LTS.

  • November 29, 2023

    • [SPARK-45859] Made UDF objects in ml.functions lazy.

    • [SPARK-45544] Integrated SSL support into TransportContext.

    • [SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.

    • Operating system security updates.

  • November 14, 2023

    • [SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.

    • [SPARK-45429] Added helper classes for SSL RPC communication.

    • [SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.

    • [SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.

    • [SPARK-45541] Added SSLFactory.

    • [SPARK-42205] Removed logging accumulables in Stage and Task start events.

    • Operating system security updates.

  • October 24, 2023

    • [SPARK-45426] Added support for ReloadingX509TrustManager.

    • Operating system security updates.

  • October 13, 2023

    • Operating system security updates.

  • September 10, 2023

    • Miscellaneous fixes.

  • August 30, 2023

    • Operating system security updates.

  • August 15, 2023

    • Operating system security updates.

  • June 23, 2023

    • Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.

    • Operating system security updates.

  • June 15, 2023

    • [SPARK-43098] Fix correctness COUNT bug when scalar subquery has a group by clause

    • [SPARK-43156][SPARK-43098] Extend scalar subquery count bug test with decorrelateInnerQuery turned off.

    • [SPARK-40862] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery

    • Operating system security updates.

  • June 2, 2023

    • The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.

    • Fixed an issue in JSON rescued data parsing to prevent UnknownFieldException.

    • Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.

    • [SPARK-37520] Add the startswith() and endswith() string functions

    • [SPARK-43413] Fixed IN subquery ListQuery nullability.

    • Operating system security updates.

  • May 17, 2023

    • Operating system security updates.

  • April 25, 2023

    • Operating system security updates.

  • April 11, 2023

    • Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.

    • [SPARK-42967] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is canceled.

  • March 29, 2023

    • Operating system security updates.

  • March 14, 2023

    • [SPARK-42484] Improved error message for UnsafeRowUtils.

    • Miscellaneous fixes.

  • February 28, 2023

    • Users can now read and write specific Delta tables requiring Reader version 3 and Writer version 7, using Databricks Runtime 9.1 LTS or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.

    • Operating system security updates.

  • February 16, 2023

    • Operating system security updates.

  • January 31, 2023

    • Table types of JDBC tables are now EXTERNAL by default.

  • January 18, 2023

    • Operating system security updates.

  • November 29, 2022

    • Fixed an issue with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.

    • Operating system security updates.

  • November 15, 2022

    • Upgraded Apache commons-text to 1.10.0.

    • Operating system security updates.

    • Miscellaneous fixes.

  • November 1, 2022

    • Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was turned off on that table, data in that column would incorrectly fill with NULL values when running MERGE.

    • Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled

    • [SPARK-40596] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo

    • Operating system security updates.

  • October 18, 2022

    • Operating system security updates.

  • October 5, 2022

    • Miscellaneous fixes.

    • Operating system security updates.

  • September 22, 2022

    • Users can set spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable the built-in listing for Auto Loader on ADLS Gen2. Built-in listing was previously turned off due to performance issues, but turning it off might have led to increased storage costs for customers.

    • [SPARK-40315] Add hashCode() for Literal of ArrayBasedMapData

    • [SPARK-40089] Fix sorting for some Decimal types

    • [SPARK-39887] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique

  • September 6, 2022

    • [SPARK-40235] Use interruptible lock instead of synchronized in Executor.updateDependencies()

    • [SPARK-35542] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it

    • [SPARK-40079] Add Imputer inputCols validation for empty input case

  • August 24, 2022

    • [SPARK-39666] Use UnsafeProjection.create to respect spark.sql.codegen.factoryMode in ExpressionEncoder

    • [SPARK-39962] Apply projection when group attributes are empty

    • Operating system security updates.

  • August 9, 2022

    • Operating system security updates.

  • July 27, 2022

    • Make Delta MERGE operation results consistent when the source is non-deterministic.

    • [SPARK-39689] Support for 2-chars lineSep in CSV data source

    • [SPARK-39575] Added ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer.

    • [SPARK-37392] Fixed the performance error for the Catalyst optimizer.

    • Operating system security updates.

  • July 13, 2022

    • [SPARK-39419] ArraySort throws an exception when the comparator returns null.

    • Turned off Auto Loader’s use of built-in cloud APIs for directory listing on Azure.

    • Operating system security updates.

  • July 5, 2022

    • Operating system security updates.

    • Miscellaneous fixes.

  • June 15, 2022

    • [SPARK-39283] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator.

  • June 2, 2022

    • [SPARK-34554] Implement the copy() method in ColumnarMap.

    • Operating system security updates.

  • May 18, 2022

    • Fixed a potential built-in memory leak in Auto Loader.

    • Upgrade AWS SDK version from 1.11.655 to 1.11.678.

    • [SPARK-38918] Nested column pruning should filter out attributes that do not belong to the current relation

    • [SPARK-39084] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion

    • Operating system security updates.

  • April 19, 2022

    • Operating system security updates.

    • Miscellaneous fixes.

  • April 6, 2022

    • [SPARK-38631] Uses Java-based implementation for un-tarring at Utils.unpack

    • Operating system security updates.

  • March 22, 2022

    • Changed the current working directory of notebooks on High Concurrency clusters with either table access control or credential passthrough enabled to the user’s home directory. Previously, the active directory was /databricks/driver.

    • [SPARK-38437] Lenient serialization of datetime from datasource

    • [SPARK-38180] Allow safe up-cast expressions in correlated equality predicates

    • [SPARK-38155] Disallow distinct aggregate in lateral subqueries with unsupported predicates

    • [SPARK-27442] Removed a field check when reading or writing data in Parquet.

  • March 14, 2022

    • [SPARK-38236] Absolute file paths specified in the create/alter table are treated as relative

    • [SPARK-34069] Interrupt task thread if local property SPARK_JOB_INTERRUPT_ON_CANCEL is set to true.

  • February 23, 2022

    • [SPARK-37859] SQL tables created with JDBC with Spark 3.1 are not readable with Spark 3.2.

  • February 8, 2022

    • [SPARK-27442] Removed a field check when reading or writing data in Parquet.

    • Operating system security updates.

  • February 1, 2022

    • Operating system security updates.

  • January 26, 2022

    • Fixed an issue where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.

    • Fixed an issue where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.

  • January 19, 2022

    • Minor fixes and security enhancements.

    • Operating system security updates.

  • November 4, 2021

    • Fixed an issue that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException.

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.

    • The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.

  • October 20, 2021

    • Upgraded BigQuery connector from 0.18.1 to 0.22.2. This adds support for the BigNumeric type.

Databricks Runtime 13.0 (unsupported)

See Databricks Runtime 13.0 (unsupported).

  • October 13, 2023

    • Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.

    • [SPARK-42553][SQL] Ensure at least one time unit after interval.

    • [SPARK-45178] Fall back to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.

    • [SPARK-44658][CORE] ShuffleStatus.getMapStatus returns None instead of Some(null).

    • [SPARK-42205][CORE] Remove logging of Accumulables in Task/Stage start events in JsonProtocol.

    • Operating system security updates.

  • September 12, 2023

    • [SPARK-44485][SQL] Optimize TreeNode.generateTreeString.

    • [SPARK-44718][SQL] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value.

    • Miscellaneous bug fixes.

  • August 30, 2023

  • August 15, 2023

    • [SPARK-44643][SQL][PYTHON] Fix Row.__repr__ when the row is empty.

    • [SPARK-44504][Backport] Maintenance task cleans up loaded providers on stop error.

    • [SPARK-44479][CONNECT][PYTHON] Fixed protobuf conversion from an empty struct type.

    • [SPARK-44464][SS] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as first column value.

    • Miscellaneous bug fixes.

  • July 29, 2023

    • Fixed a bug where dbutils.fs.ls() returned INVALID_PARAMETER_VALUE.LOCATION_OVERLAP when called for a storage location path that clashed with another external or managed storage location.

    • [SPARK-44199] CacheManager no longer refreshes the fileIndex unnecessarily.

    • Operating system security updates.

  • July 24, 2023

    • [SPARK-44337][PROTOBUF] Fixed an issue where any field set to Any.getDefaultInstance caused parse errors.

    • [SPARK-44136][SS] Fixed an issue where StateManager would get materialized in an executor instead of the driver in FlatMapGroupsWithStateExec.

    • Revert [SPARK-42323][SQL] Assign name to _LEGACY_ERROR_TEMP_2332.

    • Operating system security updates.

  • June 23, 2023

    • Operating system security updates.

  • June 15, 2023

    • Photonized approx_count_distinct.

    • Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.

    • [SPARK-43156][SPARK-43098][SQL] Extend scalar subquery count bug test with decorrelateInnerQuery disabled

    • [SPARK-43779][SQL] ParseToDate now loads EvalMode in the main thread.

    • [SPARK-42937][SQL] PlanSubqueries should set InSubqueryExec#shouldBroadcast to true

    • Operating system security updates.

  • June 2, 2023

    • The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.

    • Improve the performance of incremental update with SHALLOW CLONE Iceberg and Parquet.

    • Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.

    • [SPARK-43404][Backport] Skip reusing sst file for same version of RocksDB state store to avoid ID mismatch error.

    • [SPARK-43340][CORE] Fixed missing stack trace field in eventlogs.

    • [SPARK-43300][CORE] NonFateSharingCache wrapper for Guava Cache.

    • [SPARK-43378][CORE] Properly close stream objects in deserializeFromChunkedBuffer.

    • [SPARK-16484][SQL] Use 8-bit registers for representing DataSketches.

    • [SPARK-43522][SQL] Fixed creating struct column name with index of array.

    • [SPARK-43413][11.3-13.0][SQL] Fixed IN subquery ListQuery nullability.

    • [SPARK-43043][CORE] Improved MapOutputTracker.updateMapOutput performance.

    • [SPARK-16484][SQL] Added support for DataSketches HllSketch.

    • [SPARK-43123][SQL] Internal field metadata no longer leaks to catalogs.

    • [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression().

    • [SPARK-43336][SQL] Casting between Timestamp and TimestampNTZ requires timezone.

    • [SPARK-43286][SQL] Updated aes_encrypt CBC mode to generate random IVs.

    • [SPARK-42852][SQL] Reverted NamedLambdaVariable related changes from EquivalentExpressions.

    • [SPARK-43541][SQL] Propagate all Project tags in resolving of expressions and missing columns.

    • [SPARK-43527][PYTHON] Fixed catalog.listCatalogs in PySpark.

    • Operating system security updates.

  • May 31, 2023

    • Default optimized write support for Delta tables registered in Unity Catalog has expanded to include CTAS statements and INSERT operations for partitioned tables. This behavior aligns with the defaults on SQL warehouses. See Optimized writes for Delta Lake on Databricks.

  • May 17, 2023

    • Fixed a regression where _metadata.file_path and _metadata.file_name would return incorrectly formatted strings. For example, a path with spaces is now represented as s3://test-bucket/some%20directory/some%20data.csv instead of s3://test-bucket/some directory/some data.csv.

    • Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.

    • If an Avro file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that have different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend using the rescuedDataColumn option.

    • Auto Loader now does the following:

      • Correctly reads and no longer rescues Integer, Short, Byte types if one of these data types is provided, but the Avro file suggests one of the other two types.

      • Prevents reading interval types as date or timestamp types to avoid getting corrupt dates.

      • Prevents reading Decimal types with lower precision.

    • [SPARK-43172][CONNECT] Exposes host and token from Spark Connect client.

    • [SPARK-43293][SQL] __qualified_access_only is ignored in normal columns.

    • [SPARK-43098][SQL] Fixed correctness COUNT bug when scalar subquery has a GROUP BY clause.

    • [SPARK-43085][SQL] Support for column DEFAULT assignment for multi-part table names.

    • [SPARK-43190][SQL] ListQuery.childOutput is now consistent with secondary output.

    • [SPARK-43192][CONNECT] Removed user agent charset validation.
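
The two path forms in the May 17, 2023 note about _metadata.file_path differ only in percent-encoding. The sketch below reproduces the note's example strings with the Python standard library; encode_path is a hypothetical helper for illustration and is not how Databricks computes the metadata column.

```python
from urllib.parse import quote, unquote


def encode_path(path):
    """Percent-encode a path, leaving ':' and '/' untouched."""
    return quote(path, safe=":/")


raw = "s3://test-bucket/some directory/some data.csv"
encoded = encode_path(raw)   # spaces become %20
decoded = unquote(encoded)   # recovers the original string
```

Code that compares _metadata.file_path against literal paths may need to apply the same encoding (or decode the column value) before comparing.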

  • April 25, 2023

    • You can modify a Delta table to add support for a Delta table feature using DeltaTable.addFeatureSupport(feature_name).

    • The SYNC command now supports legacy data source formats.

    • Fixed a bug where using the Python formatter before running any other commands in a Python notebook could cause the notebook path to be missing from sys.path.

    • Databricks now supports specifying default values for columns of Delta tables. INSERT, UPDATE, DELETE, and MERGE commands can refer to a column’s default value using the explicit DEFAULT keyword. For INSERT commands with an explicit list of fewer columns than the target table, corresponding column default values are substituted for the remaining columns (or NULL if no default is specified).

    • Fixed a bug where the web terminal could not be used to access files in /Workspace for some users.

    • If a Parquet file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that had different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the rescuedDataColumn option.

    • Auto Loader now correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided but the Parquet file suggests one of the other two types. When the rescued data column was previously enabled, the data type mismatch would cause columns to be rescued even though they were readable.

    • Fixed a bug where Auto Loader schema evolution can go into an infinite fail loop, when a new column is detected in the schema of a nested JSON object.

    • [SPARK-42794][SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structured Streaming.

    • [SPARK-39221][SQL] Make sensitive information be redacted correctly for thrift server job/stage tab.

    • [SPARK-42971][CORE] Change to print workdir if appDirs is null when worker handle WorkDirCleanup event.

    • [SPARK-42936][SQL] Fix LCA bug when the having clause can be resolved directly by its child aggregate.

    • [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals.

    • Revert [SPARK-42754][SQL][UI] Fix backward compatibility issue in nested SQL run.

    • Revert [SPARK-41498] Propagate metadata through Union.

    • [SPARK-43038][SQL] Support the CBC mode by aes_encrypt()/aes_decrypt().
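
      A minimal round-trip sketch of CBC mode, assuming the four-argument form of these functions (expression, key, mode, padding). The 16-byte key is a placeholder for illustration only.

      ```sql
      -- Illustrative 16-byte key; never hard-code real keys.
      SELECT CAST(
        aes_decrypt(
          aes_encrypt('Spark', 'abcdefghijklmnop', 'CBC', 'DEFAULT'),
          'abcdefghijklmnop', 'CBC', 'DEFAULT'
        ) AS STRING
      ) AS round_trip;
      ```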

    • [SPARK-42928][SQL] Make resolvePersistentFunction synchronized.

    • [SPARK-42521][SQL] Add NULL values for INSERT with user-specified lists of fewer columns than the target table.

    • [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) was incorrect.

    • [SPARK-42548][SQL] Add ReferenceAllColumns to skip rewriting attributes.

    • [SPARK-42423][SQL] Add metadata column file block start and length.

    • [SPARK-42796][SQL] Support accessing TimestampNTZ columns in CachedBatch.

    • [SPARK-42266][PYTHON] Remove the parent directory in shell.py run when IPython is used.

    • [SPARK-43011][SQL] array_insert should fail with 0 index.

    • [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect.

    • [SPARK-42702][SPARK-42623][SQL] Support parameterized query in subquery and CTE.

    • [SPARK-42967][CORE] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is cancelled.

    • Operating system security updates.

Databricks Runtime 12.1 (unsupported)

See Databricks Runtime 12.1 (unsupported).

  • June 23, 2023

    • Operating system security updates.

  • June 15, 2023

    • Photonized approx_count_distinct.

    • Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.

    • [SPARK-43779][SQL] ParseToDate now loads EvalMode in the main thread.

    • [SPARK-43156][SPARK-43098][SQL] Extend scalar subquery count bug test with decorrelateInnerQuery disabled

    • Operating system security updates.

  • June 2, 2023

    • The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.

    • Improve the performance of incremental update with SHALLOW CLONE Iceberg and Parquet.

    • Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.

    • [SPARK-43404][Backport] Skip reusing sst file for same version of RocksDB state store to avoid ID mismatch error.

    • [SPARK-43413][11.3-13.0][SQL] Fixed IN subquery ListQuery nullability.

    • [SPARK-43522][SQL] Fixed creating struct column name with index of array.

    • [SPARK-42444][PYTHON] DataFrame.drop now handles duplicated columns properly.

    • [SPARK-43541][SQL] Propagate all Project tags in resolving of expressions and missing columns.

    • [SPARK-43340][CORE] Fixed missing stack trace field in eventlogs.

    • [SPARK-42937][SQL] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.

    • [SPARK-43527][PYTHON] Fixed catalog.listCatalogs in PySpark.

    • [SPARK-43378][CORE] Properly close stream objects in deserializeFromChunkedBuffer.

  • May 17, 2023

    • Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.

    • If an Avro file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that have different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the rescuedDataColumn option.

    • Auto Loader now does the following.

      • Correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided but the Avro file suggests one of the other two types.

      • Prevents reading interval types as date or timestamp types to avoid getting corrupt dates.

      • Prevents reading Decimal types with lower precision.

    • [SPARK-43098][SQL] Fixed correctness COUNT bug when scalar subquery is grouped by clause.

    • [SPARK-43190][SQL] ListQuery.childOutput is now consistent with secondary output.

    • Operating system security updates.

  • April 25, 2023

    • If a Parquet file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that had different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the rescuedDataColumn option.

    • Auto Loader now correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided but the Parquet file suggests one of the other two types. When the rescued data column was previously enabled, the data type mismatch would cause columns to be rescued even though they were readable.

    • [SPARK-43009][SQL] Parameterized sql() with Any constants.

    • [SPARK-42971][CORE] Change to print workdir if appDirs is null when worker handle WorkDirCleanup event.

    • Operating system security updates.

  • April 11, 2023

    • Support legacy data source formats in SYNC command.

    • Fixes a bug in the %autoreload behavior in notebooks that are outside of a repo.

    • Fixed a bug where Auto Loader schema evolution can go into an infinite fail loop, when a new column is detected in the schema of a nested JSON object.

    • [SPARK-42928][SQL] Makes resolvePersistentFunction synchronized.

    • [SPARK-42967][CORE] Fixes SparkListenerTaskStart.stageAttemptId when a task starts after the stage is cancelled.

    • Operating system security updates.

  • March 29, 2023

    • Auto Loader now triggers at least one synchronous RocksDB log clean up for Trigger.AvailableNow streams to ensure that the checkpoint can get regularly cleaned up for fast-running Auto Loader streams. This can cause some streams to take longer before they shut down, but will save you storage costs and improve the Auto Loader experience in future runs.

    • You can now modify a Delta table to add support to table features using DeltaTable.addFeatureSupport(feature_name).

    • [SPARK-42702][SPARK-42623][SQL] Support parameterized query in subquery and CTE

    • [SPARK-41162][SQL] Fix anti- and semi-join for self-join with aggregations

    • [SPARK-42403][CORE] JsonProtocol should handle null JSON strings

    • [SPARK-42668][SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

    • [SPARK-42794][SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structured Streaming

  • March 14, 2023

    • There is a terminology change for adding features to a Delta table using the table property. The preferred syntax is now 'delta.feature.featureName'='supported' instead of 'delta.feature.featureName'='enabled'. For backwards compatibility, using 'delta.feature.featureName'='enabled' still works and will continue to work.
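
      For illustration, the preferred form might be applied to an existing table as follows. The table name and feature name are placeholders, and using ALTER TABLE to set the property is an assumption about how it is applied.

      ```sql
      -- Placeholders: substitute a real table name and Delta table feature name.
      ALTER TABLE <table-name>
      SET TBLPROPERTIES ('delta.feature.<feature-name>' = 'supported');
      ```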

    • [SPARK-42622][CORE] Disable substitution in values

    • [SPARK-42534][SQL] Fix DB2Dialect Limit clause

    • [SPARK-42635][SQL] Fix the TimestampAdd expression.

    • [SPARK-42516][SQL] Always capture the session time zone config while creating views

    • [SPARK-42484][SQL] UnsafeRowUtils better error message

    • [SPARK-41793][SQL] Incorrect result for window frames defined by a range clause on large decimals

    • Operating system security updates.

  • February 24, 2023

    • You can now use a unified set of options (host, port, database, user, password) for connecting to the data sources supported in Query Federation (PostgreSQL, MySQL, Synapse, Snowflake, Redshift, SQL Server). Note that port is optional and uses the default port number for each data source if not provided.

    Example of PostgreSQL connection configuration

    CREATE TABLE postgresql_table
    USING postgresql
    OPTIONS (
      dbtable '<table-name>',
      host '<host-name>',
      database '<database-name>',
      user '<user>',
      password secret('scope', 'key')
    );
    

    Example of Snowflake connection configuration

    CREATE TABLE snowflake_table
    USING snowflake
    OPTIONS (
      dbtable '<table-name>',
      host '<host-name>',
      port '<port-number>',
      database '<database-name>',
      user secret('snowflake_creds', 'my_username'),
      password secret('snowflake_creds', 'my_password'),
      schema '<schema-name>',
      sfWarehouse '<warehouse-name>'
    );
    
    • [SPARK-41989][PYTHON] Avoid breaking logging config from pyspark.pandas

    • [SPARK-42346][SQL] Rewrite distinct aggregates after subquery merge

    • [SPARK-41990][SQL] Use FieldReference.column instead of apply in V1 to V2 filter conversion

    • Revert [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile

    • [SPARK-42162] Introduce MultiCommutativeOp expression as a memory optimization for canonicalizing large trees of commutative expressions

    • Operating system security updates.

  • February 16, 2023

    • The SYNC command now supports syncing recreated Hive Metastore tables. If an HMS table was previously SYNCed to Unity Catalog but then dropped and recreated, a subsequent re-sync works instead of returning a TABLE_ALREADY_EXISTS status code.

    • [SPARK-41219][SQL] IntegralDivide use decimal(1, 0) to represent 0

    • [SPARK-36173][CORE] Support getting CPU number in TaskContext

    • [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile

    • [SPARK-42286][SQL] Fallback to previous codegen code path for complex expr with CAST

  • January 31, 2023

    • Creating a schema with a defined location now requires the user to have SELECT and MODIFY privileges on ANY FILE.

    • [SPARK-41581][SQL] Assign name to _LEGACY_ERROR_TEMP_1230

    • [SPARK-41996][SQL][SS] Fix kafka test to verify lost partitions to account for slow Kafka operations

    • [SPARK-41580][SQL] Assign name to _LEGACY_ERROR_TEMP_2137

    • [SPARK-41666][PYTHON] Support parameterized SQL by sql()

    • [SPARK-41579][SQL] Assign name to _LEGACY_ERROR_TEMP_1249

    • [SPARK-41573][SQL] Assign name to _LEGACY_ERROR_TEMP_2136

    • [SPARK-41574][SQL] Assign name to _LEGACY_ERROR_TEMP_2009

    • [SPARK-41049][Followup] Fix a code sync regression for ConvertToLocalRelation

    • [SPARK-41576][SQL] Assign name to _LEGACY_ERROR_TEMP_2051

    • [SPARK-41572][SQL] Assign name to _LEGACY_ERROR_TEMP_2149

    • [SPARK-41575][SQL] Assign name to _LEGACY_ERROR_TEMP_2054

    • Operating system security updates.

Databricks Runtime 12.0 (unsupported)

See Databricks Runtime 12.0 (unsupported).

  • June 15, 2023

    • Photonized approx_count_distinct.

    • Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.

    • [SPARK-43156][SPARK-43098][SQL] Extend scalar subquery count bug test with decorrelateInnerQuery disabled

    • [SPARK-43779][SQL] ParseToDate now loads EvalMode in the main thread.

    • Operating system security updates.

  • June 2, 2023

    • The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.

    • Improve the performance of incremental update with SHALLOW CLONE Iceberg and Parquet.

    • Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.

    • [SPARK-42444][PYTHON] DataFrame.drop now handles duplicated columns properly.

    • [SPARK-43404][Backport] Skip reusing sst file for same version of RocksDB state store to avoid ID mismatch error.

    • [SPARK-43413][11.3-13.0][SQL] Fixed IN subquery ListQuery nullability.

    • [SPARK-43527][PYTHON] Fixed catalog.listCatalogs in PySpark.

    • [SPARK-43522][SQL] Fixed creating struct column name with index of array.

    • [SPARK-43541][SQL] Propagate all Project tags in resolving of expressions and missing columns.

    • [SPARK-43340][CORE] Fixed missing stack trace field in eventlogs.

    • [SPARK-42937][SQL] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.

  • May 17, 2023

    • Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.

    • If an Avro file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that have different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the rescuedDataColumn option.

    • Auto Loader now does the following.

      • Correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided but the Avro file suggests one of the other two types.

      • Prevents reading interval types as date or timestamp types to avoid getting corrupt dates.

      • Prevents reading Decimal types with lower precision.

    • [SPARK-43172][CONNECT] Exposes host and token from the Spark Connect client.

    • [SPARK-41520][SQL] Split AND_OR tree pattern to separate AND and OR.

    • [SPARK-43098][SQL] Fixed correctness COUNT bug when scalar subquery is grouped by clause.

    • [SPARK-43190][SQL] ListQuery.childOutput is now consistent with secondary output.

    • Operating system security updates.

  • April 25, 2023

    • If a Parquet file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that had different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the rescuedDataColumn option.

    • Auto Loader now correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided but the Parquet file suggests one of the other two types. When the rescued data column was previously enabled, the data type mismatch would cause columns to be rescued even though they were readable.

    • [SPARK-42971][CORE] Change to print workdir if appDirs is null when worker handle WorkDirCleanup event

    • Operating system security updates.

  • April 11, 2023

    • Support legacy data source formats in SYNC command.

    • Fixes a bug in the %autoreload behavior in notebooks that are outside of a repo.

    • Fixed a bug where Auto Loader schema evolution can go into an infinite fail loop, when a new column is detected in the schema of a nested JSON object.

    • [SPARK-42928][SQL] Makes resolvePersistentFunction synchronized.

    • [SPARK-42967][CORE] Fixes SparkListenerTaskStart.stageAttemptId when a task starts after the stage is cancelled.

    • Operating system security updates.

  • March 29, 2023

    • [SPARK-42794][SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structured Streaming

    • [SPARK-41162][SQL] Fix anti- and semi-join for self-join with aggregations

    • [SPARK-42403][CORE] JsonProtocol should handle null JSON strings

    • [SPARK-42668][SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

    • Miscellaneous bug fixes.

  • March 14, 2023

    • [SPARK-42534][SQL] Fix DB2Dialect Limit clause

    • [SPARK-42622][CORE] Disable substitution in values

    • [SPARK-41793][SQL] Incorrect result for window frames defined by a range clause on large decimals

    • [SPARK-42484][SQL] UnsafeRowUtils better error message

    • [SPARK-42635][SQL] Fix the TimestampAdd expression.

    • [SPARK-42516][SQL] Always capture the session time zone config while creating views

    • Operating system security updates.

  • February 24, 2023

    • Standardized Connection Options for Query Federation

      You can now use a unified set of options (host, port, database, user, password) for connecting to the data sources supported in Query Federation (PostgreSQL, MySQL, Synapse, Snowflake, Redshift, SQL Server). Note that port is optional and will use the default port number for each data source if not provided.

      Example of PostgreSQL connection configuration

      CREATE TABLE postgresql_table
      USING postgresql
      OPTIONS (
        dbtable '<table-name>',
        host '<host-name>',
        database '<database-name>',
        user '<user>',
        password secret('scope', 'key')
      );
      

      Example of Snowflake connection configuration

      CREATE TABLE snowflake_table
      USING snowflake
      OPTIONS (
        dbtable '<table-name>',
        host '<host-name>',
        port '<port-number>',
        database '<database-name>',
        user secret('snowflake_creds', 'my_username'),
        password secret('snowflake_creds', 'my_password'),
        schema '<schema-name>',
        sfWarehouse '<warehouse-name>'
      );
      
    • Revert [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile

    • [SPARK-42162] Introduce MultiCommutativeOp expression as a memory optimization for canonicalizing large trees of commutative expressions

    • [SPARK-41990][SQL] Use FieldReference.column instead of apply in V1 to V2 filter conversion

    • [SPARK-42346][SQL] Rewrite distinct aggregates after subquery merge

    • Operating system security updates.

  • February 16, 2023

    • Users can now read and write certain Delta tables that require Reader version 3 and Writer version 7, by using Databricks Runtime 9.1 or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.

    • The SYNC command now supports syncing recreated Hive Metastore tables. If an HMS table was previously SYNCed to Unity Catalog but then dropped and recreated, a subsequent re-sync works instead of returning a TABLE_ALREADY_EXISTS status code.

    • [SPARK-36173][CORE] Support getting CPU number in TaskContext

    • [SPARK-42286][SQL] Fallback to previous codegen code path for complex expr with CAST

    • [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile

    • [SPARK-41219][SQL] IntegralDivide use decimal(1, 0) to represent 0

  • January 25, 2023

    • [SPARK-41660][SQL] Only propagate metadata columns if they are used

    • [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark

    • [SPARK-41669][SQL] Early pruning in canCollapseExpressions

    • Operating system security updates.

  • January 18, 2023

    • REFRESH FUNCTION SQL command now supports SQL functions and SQL Table functions. For example, the command could be used to refresh a persistent SQL function that was updated in another SQL session.
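
      As a short sketch, refreshing a persistent SQL function that was redefined in another session might look like this. The schema and function names are placeholders.

      ```sql
      -- Placeholders: substitute the schema and function you want to refresh.
      REFRESH FUNCTION <schema-name>.<function-name>;
      ```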

    • Java Database Connectivity (JDBC) data source v1 now supports LIMIT clause pushdown to improve performance in queries. This feature is enabled by default and can be disabled with spark.databricks.optimizer.jdbcDSv1LimitPushdown.enabled set to false.
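
      For example, the pushdown could be turned off for the current session as follows. Setting it per session with SET is an assumption; the same key could also be placed in the cluster's Spark configuration.

      ```sql
      -- Disable LIMIT pushdown for JDBC data source v1 in this session.
      SET spark.databricks.optimizer.jdbcDSv1LimitPushdown.enabled = false;
      ```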

    • In Legacy Table ACLs clusters, creating functions that reference JVM classes now requires the MODIFY_CLASSPATH privilege.

    • Azure Synapse connector now returns a more descriptive error message when a column name contains invalid characters such as whitespaces or semicolons. In such cases, the following message will be returned: Azure Synapse Analytics failed to execute the JDBC query produced by the connector. Make sure column names do not include any invalid characters such as ';' or whitespace.

    • Spark Structured Streaming now works with format("deltasharing") on a Delta Sharing table as a source.

    • [SPARK-38277][SS] Clear write batch after RocksDB state store’s commit

    • [SPARK-41733][SQL][SS] Apply tree-pattern based pruning for the rule ResolveWindowTime

    • [SPARK-39591][SS] Async Progress Tracking

    • [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing

    • [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source

    • [SPARK-41539][SQL] Remap stats and constraints against output in logical plan for LogicalRDD

    • [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing

    • [SPARK-41862][SQL] Fix correctness bug related to DEFAULT values in Orc reader

    • [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

    • [SPARK-41261][PYTHON][SS] Fix issue for applyInPandasWithState when the columns of grouping keys are not placed in order from earliest

    • Operating system security updates.

  • May 17, 2023

    • Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.

    • Fixed a regression that caused Databricks jobs to persist after failing to connect to the metastore during cluster initialization.

    • [SPARK-41520][SQL] Split AND_OR tree pattern to separate AND and OR.

    • [SPARK-43190][SQL] ListQuery.childOutput is now consistent with secondary output.

    • Operating system security updates.

  • April 25, 2023

    • If a Parquet file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that had different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the rescuedDataColumn option.

    • Auto Loader now correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided but the Parquet file suggests one of the other two types. When the rescued data column was previously enabled, the data type mismatch would cause columns to be rescued even though they were readable.

    • [SPARK-42937][SQL] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.

    • Operating system security updates.

  • April 11, 2023

    • Support legacy data source formats in SYNC command.

    • Fixes a bug in the %autoreload behavior in notebooks that are outside of a repo.

    • Fixed a bug where Auto Loader schema evolution can go into an infinite fail loop, when a new column is detected in the schema of a nested JSON object.

    • [SPARK-42928][SQL] Make resolvePersistentFunction synchronized.

    • [SPARK-42967][CORE] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is cancelled.

  • March 29, 2023

    • [SPARK-42794][SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structured Streaming

    • [SPARK-42403][CORE] JsonProtocol should handle null JSON strings

    • [SPARK-42668][SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

    • Operating system security updates.

  • March 14, 2023

    • [SPARK-42635][SQL] Fix the TimestampAdd expression.

    • [SPARK-41793][SQL] Incorrect result for window frames defined by a range clause on large decimals

    • [SPARK-42484][SQL] UnsafeRowUtils better error message

    • [SPARK-42534][SQL] Fix DB2Dialect Limit clause

    • [SPARK-41162][SQL] Fix anti- and semi-join for self-join with aggregations

    • [SPARK-42516][SQL] Always capture the session time zone config while creating views

    • Miscellaneous bug fixes.

  • February 28, 2023

    • Standardized Connection Options for Query Federation

      You can now use a unified set of options (host, port, database, user, password) for connecting to the data sources supported in Query Federation (PostgreSQL, MySQL, Synapse, Snowflake, Redshift, SQL Server). Note that port is optional and uses the default port number for each data source if not provided.

      Example of PostgreSQL connection configuration

      CREATE TABLE postgresql_table
      USING postgresql
      OPTIONS (
        dbtable '<table-name>',
        host '<host-name>',
        database '<database-name>',
        user '<user>',
        password secret('scope', 'key')
      );
      

      Example of Snowflake connection configuration

      CREATE TABLE snowflake_table
      USING snowflake
      OPTIONS (
        dbtable '<table-name>',
        host '<host-name>',
        port '<port-number>',
        database '<database-name>',
        user secret('snowflake_creds', 'my_username'),
        password secret('snowflake_creds', 'my_password'),
        schema '<schema-name>',
        sfWarehouse '<warehouse-name>'
      );
      
    • [SPARK-42286][SQL] Fallback to previous codegen code path for complex expr with CAST

    • [SPARK-41989][PYTHON] Avoid breaking logging config from pyspark.pandas

    • [SPARK-42346][SQL] Rewrite distinct aggregates after subquery merge

    • [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

    • [SPARK-42162] Introduce MultiCommutativeOp expression as a memory optimization for canonicalizing large trees of commutative expressions

    • [SPARK-41990][SQL] Use FieldReference.column instead of apply in V1 to V2 filter conversion

    • Operating system security updates.

  • February 16, 2023

    • Users can now read and write certain Delta tables that require Reader version 3 and Writer version 7, by using Databricks Runtime 9.1 or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.

    • The SYNC command now supports syncing recreated Hive Metastore tables. If an HMS table was previously SYNCed to Unity Catalog but then dropped and recreated, a subsequent re-sync works instead of returning a TABLE_ALREADY_EXISTS status code.

    • [SPARK-41219][SQL] IntegralDivide use decimal(1, 0) to represent 0

    • [SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in RewriteDistinctAggregates

    • Operating system security updates.

  • January 25, 2023

    • [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark

    • [SPARK-41660][SQL] Only propagate metadata columns if they are used

    • [SPARK-41669][SQL] Early pruning in canCollapseExpressions

    • Miscellaneous bug fixes.

  • January 18, 2023

    • REFRESH FUNCTION SQL command now supports SQL functions and SQL Table functions. For example, the command could be used to refresh a persistent SQL function that was updated in another SQL session.

    • Java Database Connectivity (JDBC) data source v1 now supports LIMIT clause pushdown to improve performance in queries. This feature is enabled by default and can be disabled with spark.databricks.optimizer.jdbcDSv1LimitPushdown.enabled set to false.

    • Azure Synapse connector now returns a more descriptive error message when a column name contains invalid characters such as whitespaces or semicolons. In such cases, the following message will be returned: Azure Synapse Analytics failed to execute the JDBC query produced by the connector. Make sure column names do not include any invalid characters such as ';' or whitespace.

    • [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source

    • [SPARK-41862][SQL] Fix correctness bug related to DEFAULT values in Orc reader

    • [SPARK-41539][SQL] Remap stats and constraints against output in logical plan for LogicalRDD

    • [SPARK-39591][SS] Async Progress Tracking

    • [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

    • [SPARK-41261][PYTHON][SS] Fix issue for applyInPandasWithState when the columns of grouping keys are not placed in order from earliest

    • [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing

    • [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing

    • [SPARK-38277][SS] Clear write batch after RocksDB state store’s commit

    • Operating system security updates.

  • November 29, 2022

    • Users can now configure how leading and trailing whitespace is handled when writing data using the Redshift connector. The following options have been added to control whitespace handling:

      • csvignoreleadingwhitespace, when set to true, removes leading whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.

      • csvignoretrailingwhitespace, when set to true, removes trailing whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.

    • Fixed a bug with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.

    • Upgrade snowflake-jdbc dependency to version 3.13.22.

    • Table types of JDBC tables are now EXTERNAL by default.

    • [SPARK-40906][SQL] Mode should copy keys before inserting into Map

    • Operating system security updates.

  • November 15, 2022

    • Table ACLs and UC Shared clusters now allow the Dataset.toJSON method from Python.

    • [SPARK-40646] JSON parsing for structs, maps, and arrays has been fixed so that when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set spark.sql.json.enablePartialResults to true. The flag is disabled by default to preserve the original behavior.
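
      A minimal sketch of opting in for the current session. The JSON literal and schema are hypothetical: field b is a string in the data but a struct in the schema, which is the kind of partial mismatch this flag affects.

      ```sql
      -- Opt in to partial results for JSON parsing in this session.
      SET spark.sql.json.enablePartialResults = true;

      -- Hypothetical record in which field b does not match the declared struct type.
      SELECT from_json('{"a": 1, "b": "not-a-struct"}', 'a INT, b STRUCT<c: INT>') AS parsed;
      ```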

    • [SPARK-40903][SQL] Avoid reordering decimal Add for canonicalization if data type is changed

    • [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries using reference tracking

    • [SPARK-40697][SQL] Add read-side char padding to cover external data files

    • Operating system security updates.

  • November 1, 2022

    • Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was disabled on that table, data in that column would incorrectly fill with NULL values when running MERGE.

    • Fixed an issue where running MERGE and using exactly 99 columns from the source in the condition could result in java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to org.apache.spark.sql.catalyst.InternalRow.

    • Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled.

    • Upgraded Apache commons-text to 1.10.0.

    • [SPARK-38881][DSTREAMS][KINESIS][PYSPARK] Added Support for CloudWatch MetricsLevel Config

    • [SPARK-40596][CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo

    • [SPARK-40670][SS][PYTHON] Fix NPE in applyInPandasWithState when the input schema has “non-nullable” column(s)

    • Operating system security updates.
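To illustrate the spark.sql.json.enablePartialResults flag described in the November 15, 2022 update above, the following minimal Python sketch sets the flag for a session. The helper name enable_json_partial_results is hypothetical, and the sketch assumes that spark is a SparkSession, such as the one predefined in a notebook.

```python
JSON_PARTIAL_RESULTS_KEY = "spark.sql.json.enablePartialResults"


def enable_json_partial_results(spark):
    """Opt in to partial-result JSON parsing for the given session.

    `spark` is assumed to be a SparkSession (any object exposing
    `conf.set` and `conf.get` works for this sketch).
    Returns the value stored for the flag after setting it.
    """
    spark.conf.set(JSON_PARTIAL_RESULTS_KEY, "true")
    return spark.conf.get(JSON_PARTIAL_RESULTS_KEY)


# Example usage in a notebook where `spark` is already defined:
# enable_json_partial_results(spark)
```

Because the flag is disabled by default, existing jobs keep the original behavior unless they set it explicitly.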

Databricks Runtime 11.2 (unsupported)

See Databricks Runtime 11.2 (unsupported).

  • February 28, 2023

    • [SPARK-42286][SQL] Fallback to previous codegen code path for complex expr with CAST

    • [SPARK-42346][SQL] Rewrite distinct aggregates after subquery merge

    • Operating system security updates.

  • February 16, 2023

    • Users can now read and write certain Delta tables that require Reader version 3 and Writer version 7, by using Databricks Runtime 9.1 or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.

    • The SYNC command supports syncing recreated Hive Metastore tables. If an HMS table has been SYNCed previously to Unity Catalog but then dropped and recreated, a subsequent re-sync will work instead of throwing a TABLE_ALREADY_EXISTS status code.

    • [SPARK-41219][SQL] IntegralDivide use decimal(1, 0) to represent 0

    • Operating system security updates.

  • January 31, 2023

    • Table types of JDBC tables are now EXTERNAL by default.

    • [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark

  • January 18, 2023

    • Azure Synapse connector now returns a more descriptive error message when a column name contains invalid characters such as whitespaces or semicolons. In such cases, the following message will be returned: Azure Synapse Analytics failed to execute the JDBC query produced by the connector. Make sure column names do not include any invalid characters such as ';' or whitespace.

    • [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source

    • [SPARK-41862][SQL] Fix correctness bug related to DEFAULT values in Orc reader

    • [SPARK-41539][SQL] Remap stats and constraints against output in logical plan for LogicalRDD

    • [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

    • [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing

    • [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing

    • [SPARK-38277][SS] Clear write batch after RocksDB state store’s commit

    • Operating system security updates.

  • November 29, 2022

    • Users can configure leading and trailing whitespaces’ behavior when writing data using the Redshift connector. The following options have been added to control whitespace handling:

      • csvignoreleadingwhitespace, when set to true, removes leading whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.

      • csvignoretrailingwhitespace, when set to true, removes trailing whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.

    • Fixed a bug with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.

    • [SPARK-40906][SQL] Mode should copy keys before inserting into Map

    • Operating system security updates.

  • November 15, 2022

    • [SPARK-40646] JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set spark.sql.json.enablePartialResults to true. The flag is disabled by default to preserve the original behavior.

    • [SPARK-40618][SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries using reference tracking

    • [SPARK-40697][SQL] Add read-side char padding to cover external data files

    • Operating system security updates.

  • November 1, 2022

    • Upgraded Apache commons-text to 1.10.0.

    • Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was disabled on that table, data in that column would incorrectly fill with NULL values when running MERGE.

    • Fixed an issue where running MERGE and using exactly 99 columns from the source in the condition could result in java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to org.apache.spark.sql.catalyst.InternalRow.

    • Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled.

    • [SPARK-40596][CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo

    • Operating system security updates.

  • October 19, 2022

    • Fixed an issue with COPY INTO usage with temporary credentials on Unity Catalog enabled clusters / warehouses.

    • [SPARK-40213][SQL] Support ASCII value conversion for Latin-1 characters

    • Operating system security updates.

  • October 5, 2022

    • Users can set spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable native listing for Auto Loader on ADLS Gen2. Native listing was previously turned off due to performance issues, but may have led to an increase in storage costs for customers. This change was rolled out to DBR 10.4 and 9.1 in the previous maintenance update.

    • [SPARK-40315][SQL] Support url encode/decode as built-in function and tidy up url-related functions

    • [SPARK-40156][SQL] url_decode() should return an error class

    • [SPARK-40169] Don’t pushdown Parquet filters with no reference to data schema

    • [SPARK-40460][SS] Fix streaming metrics when selecting _metadata

    • [SPARK-40468][SQL] Fix column pruning in CSV when _corrupt_record is selected

    • [SPARK-40055][SQL] listCatalogs should also return spark_catalog even when spark_catalog implementation is defaultSessionCatalog

    • Operating system security updates.

  • September 22, 2022

    • [SPARK-40315][SQL] Add hashCode() for Literal of ArrayBasedMapData

    • [SPARK-40389][SQL] Decimals can’t upcast as integral types if the cast can overflow

    • [SPARK-40380][SQL] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan

    • [SPARK-40066][SQL][FOLLOW-UP] Check if ElementAt is resolved before getting its dataType

    • [SPARK-40109][SQL] New SQL function: get()

    • [SPARK-40066][SQL] ANSI mode: always return null on invalid access to map column

    • [SPARK-40089][SQL] Fix sorting for some Decimal types

    • [SPARK-39887][SQL] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique

    • [SPARK-40152][SQL] Fix split_part codegen compilation issue

    • [SPARK-40235][CORE] Use interruptible lock instead of synchronized in Executor.updateDependencies()

    • [SPARK-40212][SQL] SparkSQL castPartValue does not properly handle byte, short, or float

    • [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns

    • [SPARK-35542][ML] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it

    • [SPARK-40079] Add Imputer inputCols validation for empty input case

    • [SPARK-39912][SPARK-39828][SQL] Refine CatalogImpl
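The native listing setting mentioned in the October 5, 2022 update above can be toggled per session. The following is a minimal sketch, not a definitive recipe: the helper name set_native_listing is hypothetical, and spark is assumed to be a SparkSession, such as the one predefined in a notebook.

```python
NATIVE_LISTING_KEY = "spark.databricks.io.listKeysWithPrefix.azure.enabled"


def set_native_listing(spark, enabled=True):
    """Enable or disable native listing for Auto Loader on ADLS Gen2.

    `spark` is assumed to be a SparkSession. Spark configuration values
    are passed as strings, so the boolean is converted to "true"/"false".
    Returns the string value that was set.
    """
    value = "true" if enabled else "false"
    spark.conf.set(NATIVE_LISTING_KEY, value)
    return value


# Example usage in a notebook where `spark` is already defined:
# set_native_listing(spark, enabled=True)
```

Setting the value before the Auto Loader stream starts keeps the configuration and the stream definition together in the same notebook or job.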

Databricks Runtime 11.1 (unsupported)

See Databricks Runtime 11.1 (unsupported).

  • January 31, 2023

    • [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark

    • Miscellaneous bug fixes.

  • January 18, 2023

    • Azure Synapse connector now returns a more descriptive error message when a column name contains invalid characters such as whitespaces or semicolons. In such cases, the following message will be returned: Azure Synapse Analytics failed to execute the JDBC query produced by the connector. Make sure column names do not include any invalid characters such as ';' or whitespace.

    • [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source

    • [SPARK-41862][SQL] Fix correctness bug related to DEFAULT values in Orc reader

    • [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

    • [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing

    • [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing

    • [SPARK-38277][SS] Clear write batch after RocksDB state store’s commit

    • Operating system security updates.

  • November 29, 2022

    • Users can configure leading and trailing whitespaces’ behavior when writing data using the Redshift connector. The following options have been added to control whitespace handling:

      • csvignoreleadingwhitespace, when set to true, removes leading whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.

      • csvignoretrailingwhitespace, when set to true, removes trailing whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.

    • Fixed a bug with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.

    • [SPARK-39650][SS] Fix incorrect value schema in streaming deduplication with backward compatibility

    • Operating system security updates.

  • November 15, 2022

    • [SPARK-40646] JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set spark.sql.json.enablePartialResults to true. The flag is disabled by default to preserve the original behavior.

    • Operating system security updates.

  • November 1, 2022

    • Upgraded Apache commons-text to 1.10.0.

    • Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was disabled on that table, data in that column would incorrectly fill with NULL values when running MERGE.

    • Fixed an issue where running MERGE and using exactly 99 columns from the source in the condition could result in java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to org.apache.spark.sql.catalyst.InternalRow.

    • Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled.

    • [SPARK-40697][SQL] Add read-side char padding to cover external data files

    • [SPARK-40596][CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo

    • Operating system security updates.

  • October 18, 2022

    • Fixed an issue with COPY INTO usage with temporary credentials on Unity Catalog enabled clusters / warehouses.

    • [SPARK-40213][SQL] Support ASCII value conversion for Latin-1 characters

    • Operating system security updates.

  • October 5, 2022

    • Users can set spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable native listing for Auto Loader on ADLS Gen2. Native listing was previously turned off due to performance issues, but may have led to an increase in storage costs for customers. This change was rolled out to DBR 10.4 and 9.1 in the previous maintenance update.

    • [SPARK-40169] Don’t pushdown Parquet filters with no reference to data schema

    • [SPARK-40460][SS] Fix streaming metrics when selecting _metadata

    • [SPARK-40468][SQL] Fix column pruning in CSV when _corrupt_record is selected

    • [SPARK-40055][SQL] listCatalogs should also return spark_catalog even when spark_catalog implementation is defaultSessionCatalog

    • Operating system security updates.

  • September 22, 2022

    • [SPARK-40315][SQL] Add hashCode() for Literal of ArrayBasedMapData

    • [SPARK-40380][SQL] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan

    • [SPARK-40089][SQL] Fix sorting for some Decimal types

    • [SPARK-39887][SQL] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique

    • [SPARK-40152][SQL] Fix split_part codegen compilation issue

  • September 6, 2022

    • We have updated the permission model in Table Access Controls (Table ACLs) so that only MODIFY permissions are needed to change a table’s schema or table properties with ALTER TABLE. Previously, these operations required a user to own the table. Ownership is still required to grant permissions on a table, change its owner, change its location, or rename it. This change makes the permission model for Table ACLs more consistent with Unity Catalog.

    • [SPARK-40235][CORE] Use interruptible lock instead of synchronized in Executor.updateDependencies()

    • [SPARK-40212][SQL] SparkSQL castPartValue does not properly handle byte, short, or float

    • [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns

    • [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly

    • [SPARK-40053][CORE][SQL][TESTS] Add assume to dynamic cancel cases that require a Python runtime environment

    • [SPARK-35542][CORE][ML] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it

    • [SPARK-40079][CORE] Add Imputer inputCols validation for empty input case

  • August 24, 2022

    • Shares, providers, and recipients now support SQL commands to change owners, comment, and rename.

    • [SPARK-39983][CORE][SQL] Do not cache unserialized broadcast relations on the driver

    • [SPARK-39912][SPARK-39828][SQL] Refine CatalogImpl

    • [SPARK-39775][CORE][AVRO] Disable validate default values when parsing Avro schemas

    • [SPARK-39806] Fixed an issue where queries accessing the METADATA struct crash on partitioned tables

    • [SPARK-39867][SQL] Global limit should not inherit OrderPreservingUnaryNode

    • [SPARK-39962][PYTHON][SQL] Apply projection when group attributes are empty

    • [SPARK-39839][SQL] Handle special case of null variable-length Decimal with non-zero offsetAndSize in UnsafeRow structural integrity check

    • [SPARK-39713][SQL] ANSI mode: add suggestion of using try_element_at for INVALID_ARRAY_INDEX error

    • [SPARK-39847][SS] Fix race condition in RocksDBLoader.loadLibrary() if caller thread is interrupted

    • [SPARK-39731][SQL] Fix issue in CSV and JSON data sources when parsing dates in “yyyyMMdd” format with CORRECTED time parser policy

    • Operating system security updates.

  • August 10, 2022

    • For Delta tables with table access control, automatic schema evolution through DML statements such as INSERT and MERGE is now available for all users who have MODIFY permissions on such tables. Additionally, permissions required to perform schema evolution with COPY INTO are now lowered from OWNER to MODIFY for consistency with other commands. These changes make the table ACL security model more consistent with other operations such as replacing a table.

    • [SPARK-39889] Enhance the error message of division by 0

    • [SPARK-39795][SQL] New SQL function: try_to_timestamp

    • [SPARK-39749] Always use plain string representation on casting decimal as string under ANSI mode

    • [SPARK-39625] Rename df.as to df.to

    • [SPARK-39787][SQL] Use error class in the parsing error of function to_timestamp

    • [SPARK-39625][SQL] Add Dataset.as(StructType)

    • [SPARK-39689] Support 2-chars lineSep in CSV datasource

    • [SPARK-39579][SQL][PYTHON][R] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace

    • [SPARK-39702][CORE] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel

    • [SPARK-39575][AVRO] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer

    • [SPARK-39265][SQL] Fix test failure when SPARK_ANSI_SQL_MODE is enabled

    • [SPARK-39441][SQL] Speed up DeduplicateRelations

    • [SPARK-39497][SQL] Improve the analysis exception of missing map key column

    • [SPARK-39476][SQL] Disable Unwrap cast optimize when casting from Long to Float/Double or from Integer to Float

    • [SPARK-39434][SQL] Provide runtime error query context when array index is out of bounds
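The Redshift connector whitespace options described in the November 29, 2022 update above can be assembled as ordinary write options. The following is a minimal sketch under stated assumptions: the helper name redshift_whitespace_options and its keep_leading and keep_trailing parameters are hypothetical, and the commented usage omits the connection settings that a real write also requires.

```python
def redshift_whitespace_options(keep_leading=False, keep_trailing=False,
                                tempformat="CSV"):
    """Build Redshift write options that control whitespace handling.

    The whitespace options apply only when tempformat is CSV or CSV GZIP.
    Both options default to "true" (whitespace removed); passing
    keep_leading=True or keep_trailing=True sets the matching option to
    "false" so that whitespace is retained.
    """
    if tempformat not in ("CSV", "CSV GZIP"):
        raise ValueError("whitespace options require tempformat CSV or CSV GZIP")
    return {
        "tempformat": tempformat,
        "csvignoreleadingwhitespace": "false" if keep_leading else "true",
        "csvignoretrailingwhitespace": "false" if keep_trailing else "true",
    }


# Example usage (assumes `df` is an existing DataFrame; other required
# connector options, such as the connection URL and target table, are
# omitted here):
# df.write.format("redshift").options(**redshift_whitespace_options(keep_leading=True))
```

Building the options as a dictionary keeps the two whitespace settings next to the tempformat value they depend on.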

Databricks Runtime 11.0 (unsupported)

See Databricks Runtime 11.0 (unsupported).

  • November 29, 2022

    • Users can configure leading and trailing whitespaces’ behavior when writing data using the Redshift connector. The following options have been added to control whitespace handling:

      • csvignoreleadingwhitespace, when set to true, removes leading whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.

      • csvignoretrailingwhitespace, when set to true, removes trailing whitespace from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.

    • Fixed a bug with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.

    • [SPARK-39650][SS] Fix incorrect value schema in streaming deduplication with backward compatibility

    • Operating system security updates.

  • November 15, 2022

    • [SPARK-40646] JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set spark.sql.json.enablePartialResults to true. The flag is disabled by default to preserve the original behavior.

  • November 1, 2022

    • Upgraded Apache commons-text to 1.10.0.

    • Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was disabled on that table, data in that column would incorrectly fill with NULL values when running MERGE.

    • Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled.

    • [SPARK-40697][SQL] Add read-side char padding to cover external data files

    • [SPARK-40596][CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo

    • Operating system security updates.

  • October 18, 2022

    • [SPARK-40213][SQL] Support ASCII value conversion for Latin-1 characters

    • Operating system security updates.

  • October 5, 2022

    • Users can set spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable native listing for Auto Loader on ADLS Gen2. Native listing was previously turned off due to performance issues, but may have led to an increase in storage costs for customers. This change was rolled out to DBR 10.4 and 9.1 in the previous maintenance update.

    • [SPARK-40169] Don’t pushdown Parquet filters with no reference to data schema

    • [SPARK-40460][SS] Fix streaming metrics when selecting _metadata

    • [SPARK-40468][SQL] Fix column pruning in CSV when _corrupt_record is selected

    • Operating system security updates.

  • September 22, 2022

    • [SPARK-40315][SQL] Add hashCode() for Literal of ArrayBasedMapData

    • [SPARK-40380][SQL] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan

    • [SPARK-40089][SQL] Fix sorting for some Decimal types

    • [SPARK-39887][SQL] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique

    • [SPARK-40152][SQL] Fix split_part codegen compilation issue

  • September 6, 2022

    • [SPARK-40235][CORE] Use interruptible lock instead of synchronized in Executor.updateDependencies()

    • [SPARK-40212][SQL] SparkSQL castPartValue does not properly handle byte, short, or float

    • [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns

    • [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly

    • [SPARK-40053][CORE][SQL][TESTS] Add assume to dynamic cancel cases that require a Python runtime environment

    • [SPARK-35542][CORE][ML] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it

    • [SPARK-40079][CORE] Add Imputer inputCols validation for empty input case

  • August 24, 2022

    • [SPARK-39983][CORE][SQL] Do not cache unserialized broadcast relations on the driver

    • [SPARK-39775][CORE][AVRO] Disable validate default values when parsing Avro schemas

    • [SPARK-39806] Fixed an issue where queries accessing the METADATA struct crash on partitioned tables

    • [SPARK-39867][SQL] Global limit should not inherit OrderPreservingUnaryNode

    • [SPARK-39962][PYTHON][SQL] Apply projection when group attributes are empty

    • Operating system security updates.

  • August 9, 2022

    • [SPARK-39713][SQL] ANSI mode: add suggestion of using try_element_at for INVALID_ARRAY_INDEX error

    • [SPARK-39847] Fix race condition in RocksDBLoader.loadLibrary() if caller thread is interrupted

    • [SPARK-39731][SQL] Fix issue in CSV and JSON data sources when parsing dates in “yyyyMMdd” format with CORRECTED time parser policy

    • [SPARK-39889] Enhance the error message of division by 0

    • [SPARK-39795][SQL] New SQL function: try_to_timestamp

    • [SPARK-39749] Always use plain string representation on casting decimal as string under ANSI mode

    • [SPARK-39625][SQL] Add Dataset.to(StructType)

    • [SPARK-39787][SQL] Use error class in the parsing error of function to_timestamp

    • Operating system security updates.

  • July 27, 2022

    • [SPARK-39689] Support 2-chars lineSep in CSV datasource

    • [SPARK-39104][SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe

    • [SPARK-39702][CORE] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel

    • [SPARK-39575][AVRO] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer

    • [SPARK-39497][SQL] Improve the analysis exception of missing map key column

    • [SPARK-39441][SQL] Speed up DeduplicateRelations

    • [SPARK-39476][SQL] Disable Unwrap cast optimize when casting from Long to Float/Double or from Integer to Float

    • [SPARK-39434][SQL] Provide runtime error query context when array index is out of bounds

    • [SPARK-39570][SQL] Inline table should allow expressions with alias

    • Operating system security updates.

  • July 13, 2022

    • Make Delta MERGE operation results consistent when source is non-deterministic.

    • Fixed an issue for the cloud_files_state TVF when running on non-DBFS paths.

    • Disabled Auto Loader’s use of native cloud APIs for directory listing on Azure.

    • [SPARK-38796][SQL] Update to_number and try_to_number functions to allow PR with positive numbers

    • [SPARK-39272][SQL] Increase the start position of query context by 1

    • [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null

    • Operating system security updates.

  • July 5, 2022

    • Improved error messages for a range of error classes.

    • [SPARK-39451][SQL] Support casting intervals to integrals in ANSI mode

    • [SPARK-39361] Don’t use Log4J2’s extended throwable conversion pattern in default logging configurations

    • [SPARK-39354][SQL] Ensure show Table or view not found even if there are dataTypeMismatchError related to Filter at the same time

    • [SPARK-38675][CORE] Fix race during unlock in BlockInfoManager

    • [SPARK-39392][SQL] Refine ANSI error messages for try_* function hints

    • [SPARK-39214][SQL][3.3] Improve errors related to CAST

    • [SPARK-37939][SQL] Use error classes in the parsing errors of properties

    • [SPARK-39085][SQL] Move the error message of INCONSISTENT_BEHAVIOR_CROSS_VERSION to error-classes.json

    • [SPARK-39376][SQL] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN

    • [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

    • [SPARK-39285][SQL] Spark should not check field names when reading files

    • Operating system security updates.

Databricks Runtime 10.5 (unsupported)

See Databricks Runtime 10.5 (unsupported).

  • November 1, 2022

    • Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was disabled on that table, data in that column would incorrectly fill with NULL values when running MERGE.

    • [SPARK-40697][SQL] Add read-side char padding to cover external data files

    • [SPARK-40596][CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo

    • Operating system security updates.

  • October 18, 2022

    • Operating system security updates.

  • October 5, 2022

    • Users can set spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable native listing for Auto Loader on ADLS Gen2. Native listing was previously turned off due to performance issues, but may have led to an increase in storage costs for customers. This change was rolled out to DBR 10.4 and 9.1 in the previous maintenance update.

    • reload4j has been upgraded to 1.2.19 to fix vulnerabilities.

    • [SPARK-40460][SS] Fix streaming metrics when selecting _metadata

    • [SPARK-40468][SQL] Fix column pruning in CSV when _corrupt_record is selected

    • Operating system security updates.

  • September 22, 2022

    • [SPARK-40315][SQL] Add hashCode() for Literal of ArrayBasedMapData

    • [SPARK-40213][SQL] Support ASCII value conversion for Latin-1 characters

    • [SPARK-40380][SQL] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan

    • [SPARK-38404][SQL] Improve CTE resolution when a nested CTE references an outer CTE

    • [SPARK-40089][SQL] Fix sorting for some Decimal types

    • [SPARK-39887][SQL] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique

    • Operating system security updates.

  • September 6, 2022

    • [SPARK-40235][CORE] Use interruptible lock instead of synchronized in Executor.updateDependencies()

    • [SPARK-39976][SQL] ArrayIntersect should handle null in left expression correctly

    • [SPARK-40053][CORE][SQL][TESTS] Add assume to dynamic cancel cases that require a Python runtime environment

    • [SPARK-35542][CORE][ML] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it

    • [SPARK-40079][CORE] Add Imputer inputCols validation for empty input case

  • August 24, 2022

    • [SPARK-39983][CORE][SQL] Do not cache unserialized broadcast relations on the driver

    • [SPARK-39775][CORE][AVRO] Disable validate default values when parsing Avro schemas

    • [SPARK-39806] Fixed an issue where queries accessing the METADATA struct crash on partitioned tables

    • [SPARK-39962][PYTHON][SQL] Apply projection when group attributes are empty

    • [SPARK-37643][SQL] when charVarcharAsString is true, for char datatype predicate query should skip rpadding rule

    • Operating system security updates.

  • August 9, 2022

    • [SPARK-39847] Fix race condition in RocksDBLoader.loadLibrary() if caller thread is interrupted

    • [SPARK-39731][SQL] Fix issue in CSV and JSON data sources when parsing dates in “yyyyMMdd” format with CORRECTED time parser policy

    • Operating system security updates.

  • July 27, 2022

    • [SPARK-39625][SQL] Add Dataset.as(StructType)

    • [SPARK-39689] Support 2-chars lineSep in CSV datasource

    • [SPARK-39104][SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe

    • [SPARK-39570][SQL] Inline table should allow expressions with alias

    • [SPARK-39702][CORE] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel

    • [SPARK-39575][AVRO] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer

    • [SPARK-39476][SQL] Disable Unwrap cast optimize when casting from Long to Float/Double or from Integer to Float

    • Operating system security updates.

  • July 13, 2022

    • Make Delta MERGE operation results consistent when source is non-deterministic.

    • [SPARK-39355][SQL] Single column uses quoted to construct UnresolvedAttribute

    • [SPARK-39548][SQL] CreateView Command with a window clause query hit a wrong window definition not found issue

    • [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null

    • Disabled Auto Loader’s use of native cloud APIs for directory listing on Azure.

    • Operating system security updates.

  • July 5, 2022

    • [SPARK-39376][SQL] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN

    • Operating system security updates.

  • June 15, 2022

    • [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

    • [SPARK-39285][SQL] Spark should not check field names when reading files

    • [SPARK-34096][SQL] Improve performance for nth_value ignore nulls over offset window

    • [SPARK-36718][SQL][FOLLOWUP] Fix the isExtractOnly check in CollapseProject

  • June 2, 2022

    • [SPARK-39166][SQL] Provide runtime error query context for binary arithmetic when WSCG is off

    • [SPARK-39093][SQL] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral

    • [SPARK-38990][SQL] Avoid NullPointerException when evaluating date_trunc/trunc format as a bound reference

    • Operating system security updates.

  • May 18, 2022

    • Fixed a potential native memory leak in Auto Loader.

    • [SPARK-38868][SQL] Don’t propagate exceptions from filter predicate when optimizing outer joins

    • [SPARK-38796][SQL] Implement the to_number and try_to_number SQL functions according to a new specification

    • [SPARK-38918][SQL] Nested column pruning should filter out attributes that do not belong to the current relation

    • [SPARK-38929][SQL] Improve error messages for cast failures in ANSI

    • [SPARK-38926][SQL] Output types in error messages in SQL style

    • [SPARK-39084][PYSPARK] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion

    • [SPARK-32268][SQL] Add ColumnPruning in injectBloomFilter

    • [SPARK-38908][SQL] Provide query context in runtime error of Casting from String to Number/Date/Timestamp/Boolean

    • [SPARK-39046][SQL] Return an empty context string if TreeNode.origin is wrongly set

    • [SPARK-38974][SQL] Filter registered functions with a given database name in list functions

    • [SPARK-38762][SQL] Provide query context in Decimal overflow errors

    • [SPARK-38931][SS] Create root dfs directory for RocksDBFileManager with unknown number of keys on 1st checkpoint

    • [SPARK-38992][CORE] Avoid using bash -c in ShellBasedGroupsMappingProvider

    • [SPARK-38716][SQL] Provide query context in map key not exists error

    • [SPARK-38889][SQL] Compile boolean column filters to use the bit type for MSSQL data source

    • [SPARK-38698][SQL] Provide query context in runtime error of Divide/Div/Remainder/Pmod

    • [SPARK-38823][SQL] Make NewInstance non-foldable to fix aggregation buffer corruption issue

    • [SPARK-38809][SS] Implement option to skip null values in symmetric hash implementation of stream-stream joins

    • [SPARK-38676][SQL] Provide SQL query context in runtime error message of Add/Subtract/Multiply

    • [SPARK-38677][PYSPARK] Python MonitorThread should detect deadlock due to blocking I/O

    • Operating system security updates.

Databricks Runtime 10.3 (unsupported)

See Databricks Runtime 10.3 (unsupported).

  • July 27, 2022

    • [SPARK-39689] Support 2-chars lineSep in CSV datasource

    • [SPARK-39104][SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe

    • [SPARK-39702][CORE] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel

    • Operating system security updates.

  • July 20, 2022

    • Make Delta MERGE operation results consistent when source is non-deterministic.

    • [SPARK-39476][SQL] Disable Unwrap cast optimize when casting from Long to Float/Double or from Integer to Float

    • [SPARK-39548][SQL] CreateView Command with a window clause query hit a wrong window definition not found issue

    • [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null

    • Operating system security updates.

  • July 5, 2022

    • [SPARK-39376][SQL] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN

    • Operating system security updates.

  • June 15, 2022

    • [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

    • [SPARK-39285][SQL] Spark should not check field names when reading files

    • [SPARK-34096][SQL] Improve performance for nth_value ignore nulls over offset window

    • [SPARK-36718][SQL][FOLLOWUP] Fix the isExtractOnly check in CollapseProject

  • June 2, 2022

    • [SPARK-38990][SQL] Avoid NullPointerException when evaluating date_trunc/trunc format as a bound reference

    • Operating system security updates.

  • May 18, 2022

    • Fixes a potential native memory leak in Auto Loader.

    • [SPARK-38918][SQL] Nested column pruning should filter out attributes that do not belong to the current relation

    • [SPARK-37593][CORE] Reduce default page size by LONG_ARRAY_OFFSET if G1GC and ON_HEAP are used

    • [SPARK-39084][PYSPARK] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion

    • [SPARK-32268][SQL] Add ColumnPruning in injectBloomFilter

    • [SPARK-38974][SQL] Filter registered functions with a given database name in list functions

    • [SPARK-38889][SQL] Compile boolean column filters to use the bit type for MSSQL data source

    • Operating system security updates.

  • May 4, 2022

    • Upgraded Java AWS SDK from version 1.11.655 to 1.12.1899.

  • April 19, 2022

    • [SPARK-38616][SQL] Keep track of SQL query text in Catalyst TreeNode

    • Operating system security updates.

  • April 6, 2022

    • [SPARK-38631][CORE] Uses Java-based implementation for un-tarring at Utils.unpack

    • Operating system security updates.

  • March 22, 2022

    • Changed the current working directory of notebooks on High Concurrency clusters with either table access control or credential passthrough enabled to the user’s home directory. Previously, the working directory was /databricks/driver.

    • [SPARK-38437][SQL] Lenient serialization of datetime from datasource

    • [SPARK-38180][SQL] Allow safe up-cast expressions in correlated equality predicates

    • [SPARK-38155][SQL] Disallow distinct aggregate in lateral subqueries with unsupported predicates

    • [SPARK-38325][SQL] ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()

  • March 14, 2022

    • Improved transaction conflict detection for empty transactions in Delta Lake.

    • [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty

    • [SPARK-38318][SQL] regression when replacing a dataset view

    • [SPARK-38236][SQL] Absolute file paths specified in create/alter table are treated as relative

    • [SPARK-35937][SQL] Extracting date field from timestamp should work in ANSI mode

    • [SPARK-34069][SQL] Kill barrier tasks should respect SPARK_JOB_INTERRUPT_ON_CANCEL

    • [SPARK-37707][SQL] Allow store assignment between TimestampNTZ and Date/Timestamp

  • February 23, 2022

    • [SPARK-27442][SQL] Remove check field name when reading/writing data in parquet

Databricks Runtime 10.2 (unsupported)

See Databricks Runtime 10.2 (unsupported).

  • June 15, 2022

    • [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

    • [SPARK-39285][SQL] Spark should not check field names when reading files

    • [SPARK-34096][SQL] Improve performance for nth_value ignore nulls over offset window

  • June 2, 2022

    • [SPARK-38918][SQL] Nested column pruning should filter out attributes that do not belong to the current relation

    • [SPARK-38990][SQL] Avoid NullPointerException when evaluating date_trunc/trunc format as a bound reference

    • Operating system security updates.

  • May 18, 2022

    • Fixes a potential native memory leak in Auto Loader.

    • [SPARK-39084][PYSPARK] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion

    • [SPARK-38889][SQL] Compile boolean column filters to use the bit type for MSSQL data source

    • [SPARK-38931][SS] Create root dfs directory for RocksDBFileManager with unknown number of keys on 1st checkpoint

    • Operating system security updates.

  • May 4, 2022

    • Upgraded Java AWS SDK from version 1.11.655 to 1.12.1899.

  • April 19, 2022

    • Operating system security updates.

    • Miscellaneous bug fixes.

  • April 6, 2022

    • [SPARK-38631][CORE] Uses Java-based implementation for un-tarring at Utils.unpack

    • Operating system security updates.

  • March 22, 2022

    • Changed the current working directory of notebooks on High Concurrency clusters with either table access control or credential passthrough enabled to the user’s home directory. Previously, the working directory was /databricks/driver.

    • [SPARK-38437][SQL] Lenient serialization of datetime from datasource

    • [SPARK-38180][SQL] Allow safe up-cast expressions in correlated equality predicates

    • [SPARK-38155][SQL] Disallow distinct aggregate in lateral subqueries with unsupported predicates

    • [SPARK-38325][SQL] ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()

  • March 14, 2022

    • Improved transaction conflict detection for empty transactions in Delta Lake.

    • [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty

    • [SPARK-38318][SQL] regression when replacing a dataset view

    • [SPARK-38236][SQL] Absolute file paths specified in create/alter table are treated as relative

    • [SPARK-35937][SQL] Extracting date field from timestamp should work in ANSI mode

    • [SPARK-34069][SQL] Kill barrier tasks should respect SPARK_JOB_INTERRUPT_ON_CANCEL

    • [SPARK-37707][SQL] Allow store assignment between TimestampNTZ and Date/Timestamp

  • February 23, 2022

    • [SPARK-37577][SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning

  • February 8, 2022

    • [SPARK-27442][SQL] Remove check field name when reading/writing data in parquet.

    • Operating system security updates.

  • February 1, 2022

    • Operating system security updates.

  • January 26, 2022

    • Fixed a bug where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.

    • Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.

  • January 19, 2022

    • Introduced support for inlining temporary credentials to COPY INTO for loading the source data without requiring SQL ANY_FILE permissions.

    • Bug fixes and security enhancements.

  • December 20, 2021

    • Fixed a rare bug with Parquet column index based filtering.

Databricks Runtime 10.1 (unsupported)

See Databricks Runtime 10.1 (unsupported).

  • June 15, 2022

    • [SPARK-39283][CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator

    • [SPARK-39285][SQL] Spark should not check field names when reading files

    • [SPARK-34096][SQL] Improve performance for nth_value ignore nulls over offset window

  • June 2, 2022

    • Operating system security updates.

  • May 18, 2022

    • Fixes a potential native memory leak in Auto Loader.

    • [SPARK-39084][PYSPARK] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion

    • [SPARK-38889][SQL] Compile boolean column filters to use the bit type for MSSQL data source

    • Operating system security updates.

  • April 19, 2022

    • [SPARK-37270][SQL] Fix push foldable into CaseWhen branches if elseValue is empty

    • Operating system security updates.

  • April 6, 2022

    • [SPARK-38631][CORE] Uses Java-based implementation for un-tarring at Utils.unpack

    • Operating system security updates.

  • March 22, 2022

    • [SPARK-38437][SQL] Lenient serialization of datetime from datasource

    • [SPARK-38180][SQL] Allow safe up-cast expressions in correlated equality predicates

    • [SPARK-38155][SQL] Disallow distinct aggregate in lateral subqueries with unsupported predicates

    • [SPARK-38325][SQL] ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()

  • March 14, 2022

    • Improved transaction conflict detection for empty transactions in Delta Lake.

    • [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty

    • [SPARK-38318][SQL] regression when replacing a dataset view

    • [SPARK-38236][SQL] Absolute file paths specified in create/alter table are treated as relative

    • [SPARK-35937][SQL] Extracting date field from timestamp should work in ANSI mode

    • [SPARK-34069][SQL] Kill barrier tasks should respect SPARK_JOB_INTERRUPT_ON_CANCEL

    • [SPARK-37707][SQL] Allow store assignment between TimestampNTZ and Date/Timestamp

  • February 23, 2022

    • [SPARK-37577][SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning

  • February 8, 2022

    • [SPARK-27442][SQL] Remove check field name when reading/writing data in parquet.

    • Operating system security updates.

  • February 1, 2022

    • Operating system security updates.

  • January 26, 2022

    • Fixed a bug where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.

    • Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.

  • January 19, 2022

    • Introduced support for inlining temporary credentials to COPY INTO for loading the source data without requiring SQL ANY_FILE permissions.

    • Fixed an out of memory issue with query result caching under certain conditions.

    • Fixed an issue with USE DATABASE when a user switches the current catalog to a non-default catalog.

    • Bug fixes and security enhancements.

    • Operating system security updates.

  • December 20, 2021

    • Fixed a rare bug with Parquet column index based filtering.

Databricks Runtime 10.0 (unsupported)

See Databricks Runtime 10.0 (unsupported).

  • April 19, 2022

    • [SPARK-37270][SQL] Fix push foldable into CaseWhen branches if elseValue is empty

    • Operating system security updates.

  • April 6, 2022

    • [SPARK-38631][CORE] Uses Java-based implementation for un-tarring at Utils.unpack

    • Operating system security updates.

  • March 22, 2022

    • [SPARK-38437][SQL] Lenient serialization of datetime from datasource

    • [SPARK-38180][SQL] Allow safe up-cast expressions in correlated equality predicates

    • [SPARK-38155][SQL] Disallow distinct aggregate in lateral subqueries with unsupported predicates

    • [SPARK-38325][SQL] ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()

  • March 14, 2022

    • Improved transaction conflict detection for empty transactions in Delta Lake.

    • [SPARK-38185][SQL] Fix data incorrect if aggregate function is empty

    • [SPARK-38318][SQL] regression when replacing a dataset view

    • [SPARK-38236][SQL] Absolute file paths specified in create/alter table are treated as relative

    • [SPARK-35937][SQL] Extracting date field from timestamp should work in ANSI mode

    • [SPARK-34069][SQL] Kill barrier tasks should respect SPARK_JOB_INTERRUPT_ON_CANCEL

    • [SPARK-37707][SQL] Allow store assignment between TimestampNTZ and Date/Timestamp

  • February 23, 2022

    • [SPARK-37577][SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning

  • February 8, 2022

    • [SPARK-27442][SQL] Remove check field name when reading/writing data in parquet.

    • [SPARK-36905][SQL] Fix reading hive views without explicit column names

    • [SPARK-37859][SQL] Fix issue that SQL tables created with JDBC with Spark 3.1 are not readable with 3.2

    • Operating system security updates.

  • February 1, 2022

    • Operating system security updates.

  • January 26, 2022

    • Fixed a bug where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.

    • Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.

  • January 19, 2022

    • Bug fixes and security enhancements.

    • Operating system security updates.

  • December 20, 2021

    • Fixed a rare bug with Parquet column index based filtering.

  • November 9, 2021

    • Introduced additional configuration flags to enable fine-grained control of ANSI behaviors.

  • November 4, 2021

    • Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException.

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.

    • The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.

  • November 30, 2021

    • Fixed an issue with timestamp parsing where a timezone string without a colon was considered invalid.

    • Fixed an out of memory issue with query result caching under certain conditions.

    • Fixed an issue with USE DATABASE when a user switches the current catalog to a non-default catalog.
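
The November 30, 2021 timestamp-parsing fix above concerns zone offsets written without a colon (for example +0530 rather than +05:30). As a rough illustration of why both spellings should resolve to the same offset, the sketch below uses Python's standard datetime.strptime, not Spark's parser. The helper name parse_ts and the sample timestamps are made up for this example.

```python
from datetime import datetime, timedelta

# Hypothetical helper: parses a timestamp that ends in a numeric zone offset.
# It relies on Python's standard library and is NOT Spark's timestamp parser.
def parse_ts(text: str) -> datetime:
    # In Python 3.7+, %z accepts offsets both with and without a colon.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S%z")

with_colon = parse_ts("2021-11-30 12:00:00+05:30")
without_colon = parse_ts("2021-11-30 12:00:00+0530")

# Both spellings describe the same instant and carry the same offset.
print(with_colon == without_colon)
print(without_colon.utcoffset() == timedelta(hours=5, minutes=30))
```

A parser that rejected the second spelling would treat a valid offset as invalid, which is the kind of behavior the entry describes as fixed.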

Databricks Runtime 9.0 (unsupported)

See Databricks Runtime 9.0 (unsupported).

  • February 8, 2022

    • Operating system security updates.

  • February 1, 2022

    • Operating system security updates.

  • January 26, 2022

    • Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.

  • January 19, 2022

    • Bug fixes and security enhancements.

    • Operating system security updates.

  • November 4, 2021

    • Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException.

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.

    • The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.

  • September 22, 2021

    • Fixed a bug in casting a Spark array with null to string.

  • September 15, 2021

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.

  • September 8, 2021

    • Added support for schema name (databaseName.schemaName.tableName format) as the target table name for Azure Synapse Connector.

    • Added geometry and geography JDBC types support for Spark SQL.

    • [SPARK-33527][SQL] Extended the function of decode to be consistent with mainstream databases.

    • [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang.

  • August 25, 2021

    • SQL Server driver library was upgraded to 9.2.1.jre8.

    • Snowflake connector was upgraded to 2.9.0.

    • Fixed broken link to best trial notebook on AutoML experiment page.

Databricks Runtime 8.4 (unsupported)

See Databricks Runtime 8.4 (unsupported).

  • January 19, 2022

    • Operating system security updates.

  • November 4, 2021

    • Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException.

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.

    • The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.

  • September 22, 2021

    • Spark JDBC driver was upgraded to 2.6.19.1030.

    • [SPARK-36734][SQL] Upgrade ORC to 1.5.1

  • September 15, 2021

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.

    • Operating system security updates.

  • September 8, 2021

    • [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang.

  • August 25, 2021

    • SQL Server driver library was upgraded to 9.2.1.jre8.

    • Snowflake connector was upgraded to 2.9.0.

    • Fixes a bug in credential passthrough caused by the new Parquet prefetch optimization, where the user’s passthrough credential might not be found during file access.

  • August 11, 2021

    • Fixes a RocksDB incompatibility problem affecting older Databricks Runtime 8.4 releases. This fixes forward compatibility for Auto Loader, COPY INTO, and stateful streaming applications.

    • Fixes a bug in Auto Loader with S3 paths when using Auto Loader without a path option.

    • Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.

    • Fixes a bug when using Auto Loader to read CSV files with mismatching header files. Previously, if column names did not match, the column was filled in with nulls. Now, if a schema is provided, Auto Loader assumes the schema is the same and saves column mismatches only if rescued data columns are enabled.

    • Adds a new option called externalDataSource into the Azure Synapse connector to remove the CONTROL permission requirement on the database for PolyBase reading.

  • July 29, 2021

    • [SPARK-36034][BUILD] Rebase datetime in pushed down filters to Parquet

    • [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option

Databricks Runtime 8.3 (unsupported)

See Databricks Runtime 8.3 (unsupported).

  • January 19, 2022

    • Operating system security updates.

  • November 4, 2021

    • Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException.

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.

  • September 22, 2021

    • Spark JDBC driver was upgraded to 2.6.19.1030.

  • September 15, 2021

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.

    • Operating system security updates.

  • September 8, 2021

    • [SPARK-35700][SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.

    • [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang.

  • August 25, 2021

    • SQL Server driver library was upgraded to 9.2.1.jre8.

    • Snowflake connector was upgraded to 2.9.0.

    • Fixes a bug in credential passthrough caused by the new Parquet prefetch optimization, where the user’s passthrough credential might not be found during file access.

  • August 11, 2021

    • Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.

    • Fixes a bug when using Auto Loader to read CSV files with mismatching header files. Previously, if column names did not match, the column was filled in with nulls. Now, if a schema is provided, Auto Loader assumes the schema is the same and saves column mismatches only if rescued data columns are enabled.

  • July 29, 2021

    • Upgrade Databricks Snowflake Spark connector to 2.9.0-spark-3.1

    • [SPARK-36034][BUILD] Rebase datetime in pushed down filters to Parquet

    • [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option

  • July 14, 2021

    • Fixed an issue when using column names with dots in Azure Synapse connector.

    • Introduced database.schema.table format for Synapse Connector.

    • Added support to provide databaseName.schemaName.tableName format as the target table instead of only schemaName.tableName or tableName.

  • June 15, 2021

    • Fixed a NoSuchElementException bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses.

    • Adds SQL CREATE GROUP, DROP GROUP, ALTER GROUP, SHOW GROUPS, and SHOW USERS commands. For details, see Security statements and Show statements.
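
The July 14, 2021 entries above describe three accepted shapes for a Synapse target table name: tableName, schemaName.tableName, and databaseName.schemaName.tableName. The sketch below shows one way such a name could be split into its parts. It is an illustration only, not the connector's code: the function split_table_name, the TableName type, and the sample names are hypothetical, and the sketch deliberately ignores quoted identifiers that themselves contain dots.

```python
from typing import NamedTuple, Optional

class TableName(NamedTuple):
    """Parts of a target table name; missing parts are None."""
    database: Optional[str]
    schema: Optional[str]
    table: str

def split_table_name(name: str) -> TableName:
    # Hypothetical helper: naive split on dots, no support for quoted names.
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or any(p == "" for p in parts):
        raise ValueError(f"unsupported table name: {name!r}")
    # Left-pad with None so the last element is always the table.
    padded = [None] * (3 - len(parts)) + parts
    return TableName(*padded)

print(split_table_name("sales"))
print(split_table_name("dbo.sales"))
print(split_table_name("warehouse.dbo.sales"))
```

Padding from the left keeps the table in the last position for all three shapes, so callers can treat the database and schema as optional.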

Databricks Runtime 8.2 (unsupported)

See Databricks Runtime 8.2 (unsupported).

  • September 22, 2021

    • Operating system security updates.

  • September 15, 2021

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.

  • September 8, 2021

    • [SPARK-35700][SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.

    • [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang.

  • August 25, 2021

    • Snowflake connector was upgraded to 2.9.0.

  • August 11, 2021

    • Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.

    • [SPARK-36034][SQL] Rebase datetime in pushed down filters to parquet.

  • July 29, 2021

    • Upgrade Databricks Snowflake Spark connector to 2.9.0-spark-3.1

    • [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option

  • July 14, 2021

    • Fixed an issue when using column names with dots in Azure Synapse connector.

    • Introduced database.schema.table format for Synapse Connector.

    • Added support to provide databaseName.schemaName.tableName format as the target table instead of only schemaName.tableName or tableName.

    • Fixed a bug that prevents users from time traveling to older available versions with Delta tables.

  • June 15, 2021

    • Fixes a NoSuchElementException bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses.

  • May 26, 2021

    • Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).

    • Disk caching is enabled by default on all GCP instances except those in the -highcpu- family. For -highcpu- instances, the cache is preconfigured but disabled by default. It can be enabled using the Spark config spark.databricks.io.cache.enabled true.

  • April 30, 2021

    • Operating system security updates.

    • [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit

    • [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

    • Fixed an OOM issue when Auto Loader reports Structured Streaming progress metrics.

Databricks Runtime 8.1 (unsupported)

See Databricks Runtime 8.1 (unsupported).

  • September 22, 2021

    • Operating system security updates.

  • September 15, 2021

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.

  • September 8, 2021

    • [SPARK-35700][SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.

    • [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang.

  • August 25, 2021

    • Snowflake connector was upgraded to 2.9.0.

  • August 11, 2021

    • Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.

    • [SPARK-36034][SQL] Rebase datetime in pushed down filters to parquet.

  • July 29, 2021

    • Upgrade Databricks Snowflake Spark connector to 2.9.0-spark-3.1

    • [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option

  • July 14, 2021

    • Fixed an issue when using column names with dots in Azure Synapse connector.

    • Fixed a bug that prevents users from time traveling to older available versions with Delta tables.

  • June 15, 2021

    • Fixes a NoSuchElementException bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses.

  • May 26, 2021

    • Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).

    • Disk caching is enabled by default on all GCP instances except those in the -highcpu- family. For -highcpu- instances, the cache is preconfigured but disabled by default. It can be enabled using the Spark config spark.databricks.io.cache.enabled true.

  • April 30, 2021

    • Operating system security updates.

    • [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit

    • Fixed an OOM issue when Auto Loader reports Structured Streaming progress metrics.

  • April 27, 2021

    • [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

    • [SPARK-34856][SQL] ANSI mode: Allow casting complex types as string type

    • [SPARK-35014] Fix the PhysicalAggregation pattern to not rewrite foldable expressions

    • [SPARK-34769][SQL] AnsiTypeCoercion: return narrowest convertible type among TypeCollection

    • [SPARK-34614][SQL] ANSI mode: Casting String to Boolean will throw exception on parse error

    • [SPARK-33794][SQL] ANSI mode: Fix NextDay expression to throw runtime IllegalArgumentException when receiving invalid input under ANSI mode

Databricks Runtime 8.0 (unsupported)

See Databricks Runtime 8.0 (unsupported).

  • September 15, 2021

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.

  • August 25, 2021

    • Snowflake connector was upgraded to 2.9.0.

  • August 11, 2021

    • Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.

    • [SPARK-36034][SQL] Rebase datetime in pushed down filters to parquet.

  • July 29, 2021

    • [SPARK-36163][BUILD] Propagate correct JDBC properties in JDBC connector provider and add connectionProvider option

  • July 14, 2021

    • Fixed an issue when using column names with dots in Azure Synapse connector.

    • Fixed a bug that prevents users from time traveling to older available versions with Delta tables.

  • May 26, 2021

    • Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).

    • Disk caching is enabled by default on all GCP instances except those in the -highcpu- family. For -highcpu- instances, the cache is preconfigured but disabled by default. It can be enabled using the Spark config spark.databricks.io.cache.enabled true.

    • Enable Maven library installation.

  • April 30, 2021

    • Operating system security updates.

    • [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit

    • [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

  • March 24, 2021

    • [SPARK-34681][SQL] Fix bug for full outer shuffled hash join when building left side with non-equal condition

    • [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

    • [SPARK-34613][SQL] Fix view does not capture disable hint config

  • March 9, 2021

    • [SPARK-34543][SQL] Respect the spark.sql.caseSensitive config while resolving partition spec in v1 SET LOCATION

    • [SPARK-34392][SQL] Support ZoneOffset +h:mm in DateTimeUtils.getZoneId

    • [UI] Fix the href link of Spark DAG Visualization

    • [SPARK-34436][SQL] DPP support LIKE ANY/ALL expression

Databricks Runtime 7.6 (unsupported)

See Databricks Runtime 7.6 (unsupported).

  • August 11, 2021

    • Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.

    • [SPARK-36034][SQL] Rebase datetime in pushed down filters to parquet.

  • July 29, 2021

    • [SPARK-32998][BUILD] Add ability to override default remote repos with internal repos only

  • July 14, 2021

    • Fixed a bug that prevents users from time traveling to older available versions with Delta tables.

  • May 26, 2021

    • Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).

    • Disk caching is enabled by default on all GCP instances except those in the -highcpu- family. For -highcpu- instances, the cache is preconfigured but disabled by default. It can be enabled using the Spark config spark.databricks.io.cache.enabled true.

    • Enable Maven library installation.

  • April 30, 2021

    • Operating system security updates.

    • [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit

    • [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

  • March 24, 2021

    • [SPARK-34768][SQL] Respect the default input buffer size in Univocity

    • [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

  • March 9, 2021

    • (Azure only) Fixed an Auto Loader bug that can cause a NullPointerException when using Databricks Runtime 7.6 to run an old Auto Loader stream created in Databricks Runtime 7.2.

    • [UI] Fix the href link of Spark DAG Visualization

    • Unknown leaf-node SparkPlan is not handled correctly in SizeInBytesOnlyStatsSparkPlanVisitor

    • Restore the output schema of SHOW DATABASES

    • [Delta][8.0, 7.6] Fixed calculation bug in file size auto-tuning logic

    • Disable staleness check for Delta table files in disk cache

    • [SQL] Use correct dynamic pruning build key when range join hint is present

    • Disable char type support in non-SQL code path

    • Avoid NPE in DataFrameReader.schema

    • Fix NPE when EventGridClient response has no entity

    • Fix a read closed stream bug in Azure Auto Loader

    • [SQL] Do not generate shuffle partition number advice when AOS is enabled

  • February 24, 2021

    • Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.

    • Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file’s decimal precision and scale are different from the Spark schema.

    • Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.

    • Introduced a new configuration spark.databricks.hive.metastore.init.reloadFunctions.enabled. This configuration controls the built-in Hive initialization. When set to true, Databricks reloads all functions from all databases that users have into FunctionRegistry. This is the default behavior in Hive Metastore. When set to false, Databricks disables this process for optimization.

    • [SPARK-34212] Fixed issues related to reading decimal data from Parquet files.

    • [SPARK-34260][SQL] Fix UnresolvedException when creating temp view twice.
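
The decimal precision and scale fix in the February 24, 2021 update concerns values whose stored scale differs from the scale in the reading schema. The sketch below is only an illustration of why such a mismatch produces wrong results; it is not Spark's implementation. It assumes the Parquet convention that a decimal is stored as an unscaled integer plus a scale, and the helper name decode_decimal and the sample values are hypothetical.

```python
from decimal import Decimal


def decode_decimal(unscaled: int, scale: int) -> Decimal:
    """Interpret an unscaled integer as a decimal with the given scale."""
    return Decimal(unscaled).scaleb(-scale)


# Hypothetical sample: the file stores 1234.56 as unscaled 123456 with scale 2.
file_scale = 2
schema_scale = 4  # scale declared by the reading schema
unscaled = 123456

# Decoding with the file's own scale gives the stored value.
correct = decode_decimal(unscaled, file_scale)  # Decimal('1234.56')

# Decoding with the schema's scale silently changes the value.
wrong = decode_decimal(unscaled, schema_scale)  # Decimal('12.3456')

# A correct reader decodes with the file scale first, then rescales.
rescaled = correct.quantize(Decimal(1).scaleb(-schema_scale))  # Decimal('1234.5600')

print(correct, wrong, rescaled)
```

The second decode differs from the stored value by a factor of 100, which is the kind of silent discrepancy a correctness fix in this area has to rule out.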

Databricks Runtime 7.5 (unsupported)

See Databricks Runtime 7.5 (unsupported).

  • May 26, 2021

    • Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).

    • Disk caching is enabled by default on all GCP instances except those in the -highcpu- family. For -highcpu- instances, the cache is preconfigured but disabled by default. It can be enabled using the Spark config spark.databricks.io.cache.enabled true.

    • Enable Maven library installation.

  • April 30, 2021

    • Operating system security updates.

    • [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit

    • [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

  • March 24, 2021

    • [SPARK-34768][SQL] Respect the default input buffer size in Univocity

    • [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

  • March 9, 2021

    • (Azure only) Fixed an Auto Loader bug that can cause NullPointerException when using Databricks Runtime 7.5 to run an old Auto Loader stream created in Databricks Runtime 7.2.

    • [UI] Fix the href link of Spark DAG Visualization

    • Unknown leaf-node SparkPlan is not handled correctly in SizeInBytesOnlyStatsSparkPlanVisitor

    • Restore the output schema of SHOW DATABASES

    • Disable staleness check for Delta table files in disk cache

    • [SQL] Use correct dynamic pruning build key when range join hint is present

    • Disable char type support in non-SQL code path

    • Avoid NPE in DataFrameReader.schema

    • Fix NPE when EventGridClient response has no entity

    • Fix a read closed stream bug in Azure Auto Loader

  • February 24, 2021

    • Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.

    • Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file’s decimal precision and scale are different from the Spark schema.

    • Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.

    • Introduced a new configuration spark.databricks.hive.metastore.init.reloadFunctions.enabled. This configuration controls the built-in Hive initialization. When set to true, Databricks reloads all functions from all databases that users have into FunctionRegistry. This is the default behavior in Hive Metastore. When set to false, Databricks disables this process for optimization.

    • [SPARK-34212] Fixed issues related to reading decimal data from Parquet files.

    • [SPARK-34260][SQL] Fix UnresolvedException when creating temp view twice.

  • February 4, 2021

    • Fixed a regression that prevents the incremental execution of a query that sets a global limit such as SELECT * FROM table LIMIT nrows. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.

    • Introduced write time checks to the Hive client to prevent the corruption of metadata in the Hive metastore for Delta tables.

    • Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.

  • January 20, 2021

    • Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException stating that a column is ambiguous in a self join. This regression happens when a user joins a DataFrame with a DataFrame derived from it (a so-called self join) under the following conditions:

      • These two DataFrames have common columns, but the output of the self join does not have common columns. For example, df.join(df.select($"col" as "new_col"), cond)

      • The derived DataFrame excludes some columns via select, groupBy, or window.

      • The join condition, or a transformation that follows the join, refers to the non-common columns. For example, df.join(df.drop("a"), df("a") === 1)

  • January 12, 2021

    • Upgrade Azure Storage SDK from 2.3.8 to 2.3.9.

    • [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value

    • [SPARK-33480][SQL] updates the error message of char/varchar table insertion length check

Databricks Runtime 7.3 LTS (unsupported)

See Databricks Runtime 7.3 LTS (unsupported).

  • September 10, 2023

    • Miscellaneous bug fixes.

  • August 30, 2023

    • Operating system security updates.

  • August 15, 2023

    • Operating system security updates.

  • June 23, 2023

    • Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.

    • Operating system security updates.

  • June 15, 2023

    • [SPARK-43413][SQL] Fix IN subquery ListQuery nullability.

    • Operating system security updates.

  • June 2, 2023

    • Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.

  • May 17, 2023

    • Operating system security updates.

  • April 25, 2023

    • Operating system security updates.

  • April 11, 2023

    • [SPARK-42967][CORE] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is cancelled.

    • Miscellaneous bug fixes.

  • March 29, 2023

    • Operating system security updates.

  • March 14, 2023

    • Miscellaneous bug fixes.

  • February 28, 2023

    • Operating system security updates.

  • February 16, 2023

    • Operating system security updates.

  • January 31, 2023

    • Table types of JDBC tables are now EXTERNAL by default.

  • January 18, 2023

    • Operating system security updates.

  • November 29, 2022

    • Miscellaneous bug fixes.

  • November 15, 2022

    • Upgraded Apache commons-text to 1.10.0.

    • Operating system security updates.

    • Miscellaneous bug fixes.

  • November 1, 2022

    • [SPARK-38542][SQL] UnsafeHashedRelation should serialize numKeys out

  • October 18, 2022

    • Operating system security updates.

  • October 5, 2022

    • Miscellaneous bug fixes.

    • Operating system security updates.

  • September 22, 2022

  • September 6, 2022

    • [SPARK-35542][CORE][ML] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it

    • [SPARK-40079][CORE] Add Imputer inputCols validation for empty input case

  • August 24, 2022

    • [SPARK-39962][PYTHON][SQL] Apply projection when group attributes are empty

    • Operating system security updates.

  • August 9, 2022

    • Operating system security updates.

  • July 27, 2022

    • Make Delta MERGE operation results consistent when source is non-deterministic.

    • Operating system security updates.

    • Miscellaneous bug fixes.

  • July 13, 2022

    • [SPARK-32680][SQL] Don’t Preprocess V2 CTAS with Unresolved Query

    • Disabled Auto Loader’s use of native cloud APIs for directory listing on Azure.

    • Operating system security updates.

  • July 5, 2022

    • Operating system security updates.

    • Miscellaneous bug fixes.

  • June 2, 2022

    • [SPARK-38918][SQL] Nested column pruning should filter out attributes that do not belong to the current relation

    • Operating system security updates.

  • May 18, 2022

    • Upgrade AWS SDK version from 1.11.655 to 1.11.678.

    • Operating system security updates.

    • Miscellaneous bug fixes.

  • April 19, 2022

    • Operating system security updates.

    • Miscellaneous bug fixes.

  • April 6, 2022

    • Operating system security updates.

    • Miscellaneous bug fixes.

  • March 14, 2022

    • Remove vulnerable classes from log4j 1.2.17 jar

    • Miscellaneous bug fixes.

  • February 23, 2022

    • [SPARK-37859][SQL] Do not check for metadata during schema comparison

  • February 8, 2022

    • Upgrade Ubuntu JDK to 1.8.0.312.

    • Operating system security updates.

  • February 1, 2022

    • Operating system security updates.

  • January 26, 2022

    • Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.

  • January 19, 2022

    • Conda defaults channel is removed from 7.3 ML LTS

    • Operating system security updates.

  • December 7, 2021

    • Operating system security updates.

  • November 4, 2021

    • Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.

  • September 15, 2021

    • Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x.

    • Operating system security updates.

  • September 8, 2021

    • [SPARK-35700][SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.

    • [SPARK-36532][CORE][3.1] Fixed deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang.

  • August 25, 2021

    • Snowflake connector was upgraded to 2.9.0.

  • July 29, 2021

    • [SPARK-36034][SQL] Rebase datetime in pushed down filters to Parquet

    • [SPARK-34508][BUILD] Skip HiveExternalCatalogVersionsSuite if network is down

  • July 14, 2021

    • Introduced database.schema.table format for Azure Synapse connector.

    • Added support to provide databaseName.schemaName.tableName format as the target table instead of only schemaName.tableName or tableName.

    • Fixed a bug that prevents users from time traveling to older available versions with Delta tables.

  • June 15, 2021

    • Fixes a NoSuchElementException bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses

    • Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).

    • Disk caching is enabled by default on all GCP instances except those in the -highcpu- family. For -highcpu- instances, the cache is preconfigured but disabled by default. It can be enabled using the Spark config spark.databricks.io.cache.enabled true.

  • April 30, 2021

    • Operating system security updates.

    • [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit

    • [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

    • [SPARK-35045][SQL] Add an internal option to control input buffer in univocity

  • March 24, 2021

    • [SPARK-34768][SQL] Respect the default input buffer size in Univocity

    • [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

    • [SPARK-33118][SQL] CREATE TEMPORARY TABLE fails with location

  • March 9, 2021

    • The updated Azure Blob File System driver for Azure Data Lake Storage Gen2 is now enabled by default. It brings multiple stability improvements.

    • Fix path separator on Windows for databricks-connect get-jar-dir

    • [UI] Fix the href link of Spark DAG Visualization

    • [DBCONNECT] Add support for FlatMapCoGroupsInPandas in Databricks Connect 7.3

    • Restore the output schema of SHOW DATABASES

    • [SQL] Use correct dynamic pruning build key when range join hint is present

    • Disable staleness check for Delta table files in disk cache

    • [SQL] Do not generate shuffle partition number advice when AOS is enabled

  • February 24, 2021

    • Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.

    • Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file’s decimal precision and scale are different from the Spark schema.

    • Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.

    • Introduced a new configuration spark.databricks.hive.metastore.init.reloadFunctions.enabled. This configuration controls the built-in Hive initialization. When set to true, Databricks reloads all functions from all databases that users have into FunctionRegistry. This is the default behavior in Hive Metastore. When set to false, Databricks disables this process for optimization.

    • [SPARK-34212] Fixed issues related to reading decimal data from Parquet files.

    • [SPARK-33579][UI] Fix executor blank page behind proxy.

    • [SPARK-20044][UI] Support Spark UI behind front-end reverse proxy using a path prefix.

    • [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

  • February 4, 2021

    • Fixed a regression that prevents the incremental execution of a query that sets a global limit such as SELECT * FROM table LIMIT nrows. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.

    • Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.

  • January 20, 2021

    • Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException stating that a column is ambiguous in a self join. This regression happens when a user joins a DataFrame with a DataFrame derived from it (a so-called self join) under the following conditions:

      • These two DataFrames have common columns, but the output of the self join does not have common columns. For example, df.join(df.select($"col" as "new_col"), cond)

      • The derived DataFrame excludes some columns via select, groupBy, or window.

      • The join condition, or a transformation that follows the join, refers to the non-common columns. For example, df.join(df.drop("a"), df("a") === 1)

  • January 12, 2021

    • Operating system security updates.

    • [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value

    • [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar

    • [SPARK-33592][ML][PYTHON] Pyspark ML Validator params in estimatorParamMaps may be lost after saving and reloading

    • [SPARK-33071][SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin

  • December 8, 2020

    • [SPARK-33587][CORE] Kill the executor on nested fatal errors

    • [SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column

    • [SPARK-33316][SQL] Support user provided nullable Avro schema for non-nullable catalyst schema in Avro writing

    • Spark Jobs launched using Databricks Connect could hang indefinitely with Executor$TaskRunner.$anonfun$copySessionState in executor stack trace

    • Operating system security updates.

  • November 20, 2020

    • [SPARK-33404][SQL][3.0] Fix incorrect results in date_trunc expression

    • [SPARK-33339][PYTHON] Pyspark application will hang due to non Exception error

    • [SPARK-33183][SQL][HOTFIX] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

    • [SPARK-33371][PYTHON][3.0] Update setup.py and tests for Python 3.9

    • [SPARK-33391][SQL] element_at with CreateArray not respect one based index.

    • [SPARK-33306][SQL] Timezone is needed when cast date to string

    • [SPARK-33260][SQL] Fix incorrect results from SortExec when sortOrder is Stream

  • November 5, 2020

    • Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser().

    • Fix an infinite loop bug when Avro reader reads the MAGIC bytes.

    • Add support for the USAGE privilege.

    • Performance improvements for privilege checking in table access control.

  • October 13, 2020

    • Operating system security updates.

    • You can read and write from DBFS using the FUSE mount at /dbfs/ on a High Concurrency cluster with credential passthrough enabled. Regular mounts are supported, but mounts that need passthrough credentials are not yet supported.

    • [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode

    • [SPARK-32585][SQL] Support scala enumeration in ScalaReflection

    • Fixed listing directories in FUSE mount that contain file names with invalid XML characters

    • FUSE mount no longer uses ListMultipartUploads

  • September 29, 2020

    • [SPARK-32718][SQL] Remove unnecessary keywords for interval units

    • [SPARK-32635][SQL] Fix foldable propagation

    • Add a new config spark.shuffle.io.decoder.consolidateThreshold. Set the config value to Long.MAX_VALUE to skip the consolidation of netty FrameBuffers, which prevents java.lang.IndexOutOfBoundsException in corner cases.
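
The September 29, 2020 entry says to set spark.shuffle.io.decoder.consolidateThreshold to Long.MAX_VALUE. As a rough sketch, the snippet below renders that setting as a "key value" line of the kind used for cluster Spark configuration. The helper render_spark_conf is hypothetical, and writing the value as the plain integer 2^63 - 1 is an assumption about how the setting is parsed; check it against your environment before relying on it.

```python
# Java's Long.MAX_VALUE is 2**63 - 1.
LONG_MAX_VALUE = 2**63 - 1


def render_spark_conf(conf: dict) -> str:
    """Render Spark settings as one 'key value' line per entry (hypothetical helper)."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


# Assumption: the threshold accepts a plain integer literal.
conf = {"spark.shuffle.io.decoder.consolidateThreshold": LONG_MAX_VALUE}

print(render_spark_conf(conf))
```

The printed line can then be pasted into the cluster's Spark config, one setting per line.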

  • April 25, 2023

    • Operating system security updates.

  • April 11, 2023

    • Miscellaneous bug fixes.

  • March 29, 2023

    • Miscellaneous bug fixes.

  • March 14, 2023

    • Operating system security updates.

  • February 28, 2023

    • Operating system security updates.

  • February 16, 2023

    • Operating system security updates.

  • January 31, 2023

    • Miscellaneous bug fixes.

  • January 18, 2023

    • Operating system security updates.

  • November 29, 2022

    • Operating system security updates.

  • November 15, 2022

    • Operating system security updates.

    • Miscellaneous bug fixes.

  • November 1, 2022

    • Operating system security updates.

  • October 18, 2022

    • Operating system security updates.

  • October 5, 2022

    • Operating system security updates.

  • August 24, 2022

    • Operating system security updates.

  • August 9, 2022

    • Operating system security updates.

  • July 27, 2022

    • Operating system security updates.

  • July 5, 2022

    • Operating system security updates.

  • June 2, 2022

    • Operating system security updates.

  • May 18, 2022

    • Operating system security updates.

  • April 19, 2022

    • Operating system security updates.

    • Miscellaneous bug fixes.

  • April 6, 2022

    • Operating system security updates.

    • Miscellaneous bug fixes.

  • March 14, 2022

    • Miscellaneous bug fixes.

  • February 23, 2022

    • Miscellaneous bug fixes.

  • February 8, 2022

    • Upgrade Ubuntu JDK to 1.8.0.312.

    • Operating system security updates.

  • February 1, 2022

    • Operating system security updates.

  • January 19, 2022

    • Operating system security updates.

  • September 22, 2021

    • Operating system security updates.

  • April 30, 2021

    • Operating system security updates.

    • [SPARK-35227][BUILD] Update the resolver for spark-packages in SparkSubmit

  • January 12, 2021

    • Operating system security updates.

  • December 8, 2020

    • [SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column

    • Operating system security updates.

  • November 3, 2020

    • Upgraded Java version from 1.8.0_252 to 1.8.0_265.

    • Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()

  • October 13, 2020

    • Operating system security updates.