Databricks Runtime maintenance updates
This article lists maintenance updates for supported Databricks Runtime versions. To add a maintenance update to an existing cluster, restart the cluster. For the maintenance updates on unsupported Databricks Runtime versions, see Maintenance updates for Databricks Runtime (archived).
Note
Releases are staged. Your Databricks account might not update for a few days after the initial release date.
Note
This list of maintenance updates can include references to features unavailable on Google Cloud.
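The restart step mentioned above can also be scripted. The following is a minimal sketch, assuming the Databricks Clusters REST API restart endpoint (`/api/2.1/clusters/restart`) and a personal access token; the function name `build_restart_request`, the workspace URL, token, and cluster ID are illustrative placeholders and do not come from this article. The sketch only builds the request and leaves sending it to the caller.

```python
import json
import urllib.request


def build_restart_request(host: str, token: str, cluster_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks Databricks to restart a cluster."""
    # Assumed endpoint: the Clusters API restart call.
    url = f"{host.rstrip('/')}/api/2.1/clusters/restart"
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; replace with a real workspace URL, token, and cluster ID.
    req = build_restart_request("https://example.cloud.databricks.com", "<token>", "<cluster-id>")
    print(req.get_method(), req.full_url)
    # To actually send the request: urllib.request.urlopen(req)
```

Because releases are staged, a restart picks up a maintenance update only after it has reached your account.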
Databricks Runtime releases
Maintenance updates by release:
Databricks Runtime 15.4
See Databricks Runtime 15.4 LTS.
October 22, 2024
[SPARK-49782][SQL] ResolveDataFrameDropColumns rule resolves UnresolvedAttribute with child output
[SPARK-49867][SQL] Improve the error message when index is out of bounds when calling GetColumnByOrdinal
[SPARK-49863][SQL] Fix NormalizeFloatingNumbers to preserve nullability of nested structs
[SPARK-49829] Revise the optimization on adding input to state store in stream-stream join (correctness fix)
[SPARK-49905] Use dedicated ShuffleOrigin for stateful operator to prevent the shuffle to be modified from AQE
[SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children
[SPARK-49443][SQL][PYTHON] Implement to_variant_object expression and make schema_of_variant expressions print OBJECT for Variant Objects
[SPARK-49615] Bugfix: Make ML column schema validation conforms with spark config spark.sql.caseSensitive.
October 10, 2024
[SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields
[SPARK-49688][CONNECT] Fix a data race between interrupt and execute plan
[BACKPORT][SPARK-49474][SS] Classify Error class for FlatMapGroupsWithState user function error
[SPARK-49460][SQL] Followup: fix potential NPE risk
September 25, 2024
[SPARK-49628][SQL] ConstantFolding should copy stateful expression before evaluating
[SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
[SPARK-49492][CONNECT] Reattach attempted on inactive ExecutionHolder
[SPARK-49458][CONNECT][PYTHON] Supply server-side session id via ReattachExecute
[SPARK-49017][SQL] Insert statement fails when multiple parameters are being used
[SPARK-49451] Allow duplicate keys in parse_json.
Miscellaneous bug fixes.
September 17, 2024
[SPARK-48463][ML] Make Binarizer, Bucketizer, Vector Assembler, FeatureHasher, QuantizeDiscretizer, OnehotEncoder, StopWordsRemover, Imputer, Interactor supporting nested input columns
[SPARK-49409][CONNECT] Adjust the default value of CONNECT_SESSION_PLAN_CACHE_SIZE
[SPARK-49526][CONNECT][HOTFIX-15.4.2] Support Windows-style paths in ArtifactManager
Revert “[SPARK-48482][PYTHON] dropDuplicates and dropDuplicatesWIthinWatermark should accept variable length args”
[SPARK-43242][CORE] Fix throw ‘Unexpected type of BlockId’ in shuffle corruption diagnose
[SPARK-49366][CONNECT] Treat Union node as leaf in dataframe column resolution
[SPARK-49018][SQL] Fix approx_count_distinct not working correctly with collation
[SPARK-49460][SQL] Remove cleanupResource() from EmptyRelationExec
[SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly
[SPARK-49336][CONNECT] Limit the nesting level when truncating a protobuf message
August 29, 2024
The output from a SHOW CREATE TABLE statement now includes any row filters or column masks defined on a materialized view or streaming table. See SHOW CREATE TABLE. To learn about row filters and column masks, see Filter sensitive table data using row filters and column masks.
On compute configured with shared access mode, Kafka batch reads and writes now have the same limitations enforced as those documented for Structured Streaming. See Streaming limitations and requirements for Unity Catalog shared access mode.
[SPARK-48941][SPARK-48970] Backport ML writer / reader fixes
[SPARK-49074][SQL] Fix variant with df.cache()
[SPARK-49263][CONNECT] Spark Connect python client: Consistently handle boolean Dataframe reader options
[SPARK-48955][SQL] Include ArrayCompact changes in 15.4
[SPARK-48937][SQL] Add collation support for StringToMap string expressions
[SPARK-48929] Fix view internal error and clean up parser exception context
[SPARK-49125][SQL] Allow duplicated column names in CSV writing
[SPARK-48934][SS] Python datetime types converted incorrectly for setting timeout in applyInPandasWithState
[SPARK-48843] Prevent infinite loop with BindParameters
[SPARK-48981] Fix simpleString method of StringType in pyspark for collations
[SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
[SPARK-48896] [SPARK-48909] [SPARK-48883] Backport spark ML writer fixes
[SPARK-48725][SQL] Integrate CollationAwareUTF8String.lowerCaseCodePoints into string expressions
[SPARK-48978][SQL] Implement ASCII fast path in collation support for UTF8_LCASE
[SPARK-49047][PYTHON][CONNECT] Truncate the message for logging
[SPARK-49146][SS] Move assertion errors related to watermark missing in append mode streaming queries to error framework
[SPARK-48977][SQL] Optimize string searching under UTF8_LCASE collation
[SPARK-48889][SS] testStream to unload state stores before finishing
[SPARK-48463] Make StringIndexer supporting nested input columns
[SPARK-48954] try_mod() replaces try_remainder()
Operating system security updates.
Databricks Runtime 15.3
October 22, 2024
[SPARK-49905] Use dedicated ShuffleOrigin for stateful operator to prevent the shuffle to be modified from AQE
[SPARK-49867][SQL] Improve the error message when index is out of bounds when calling GetColumnByOrdinal
[SPARK-48843][15.3,15.2] Prevent infinite loop with BindParameters
[SPARK-49829] Revise the optimization on adding input to state store in stream-stream join (correctness fix)
[SPARK-49863][SQL] Fix NormalizeFloatingNumbers to preserve nullability of nested structs
[SPARK-49782][SQL] ResolveDataFrameDropColumns rule resolves UnresolvedAttribute with child output
[SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children
Operating system security updates.
October 10, 2024
[SPARK-49688][CONNECT] Fix a data race between interrupt and execute plan
[SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields
[BACKPORT][SPARK-49474][SS] Classify Error class for FlatMapGroupsWithState user function error
Operating system security updates.
September 25, 2024
[SPARK-49492][CONNECT] Reattach attempted on inactive ExecutionHolder
[SPARK-49628][SQL] ConstantFolding should copy stateful expression before evaluating
[SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
[SPARK-49458][CONNECT][PYTHON] Supply server-side session id via ReattachExecute
[SPARK-48719][SQL] Fix the calculation bug of RegrSlope & RegrIntercept when the first parameter is null
Operating system security updates.
September 17, 2024
[SPARK-49336][CONNECT] Limit the nesting level when truncating a protobuf message
[SPARK-49526][CONNECT][15.3.5] Support Windows-style paths in ArtifactManager
[SPARK-49366][CONNECT] Treat Union node as leaf in dataframe column resolution
[SPARK-43242][CORE] Fix throw ‘Unexpected type of BlockId’ in shuffle corruption diagnose
[SPARK-49409][CONNECT] Adjust the default value of CONNECT_SESSION_PLAN_CACHE_SIZE
Operating system security updates.
August 29, 2024
[SPARK-49263][CONNECT] Spark Connect python client: Consistently handle boolean Dataframe reader options
[SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly
[SPARK-48862][PYTHON][CONNECT] Avoid calling _proto_to_string when INFO level is not enabled
[SPARK-49146][SS] Move assertion errors related to watermark missing in append mode streaming queries to error framework
August 14, 2024
[SPARK-48941][SPARK-48970] Backport ML writer / reader fixes
[SPARK-48706][PYTHON] Python UDF in higher order functions should not throw internal error
[SPARK-48954] try_mod() replaces try_remainder()
[SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan
[SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
[SPARK-49047][PYTHON][CONNECT] Truncate the message for logging
[SPARK-48740][SQL] Catch missing window specification error early
August 1, 2024
[Breaking change] In Databricks Runtime 15.3 and above, calling any Python user-defined function (UDF), user-defined aggregate function (UDAF), or user-defined table function (UDTF) that uses a VARIANT type as an argument or return value throws an exception. This change is made to prevent issues that might occur because of an invalid value returned by one of these functions. To learn more about the VARIANT type, see use VARIANTs to store semi-structured data.
On serverless compute for notebooks and jobs, ANSI SQL mode is enabled by default. See Supported Spark configuration parameters.
On compute configured with shared access mode, Kafka batch reads and writes now have the same limitations enforced as those documented for Structured Streaming. See Streaming limitations and requirements for Unity Catalog shared access mode.
The output from a SHOW CREATE TABLE statement now includes any row filters or column masks defined on a materialized view or streaming table. See SHOW CREATE TABLE. To learn about row filters and column masks, see Filter sensitive table data using row filters and column masks.
[SPARK-46957][CORE] Decommission migrated shuffle files should be able to cleanup from executor
[SPARK-48648][PYTHON][CONNECT] Make SparkConnectClient.tags properly threadlocal
[SPARK-48896] [SPARK-48909] [SPARK-48883] Backport spark ML writer fixes
[SPARK-48713][SQL] Add index range check for UnsafeRow.pointTo when baseObject is byte array
[SPARK-48834][SQL] Disable variant input/output to python scalar UDFs, UDTFs, UDAFs during query compilation
[SPARK-48934][SS] Python datetime types converted incorrectly for setting timeout in applyInPandasWithState
[SPARK-48705][PYTHON] Explicitly use worker_main when it starts with pyspark
[SPARK-48544][SQL] Reduce memory pressure of empty TreeNode BitSets
[SPARK-48889][SS] testStream to unload state stores before finishing
[SPARK-49054][SQL] Column default value should support current_* functions
[SPARK-48653][PYTHON] Fix invalid Python data source error class references
[SPARK-48463] Make StringIndexer supporting nested input columns
[SPARK-48810][CONNECT] Session stop() API should be idempotent and not fail if the session is already closed by the server
[SPARK-48873][SQL] Use UnsafeRow in JSON parser.
Operating system security updates.
July 11, 2024
(Behavior change) DataFrames cached against Delta table sources are now invalidated if the source table is overwritten. This change means that all state changes to Delta tables now invalidate cached results. Use .checkpoint() to persist a table state throughout the lifetime of a DataFrame.
The Snowflake JDBC Driver is updated to version 3.16.1.
This release includes a fix to an issue that prevented the Spark UI Environment tab from displaying correctly when running in Databricks Container Services.
To ignore invalid partitions when reading data, file-based data sources, such as Parquet, ORC, CSV, or JSON, can set the ignoreInvalidPartitionPaths data source option to true. For example: spark.read.format("parquet").option("ignoreInvalidPartitionPaths", "true").load(…). You can also use the SQL configuration spark.sql.files.ignoreInvalidPartitionPaths. However, the data source option takes precedence over the SQL configuration. This setting is false by default.
[SPARK-48100][SQL] Fix issues in skipping nested structure fields not selected in schema
[SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean
[SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
[SPARK-48475][PYTHON] Optimize _get_jvm_function in PySpark.
[SPARK-48286] Fix analysis of column with exists default expression - Add user facing error
[SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset
Revert “[SPARK-47406][SQL] Handle TIMESTAMP and DATETIME in MYSQLDialect”
[SPARK-48383][SS] Throw better error for mismatched partitions in startOffset option in Kafka
[SPARK-48503][14.3-15.3][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed
[SPARK-48445][SQL] Don’t inline UDFs with expensive children
[SPARK-48252][SQL] Update CommonExpressionRef when necessary
[SPARK-48273][master][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier
[SPARK-48566][PYTHON] Fix bug where partition indices are incorrect when UDTF analyze() uses both select and partitionColumns
[SPARK-48556][SQL] Fix incorrect error message pointing to UNSUPPORTED_GROUPING_EXPRESSION
Operating system security updates.
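The ignoreInvalidPartitionPaths setting described in the July 11, 2024 update above can be applied per read or for the whole session. The following is a minimal sketch, assuming an existing SparkSession is passed in as `spark`; the helper names `read_ignoring_invalid_partitions` and `set_session_default` are hypothetical and exist only for illustration.

```python
def read_ignoring_invalid_partitions(spark, path, fmt="parquet"):
    """Read `path` with the data source option set, ignoring invalid partition paths."""
    return (
        spark.read.format(fmt)
        .option("ignoreInvalidPartitionPaths", "true")
        .load(path)
    )


def set_session_default(spark, enabled=True):
    """Set the session-wide SQL configuration (false by default).

    The per-read data source option takes precedence over this value.
    """
    spark.conf.set("spark.sql.files.ignoreInvalidPartitionPaths", str(enabled).lower())
```

Setting the option on an individual read keeps the behavior local to that read, while the SQL configuration changes the default for every file-based read in the session.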
Databricks Runtime 15.2
October 22, 2024
[SPARK-49905] Use dedicated ShuffleOrigin for stateful operator to prevent the shuffle to be modified from AQE
[SPARK-48843][15.3,15.2] Prevent infinite loop with BindParameters
[SPARK-49829] Revise the optimization on adding input to state store in stream-stream join (correctness fix)
[SPARK-49863][SQL] Fix NormalizeFloatingNumbers to preserve nullability of nested structs
[SPARK-49782][SQL] ResolveDataFrameDropColumns rule resolves UnresolvedAttribute with child output
[SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children
Operating system security updates.
October 10, 2024
[BACKPORT][SPARK-49474][SS] Classify Error class for FlatMapGroupsWithState user function error
[SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields
[SPARK-49688][CONNECT] Fix a data race between interrupt and execute plan
Operating system security updates.
September 25, 2024
[SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
[SPARK-48719][SQL] Fix the calculation bug of RegrSlope & RegrIntercept when the first parameter is null
[SPARK-49458][CONNECT][PYTHON] Supply server-side session id via ReattachExecute
[SPARK-49628][SQL] ConstantFolding should copy stateful expression before evaluating
[SPARK-49492][CONNECT] Reattach attempted on inactive ExecutionHolder
Operating system security updates.
September 17, 2024
[SPARK-49336][CONNECT] Limit the nesting level when truncating a protobuf message
[SPARK-49526][CONNECT] Support Windows-style paths in ArtifactManager
[SPARK-49366][CONNECT] Treat Union node as leaf in dataframe column resolution
[SPARK-43242][CORE] Fix throw ‘Unexpected type of BlockId’ in shuffle corruption diagnose
[SPARK-49409][CONNECT] Adjust the default value of CONNECT_SESSION_PLAN_CACHE_SIZE
Operating system security updates.
August 29, 2024
[SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly
[SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan
[SPARK-48862][PYTHON][CONNECT] Avoid calling _proto_to_string when INFO level is not enabled
[SPARK-49263][CONNECT] Spark Connect python client: Consistently handle boolean Dataframe reader options
[SPARK-49146][SS] Move assertion errors related to watermark missing in append mode streaming queries to error framework
August 14, 2024
[SPARK-48941][SPARK-48970] Backport ML writer / reader fixes
[SPARK-48050][SS] Log logical plan at query start
[SPARK-48706][PYTHON] Python UDF in higher order functions should not throw internal error
[SPARK-48740][SQL] Catch missing window specification error early
[SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
[SPARK-49047][PYTHON][CONNECT] Truncate the message for logging
August 1, 2024
On serverless compute for notebooks and jobs, ANSI SQL mode is enabled by default. See Supported Spark configuration parameters.
On compute configured with shared access mode, Kafka batch reads and writes now have the same limitations enforced as those documented for Structured Streaming. See Streaming limitations and requirements for Unity Catalog shared access mode.
The output from a SHOW CREATE TABLE statement now includes any row filters or column masks defined on a materialized view or streaming table. See SHOW CREATE TABLE. To learn about row filters and column masks, see Filter sensitive table data using row filters and column masks.
[SPARK-48705][PYTHON] Explicitly use worker_main when it starts with pyspark
[SPARK-48047][SQL] Reduce memory pressure of empty TreeNode tags
[SPARK-48810][CONNECT] Session stop() API should be idempotent and not fail if the session is already closed by the server
[SPARK-48873][SQL] Use UnsafeRow in JSON parser.
[SPARK-46957][CORE] Decommission migrated shuffle files should be able to cleanup from executor
[SPARK-48889][SS] testStream to unload state stores before finishing
[SPARK-48713][SQL] Add index range check for UnsafeRow.pointTo when baseObject is byte array
[SPARK-48896] [SPARK-48909] [SPARK-48883] Backport spark ML writer fixes
[SPARK-48544][SQL] Reduce memory pressure of empty TreeNode BitSets
[SPARK-48934][SS] Python datetime types converted incorrectly for setting timeout in applyInPandasWithState
[SPARK-48463] Make StringIndexer supporting nested input columns
Operating system security updates.
July 11, 2024
(Behavior change) DataFrames cached against Delta table sources are now invalidated if the source table is overwritten. This change means that all state changes to Delta tables now invalidate cached results. Use .checkpoint() to persist a table state throughout the lifetime of a DataFrame.
The Snowflake JDBC Driver is updated to version 3.16.1.
This release includes a fix to an issue that prevented the Spark UI Environment tab from displaying correctly when running in Databricks Container Services.
On serverless notebooks and jobs, the ANSI SQL mode will be enabled by default and support short names.
To ignore invalid partitions when reading data, file-based data sources, such as Parquet, ORC, CSV, or JSON, can set the ignoreInvalidPartitionPaths data source option to true. For example: spark.read.format("parquet").option("ignoreInvalidPartitionPaths", "true").load(…). You can also use the SQL configuration spark.sql.files.ignoreInvalidPartitionPaths. However, the data source option takes precedence over the SQL configuration. This setting is false by default.
[SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier
[SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
[SPARK-48100][SQL] Fix issues in skipping nested structure fields not selected in schema
[SPARK-48286] Fix analysis of column with exists default expression - Add user facing error
[SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError
[SPARK-48556][SQL] Fix incorrect error message pointing to UNSUPPORTED_GROUPING_EXPRESSION
[SPARK-48648][PYTHON][CONNECT] Make SparkConnectClient.tags properly threadlocal
[SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed
[SPARK-48252][SQL] Update CommonExpressionRef when necessary
[SPARK-48475][PYTHON] Optimize _get_jvm_function in PySpark.
[SPARK-48566][PYTHON] Fix bug where partition indices are incorrect when UDTF analyze() uses both select and partitionColumns
[SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset
[SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean
[SPARK-48383][SS] Throw better error for mismatched partitions in startOffset option in Kafka
[SPARK-48445][SQL] Don’t inline UDFs with expensive children
Operating system security updates.
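The cache-invalidation behavior change in the July 11, 2024 update above recommends .checkpoint() for holding on to a table state. The following is a minimal sketch, assuming an existing SparkSession is passed in as `spark` and that a checkpoint directory is already configured for it; the helper name `snapshot_table` and the table name in the usage comment are hypothetical.

```python
def snapshot_table(spark, table_name):
    """Return a checkpointed DataFrame that captures the table's current state."""
    df = spark.read.table(table_name)
    # eager=True materializes the checkpoint immediately instead of on first action.
    return df.checkpoint(eager=True)


# Usage (table name is a placeholder):
# orders_snapshot = snapshot_table(spark, "main.sales.orders")
```

A cached DataFrame is now invalidated when the source Delta table is overwritten, so code that depends on a stable view of the data should hold a reference to the checkpointed DataFrame rather than rely on the cache.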
June 17, 2024
applyInPandasWithState() is available on shared clusters.
Fixes a bug where the rank-window optimization using Photon TopK incorrectly handled partitions with structs.
Fixed a bug in the try_divide() function where inputs containing decimals resulted in unexpected exceptions.
[SPARK-48197][SQL] Avoid assert error for invalid lambda function
[SPARK-48276][PYTHON][CONNECT] Add the missing __repr__ method for SQLExpression
[SPARK-48014][SQL] Change the makeFromJava error in EvaluatePython to a user-facing error
[SPARK-48016][SQL] Fix a bug in try_divide function when with decimals
[SPARK-47986][CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server
[SPARK-48173][SQL] CheckAnalysis should see the entire query plan
[SPARK-48056][CONNECT][PYTHON] Re-execute plan if a SESSION_NOT_FOUND error is raised and no partial response was received
[SPARK-48172][SQL] Fix escaping issues in JDBCDialects backport to 15.2
[SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting
[SPARK-48288] Add source data type for connector cast expression
[SPARK-48310][PYTHON][CONNECT] Cached properties must return copies
[SPARK-48277] Improve error message for ErrorClassesJsonReader.getErrorMessage
Revert “[SPARK-47406][SQL] Handle TIMESTAMP and DATETIME in MYSQLDialect”
[SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer
[SPARK-47764][CORE][SQL] Cleanup shuffle dependencies based on ShuffleCleanupMode
[SPARK-47921][CONNECT] Fix ExecuteJobTag creation in ExecuteHolder
[SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression
[SPARK-48146][SQL] Fix aggregate function in With expression child assertion
[SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs
Operating system security updates.
Databricks Runtime 14.3
See Databricks Runtime 14.3 LTS.
October 22, 2024
[SPARK-48843] Prevent infinite loop with BindParameters
[SPARK-49863][SQL] Fix NormalizeFloatingNumbers to preserve nullability of nested structs
[SPARK-49905] Use dedicated ShuffleOrigin for stateful operator to prevent the shuffle to be modified from AQE
[SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children
[SPARK-49782][SQL] ResolveDataFrameDropColumns rule resolves UnresolvedAttribute with child output
[BACKPORT][SPARK-49326][SS] Classify Error class for Foreach sink user function error
[SPARK-49829] Revise the optimization on adding input to state store in stream-stream join (correctness fix)
Operating system security updates.
October 10, 2024
[BACKPORT][SPARK-49474][SS] Classify Error class for FlatMapGroupsWithState user function error
[SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields
[SPARK-49688][CONNECT] Fix a data race between interrupt and execute plan
September 25, 2024
[SPARK-48810][CONNECT] Session stop() API should be idempotent and not fail if the session is already closed by the server
[SPARK-48719][SQL] Fix the calculation bug of `RegrS…
[SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
[SPARK-49628][SQL] ConstantFolding should copy stateful expression before evaluating
[SPARK-49492][CONNECT] Reattach attempted on inactive ExecutionHolder
Operating system security updates.
September 17, 2024
[SPARK-49336][CONNECT] Limit the nesting level when truncating a protobuf message
[SPARK-43242][CORE] Fix throw ‘Unexpected type of BlockId’ in shuffle corruption diagnose
[SPARK-48463][ML] Make Binarizer, Bucketizer, Vector Assembler, FeatureHasher, QuantizeDiscretizer, OnehotEncoder, StopWordsRemover, Imputer, Interactor supporting nested input columns
[SPARK-49526][CONNECT] Support Windows-style paths in ArtifactManager
[SPARK-49409][CONNECT] Adjust the default value of CONNECT_SESSION_PLAN_CACHE_SIZE
[SPARK-49366][CONNECT] Treat Union node as leaf in dataframe column resolution
August 29, 2024
[SPARK-49146][SS] Move assertion errors related to watermark missing in append mode streaming queries to error framework
[SPARK-48862][PYTHON][CONNECT] Avoid calling _proto_to_string when INFO level is not enabled
[SPARK-49263][CONNECT] Spark Connect python client: Consistently handle boolean Dataframe reader options
August 14, 2024
[SPARK-48941][SPARK-48970] Backport ML writer / reader fixes
[SPARK-48706][PYTHON] Python UDF in higher order functions should not throw internal error
[SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly
[SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan
[SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
[SPARK-48934][SS] Python datetime types converted incorrectly for setting timeout in applyInPandasWithState
August 1, 2024
This release includes a bug fix for the ColumnVector and ColumnarArray classes in the Spark Java interface. Previous to this fix, an ArrayIndexOutOfBoundsException might be thrown or incorrect data returned when an instance of one of these classes contained null values.
On serverless compute for notebooks and jobs, ANSI SQL mode is enabled by default. See Supported Spark configuration parameters.
On compute configured with shared access mode, Kafka batch reads and writes now have the same limitations enforced as those documented for Structured Streaming. See Streaming limitations and requirements for Unity Catalog shared access mode.
The output from a SHOW CREATE TABLE statement now includes any row filters or column masks defined on a materialized view or streaming table. See SHOW CREATE TABLE. To learn about row filters and column masks, see Filter sensitive table data using row filters and column masks.
[SPARK-48896] [SPARK-48909] [SPARK-48883] Backport spark ML writer fixes
[SPARK-48889][SS] testStream to unload state stores before finishing
[SPARK-48705][PYTHON] Explicitly use worker_main when it starts with pyspark
[SPARK-48047][SQL] Reduce memory pressure of empty TreeNode tags
[SPARK-48544][SQL] Reduce memory pressure of empty TreeNode BitSets
[SPARK-46957][CORE] Decommission migrated shuffle files should be able to cleanup from executor
[SPARK-48463] Make StringIndexer supporting nested input columns
[SPARK-47202][PYTHON] Fix typo breaking datetimes with tzinfo
[SPARK-47713][SQL][CONNECT] Fix a self-join failure
Operating system security updates.
July 11, 2024
(Behavior change) DataFrames cached against Delta table sources are now invalidated if the source table is overwritten. This change means that all state changes to Delta tables now invalidate cached results. Use .checkpoint() to persist a table state throughout the lifetime of a DataFrame.
The Snowflake JDBC Driver is updated to version 3.16.1.
This release includes a fix to an issue that prevented the Spark UI Environment tab from displaying correctly when running in Databricks Container Services.
On serverless compute for notebooks and jobs, ANSI SQL mode is enabled by default. See Supported Spark configuration parameters.
To ignore invalid partitions when reading data, file-based data sources, such as Parquet, ORC, CSV, or JSON, can set the ignoreInvalidPartitionPaths data source option to true. For example: spark.read.format("parquet").option("ignoreInvalidPartitionPaths", "true").load(…). You can also use the SQL configuration spark.sql.files.ignoreInvalidPartitionPaths. However, the data source option takes precedence over the SQL configuration. This setting is false by default.
[SPARK-48648][PYTHON][CONNECT] Make SparkConnectClient.tags properly threadlocal
[SPARK-48445][SQL] Don’t inline UDFs with expensive children
[SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset
[SPARK-48383][SS] Throw better error for mismatched partitions in startOffset option in Kafka
[SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed
[SPARK-48100][SQL] Fix issues in skipping nested structure fields not selected in schema
[SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier
[SPARK-48252][SQL] Update CommonExpressionRef when necessary
[SPARK-48475][PYTHON] Optimize _get_jvm_function in PySpark.
[SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
Operating system security updates.
June 17, 2024
applyInPandasWithState() is available on shared clusters.
Fixes a bug where the rank-window optimization using Photon TopK incorrectly handled partitions with structs.
[SPARK-48310][PYTHON][CONNECT] Cached properties must return copies
[SPARK-48276][PYTHON][CONNECT] Add the missing __repr__ method for SQLExpression
[SPARK-48294][SQL] Handle lowercase in nestedTypeMissingElementTypeError
Operating system security updates.
May 21, 2024
(Behavior change) dbutils.widgets.getAll() is now supported to get all widget values in a notebook.
Fixed a bug in the try_divide() function where inputs containing decimals resulted in unexpected exceptions.
[SPARK-48056][CONNECT][PYTHON] Re-execute plan if a SESSION_NOT_FOUND error is raised and no partial response was received
[SPARK-48146][SQL] Fix aggregate function in With expression child assertion
[SPARK-47986][CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server
[SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs
[SPARK-48016][SQL] Fix a bug in try_divide function when with decimals
[SPARK-48197][SQL] Avoid assert error for invalid lambda function
[SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer
[SPARK-48173][SQL] CheckAnalysis should see the entire query plan
[SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting
Operating system security updates.
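The dbutils.widgets.getAll() call listed in the May 21, 2024 update above can be used to validate notebook parameters in one step. The following is a minimal sketch, assuming the call returns a dict-like mapping of widget names to values and that the notebook's `dbutils` object is passed in; the helper name `require_widgets` and the widget names in the usage comment are hypothetical.

```python
def require_widgets(dbutils, required):
    """Return the values of the required widgets, failing early if any is missing."""
    # Assumption: getAll() yields a mapping of widget name to current value.
    values = dict(dbutils.widgets.getAll())
    missing = [name for name in required if name not in values]
    if missing:
        raise KeyError(f"missing widgets: {missing}")
    return {name: values[name] for name in required}


# Usage inside a notebook (widget names are placeholders):
# params = require_widgets(dbutils, ["env", "run_date"])
```

Reading every widget in one call avoids a separate lookup per widget and makes a missing parameter visible before any work starts.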
May 9, 2024
(Behavior change) applyInPandas and mapInPandas UDF types are now supported on shared access mode compute running Databricks Runtime 14.3 and above.
[SPARK-47739][SQL] Register logical avro type
[SPARK-47941] [SS] [Connect] Propagate ForeachBatch worker initialization errors to users for PySpark
[SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression
[SPARK-48044][PYTHON][CONNECT] Cache DataFrame.isStreaming
[SPARK-47956][SQL] Sanity check for unresolved LCA reference
[SPARK-47543][CONNECT][PYTHON] Inferring dict as MapType from Pandas DataFrame to allow DataFrame creation
[SPARK-47819][CONNECT][Cherry-pick-14.3] Use asynchronous callback for execution cleanup
[SPARK-47764][CORE][SQL] Cleanup shuffle dependencies based on ShuffleCleanupMode
[SPARK-48018][SS] Fix null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange
[SPARK-47839][SQL] Fix aggregate bug in RewriteWithExpression
[SPARK-47371] [SQL] XML: Ignore row tags found in CDATA
[SPARK-47895][SQL] group by all should be idempotent
[SPARK-47973][CORE] Log call site in SparkContext.stop() and later in SparkContext.assertNotStopped()
Operating system security updates.
April 25, 2024
[SPARK-47543][CONNECT][PYTHON] Inferring dict as MapType from Pandas DataFrame to allow DataFrame creation
[SPARK-47694][CONNECT] Make max message size configurable on the client side
[SPARK-47664][PYTHON][CONNECT][Cherry-pick-14.3] Validate the column name with cached schema
[SPARK-47862][PYTHON][CONNECT] Fix generation of proto files
Revert “[SPARK-47543][CONNECT][PYTHON] Inferring dict as MapType from Pandas DataFrame to allow DataFrame creation”
[SPARK-47704][SQL] JSON parsing fails with “java.lang.ClassCastException” when spark.sql.json.enablePartialResults is enabled
[SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker
[SPARK-47818][CONNECT][Cherry-pick-14.3] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests
[SPARK-47828][CONNECT][PYTHON] DataFrameWriterV2.overwrite fails with invalid plan
Operating system security updates.
April 11, 2024
(Behavior change) To ensure consistent behavior across compute types, PySpark UDFs on shared clusters now match the behavior of UDFs on no-isolation and assigned clusters. This update includes the following changes that might break existing code:
UDFs with a string return type no longer implicitly convert non-string values into string values. Previously, UDFs with a return type of str would wrap the return value with a str() function regardless of the actual data type of the returned value.
UDFs with timestamp return types no longer implicitly apply a conversion to timestamp with timezone.
The Spark cluster configurations spark.databricks.sql.externalUDF.* no longer apply to PySpark UDFs on shared clusters.
The Spark cluster configuration spark.databricks.safespark.externalUDF.plan.limit no longer affects PySpark UDFs, removing the Public Preview limitation of 5 UDFs per query for PySpark UDFs.
The Spark cluster configuration spark.databricks.safespark.sandbox.size.default.mib no longer applies to PySpark UDFs on shared clusters. Instead, available memory on the system is used. To limit the memory of PySpark UDFs, use spark.databricks.pyspark.udf.isolation.memoryLimit with a minimum value of 100m.
The TimestampNTZ data type is now supported as a clustering column with liquid clustering. See Use liquid clustering for Delta tables.
[SPARK-47511][SQL] Canonicalize With expressions by re-assigning IDs
[SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions
[SPARK-46990][SQL] Fix loading empty Avro files emitted by event-hubs
[SPARK-47638][PS][CONNECT] Skip column name validation in PS
Operating system security updates.
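The string return type change in the April 11, 2024 update means that a UDF declared to return a string should convert its result explicitly. The following is a minimal sketch, assuming a hypothetical function named format_amount. The function body is plain Python, and the commented lines show how it might be registered with pyspark.sql.functions.udf where PySpark is available.

```python
from typing import Any, Optional


def format_amount(value: Any) -> Optional[str]:
    """Hypothetical UDF body that always returns a str (or None for NULL).

    Shared clusters previously wrapped the result in str() implicitly.
    After this update the conversion must be explicit.
    """
    if value is None:
        return None
    return str(value)


# Registration sketch (assumes a PySpark environment; not executed here):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   format_amount_udf = udf(format_amount, StringType())
#   df = df.withColumn("amount_text", format_amount_udf("amount"))

print(format_amount(42))
print(format_amount(None))
```

Keeping the conversion inside the function makes the UDF behave the same way on shared, assigned, and no-isolation compute.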
March 14, 2024
[SPARK-47135][SS] Implement error classes for Kafka data loss exceptions
[SPARK-47176][SQL] Have a ResolveAllExpressionsUpWithPruning helper function
[SPARK-47145][SQL] Pass table identifier to row data source scan exec for V2 strategy.
[SPARK-47044][SQL] Add executed query for JDBC external datasources to explain output
[SPARK-47167][SQL] Add concrete class for JDBC anonymous relation
[SPARK-47070] Fix invalid aggregation after subquery rewrite
[SPARK-47121][CORE] Avoid RejectedExecutionExceptions during StandaloneSchedulerBackend shutdown
Revert “[SPARK-46861][CORE] Avoid Deadlock in DAGScheduler”
[SPARK-47125][SQL] Return null if Univocity never triggers parsing
[SPARK-46999][SQL] ExpressionWithUnresolvedIdentifier should include other expressions in the expression tree
[SPARK-47129][CONNECT][SQL] Make ResolveRelations cache connect plan properly
[SPARK-47241][SQL] Fix rule order issues for ExtractGenerator
[SPARK-47035][SS][CONNECT] Protocol for Client-Side Listener
Operating system security updates.
February 29, 2024
Fixed an issue where using a local collection as source in a MERGE command could result in the operation metric numSourceRows reporting double the correct number of rows.
Creating a schema with a defined location now requires the user to have SELECT and MODIFY privileges on ANY FILE.
[SPARK-47071][SQL] Inline With expression if it contains special expression
[SPARK-47059][SQL] Attach error context for ALTER COLUMN v1 command
[SPARK-46993][SQL] Fix constant folding for session variables
Operating system security updates.
January 3, 2024
[SPARK-46933] Add query execution time metric to connectors which use JDBCRDD.
[SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.
[SPARK-46954] XML: Wrap InputStreamReader with BufferedReader.
[SPARK-46655] Skip query context catching in DataFrame methods.
[SPARK-44815] Cache df.schema to avoid extra RPC.
[SPARK-46952] XML: Limit size of corrupt record.
[SPARK-46794] Remove subqueries from LogicalRDD constraints.
[SPARK-46736] retain empty message field in protobuf connector.
[SPARK-45182] Ignore task completion from old stage after retrying parent-indeterminate stage as determined by checksum.
[SPARK-46414] Use prependBaseUri to render javascript imports.
[SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of TaskInfo.accumulables().
[SPARK-46861] Avoid Deadlock in DAGScheduler.
[SPARK-46954] XML: Optimize schema index lookup.
[SPARK-46676] dropDuplicatesWithinWatermark should not fail on canonicalization of the plan.
[SPARK-46644] Change add and merge in SQLMetric to use isZero.
[SPARK-46731] Manage state store provider instance by state data source - reader.
[SPARK-46677] Fix dataframe["*"] resolution.
[SPARK-46610] Create table should throw exception when no value for a key in options.
[SPARK-46941] Can’t insert window group limit node for top-k computation if contains SizeBasedWindowFunction.
[SPARK-45433] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat.
[SPARK-46930] Add support for a custom prefix for Union type fields in Avro.
[SPARK-46227] Backport to 14.3.
[SPARK-46822] Respect spark.sql.legacy.charVarcharAsString when casting jdbc type to catalyst type in jdbc.
Operating system security updates.
Databricks Runtime 14.2
October 22, 2024
[SPARK-49782][SQL] ResolveDataFrameDropColumns rule resolves UnresolvedAttribute with child output
[SPARK-49905] Use dedicated ShuffleOrigin for stateful operator to prevent the shuffle to be modified from AQE
Operating system security updates.
October 10, 2024
[SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields
[BACKPORT][SPARK-49474][SS] Classify Error class for FlatMapGroupsWithState user function error
September 25, 2024
[SPARK-48719][SQL] Fix the calculation bug of `RegrS…
[SPARK-49628][SQL] ConstantFolding should copy stateful expression before evaluating
[SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
[SPARK-43242][CORE] Fix throw ‘Unexpected type of BlockId’ in shuffle corruption diagnose
[SPARK-46601] [CORE] Fix log error in handleStatusMessage
Operating system security updates.
September 17, 2024
[SPARK-49526][CONNECT] Support Windows-style paths in ArtifactManager
August 29, 2024
[SPARK-49263][CONNECT] Spark Connect python client: Consistently handle boolean Dataframe reader options
[SPARK-49146][SS] Move assertion errors related to watermark missing in append mode streaming queries to error framework
[SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly
August 14, 2024
[SPARK-48050][SS] Log logical plan at query start
[SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan
[SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
[SPARK-48706][PYTHON] Python UDF in higher order functions should not throw internal error
August 1, 2024
This release includes a bug fix for the ColumnVector and ColumnarArray classes in the Spark Java interface. Previous to this fix, an ArrayIndexOutOfBoundsException might be thrown or incorrect data returned when an instance of one of these classes contained null values.
The output from a SHOW CREATE TABLE statement now includes any row filters or column masks defined on a materialized view or streaming table. See SHOW CREATE TABLE. To learn about row filters and column masks, see Filter sensitive table data using row filters and column masks.
[SPARK-47202][PYTHON] Fix typo breaking datetimes with tzinfo
[SPARK-48705][PYTHON] Explicitly use worker_main when it starts with pyspark
Operating system security updates.
July 11, 2024
(Behavior change) DataFrames cached against Delta table sources are now invalidated if the source table is overwritten. This change means that all state changes to Delta tables now invalidate cached results. Use .checkpoint() to persist a table state throughout the lifetime of a DataFrame.
The Snowflake JDBC Driver is updated to version 3.16.1.
This release includes a fix to an issue that prevented the Spark UI Environment tab from displaying correctly when running in Databricks Container Services.
[SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
[SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier
[SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed
[SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset
[SPARK-48475][PYTHON] Optimize _get_jvm_function in PySpark.
[SPARK-48100][SQL] Fix issues in skipping nested structure fields not selected in schema
[SPARK-48445][SQL] Don’t inline UDFs with expensive children
[SPARK-48383][SS] Throw better error for mismatched partitions in startOffset option in Kafka
Operating system security updates.
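The caching behavior change in the July 11, 2024 update points to .checkpoint() for keeping a table state stable. The following is a minimal sketch, assuming a hypothetical helper named snapshot_table, an existing SparkSession passed in as spark, and a writable checkpoint directory. Whether spark.sparkContext is reachable depends on the access mode of the compute, so treat this as an illustration rather than a recommended pattern.

```python
def snapshot_table(spark, table_name, checkpoint_dir):
    """Hypothetical helper: return a checkpointed DataFrame for table_name.

    `spark` is assumed to be an active SparkSession; `checkpoint_dir` is an
    assumed path on storage that the cluster can write to.
    """
    # Checkpointing needs a directory configured before it is called.
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    df = spark.read.table(table_name)
    # checkpoint() is eager by default, so the data is materialized here
    # and the returned DataFrame no longer depends on the cached plan.
    return df.checkpoint()
```

A call such as snapshot_table(spark, "main.default.orders", "/tmp/checkpoints") uses placeholder names. Substitute a table and path that exist in your workspace.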
June 17, 2024
Fixes a bug where the rank-window optimization using Photon TopK incorrectly handled partitions with structs.
[SPARK-48276][PYTHON][CONNECT] Add the missing __repr__ method for SQLExpression
[SPARK-48277] Improve error message for ErrorClassesJsonReader.getErrorMessage
Operating system security updates.
May 21, 2024
(Behavior change) dbutils.widgets.getAll() is now supported to get all widget values in a notebook.
[SPARK-48173][SQL] CheckAnalysis should see the entire query plan
[SPARK-48197][SQL] Avoid assert error for invalid lambda function
[SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer
[SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting
Operating system security updates.
May 9, 2024
[SPARK-48044][PYTHON][CONNECT] Cache DataFrame.isStreaming
[SPARK-47956][SQL] Sanity check for unresolved LCA reference
[SPARK-47371] [SQL] XML: Ignore row tags found in CDATA
[SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker
[SPARK-47895][SQL] group by all should be idempotent
[SPARK-47973][CORE] Log call site in SparkContext.stop() and later in SparkContext.assertNotStopped()
Operating system security updates.
April 25, 2024
[SPARK-47704][SQL] JSON parsing fails with “java.lang.ClassCastException” when spark.sql.json.enablePartialResults is enabled
[SPARK-47828][CONNECT][PYTHON] DataFrameWriterV2.overwrite fails with invalid plan
Operating system security updates.
April 11, 2024
[SPARK-47309][SQL][XML] Add schema inference unit tests
[SPARK-46990][SQL] Fix loading empty Avro files emitted by event-hubs
[SPARK-47638][PS][CONNECT] Skip column name validation in PS
[SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions
[SPARK-38708][SQL] Upgrade Hive Metastore Client to the 3.1.3 for Hive 3.1
Operating system security updates.
April 1, 2024
[SPARK-47322][PYTHON][CONNECT] Make withColumnsRenamed column names duplication handling consistent with withColumnRenamed
[SPARK-47385] Fix tuple encoders with Option inputs.
[SPARK-47070] Fix invalid aggregation after subquery rewrite
[SPARK-47218] [SQL] XML: Changed SchemaOfXml to fail on DROPMALFORMED mode
[SPARK-47305][SQL] Fix PruneFilters to tag the isStreaming flag of LocalRelation correctly when the plan has both batch and streaming
[SPARK-47218][SQL] XML: Ignore commented row tags in XML tokenizer
Revert “[SPARK-46861][CORE] Avoid Deadlock in DAGScheduler”
[SPARK-47300][SQL] quoteIfNeeded should quote identifier starts with digits
[SPARK-47368][SQL] Remove inferTimestampNTZ config check in ParquetRowConverter
Operating system security updates.
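SPARK-47300 in the April 1, 2024 update concerns identifiers that start with a digit. The following Python sketch illustrates the quoting rule and is not Spark's implementation. The function name quote_if_needed and the exact pattern are assumptions made for this example.

```python
import re

# Assumed rule for illustration: a name is safe unquoted only if it starts
# with a letter or underscore and contains only letters, digits, underscores.
_UNQUOTED = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_if_needed(part: str) -> str:
    """Wrap `part` in backticks unless it is a plain identifier."""
    if _UNQUOTED.fullmatch(part):
        return part
    # Escape embedded backticks by doubling them, then wrap in backticks.
    return "`" + part.replace("`", "``") + "`"


print(quote_if_needed("total"))    # total
print(quote_if_needed("1st_col"))  # `1st_col`
```

Under this rule, a name such as 1st_col is quoted because its first character is a digit, even though every character is otherwise valid in an identifier.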
March 14, 2024
[SPARK-47035][SS][CONNECT] Protocol for Client-Side Listener
[SPARK-47121][CORE] Avoid RejectedExecutionExceptions during StandaloneSchedulerBackend shutdown
[SPARK-47145][SQL] Pass table identifier to row data source scan exec for V2 strategy.
[SPARK-47176][SQL] Have a ResolveAllExpressionsUpWithPruning helper function
[SPARK-47167][SQL] Add concrete class for JDBC anonymous relation
[SPARK-47129][CONNECT][SQL] Make ResolveRelations cache connect plan properly
[SPARK-47044][SQL] Add executed query for JDBC external datasources to explain output
Operating system security updates.
February 29, 2024
Fixed an issue where using a local collection as source in a MERGE command could result in the operation metric numSourceRows reporting double the correct number of rows.
Creating a schema with a defined location now requires the user to have SELECT and MODIFY privileges on ANY FILE.
You can now ingest XML files using Auto Loader, read_files, COPY INTO, DLT, and DBSQL. XML file support can automatically infer and evolve schema, rescue data with type mismatches, validate XML using XSD, and support SQL expressions like from_xml, schema_of_xml, and to_xml. See XML file support for more details. If you previously used the external spark-xml package, see the migration guidance.
[SPARK-46954][SQL] XML: Wrap InputStreamReader with BufferedReader
[SPARK-46630][SQL] XML: Validate XML element name on write
[SPARK-46248][SQL] XML: Support for ignoreCorruptFiles and ignoreMissingFiles options
[SPARK-46954][SQL] XML: Optimize schema index lookup
[SPARK-47059][SQL] Attach error context for ALTER COLUMN v1 command
[SPARK-46993][SQL] Fix constant folding for session variables
February 8, 2024
Change data feed (CDF) queries on Unity Catalog materialized views are not supported, and attempting to run a CDF query with a Unity Catalog materialized view returns an error. Unity Catalog streaming tables support CDF queries on non-APPLY CHANGES tables in Databricks Runtime 14.1 and later. CDF queries are not supported with Unity Catalog streaming tables in Databricks Runtime 14.0 and earlier.
[SPARK-46930] Add support for a custom prefix for Union type fields in Avro.
[SPARK-46822] Respect spark.sql.legacy.charVarcharAsString when casting jdbc type to catalyst type in jdbc.
[SPARK-46952] XML: Limit size of corrupt record.
[SPARK-46644] Change add and merge in SQLMetric to use isZero.
[SPARK-46861] Avoid Deadlock in DAGScheduler.
[SPARK-46794] Remove subqueries from LogicalRDD constraints.
[SPARK-46941] Can’t insert window group limit node for top-k computation if contains SizeBasedWindowFunction.
[SPARK-46933] Add query execution time metric to connectors which use JDBCRDD.
Operating system security updates.
January 31, 2024
[SPARK-46382] XML: Update doc for ignoreSurroundingSpaces.
[SPARK-46382] XML: Capture values interspersed between elements.
[SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.
Revert [SPARK-46769] Refine timestamp related schema inference.
[SPARK-46677] Fix dataframe["*"] resolution.
[SPARK-46382] XML: Default ignoreSurroundingSpaces to true.
[SPARK-46633] Fix Avro reader to handle zero-length blocks.
[SPARK-45964] Remove private sql accessor in XML and JSON package under catalyst package.
[SPARK-46581] Update comment on isZero in AccumulatorV2.
[SPARK-45912] Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility.
[SPARK-45182] Ignore task completion from old stage after retrying parent-indeterminate stage as determined by checksum.
[SPARK-46660] ReattachExecute requests updates aliveness of SessionHolder.
[SPARK-46610] Create table should throw exception when no value for a key in options.
[SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of TaskInfo.accumulables().
[SPARK-46769] Refine timestamp related schema inference.
[SPARK-46684] Fix CoGroup.applyInPandas/Arrow to pass arguments properly.
[SPARK-46676] dropDuplicatesWithinWatermark should not fail on canonicalization of the plan.
[SPARK-45962] Remove treatEmptyValuesAsNulls and use nullValue option instead in XML.
[SPARK-46541] Fix the ambiguous column reference in self join.
[SPARK-46599] XML: Use TypeCoercion.findTightestCommonType for compatibility check.
Operating system security updates.
January 17, 2024
The shuffle node of the explain plan returned by a Photon query is updated to add the causedBroadcastJoinBuildOOM=true flag when an out-of-memory error occurs during a shuffle that is part of a broadcast join.
To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.
[SPARK-46261] DataFrame.withColumnsRenamed should keep the dict/map ordering.
[SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.
[SPARK-46145] spark.catalog.listTables does not throw exception when the table or view is not found.
[SPARK-46484] Make resolveOperators helper functions keep the plan id.
[SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when spark.sql.legacy.keepCommandOutputSchema set to true.
[SPARK-46609] Avoid exponential explosion in PartitioningPreservingUnaryExecNode.
[SPARK-46446] Disable subqueries with correlated OFFSET to fix correctness bug.
[SPARK-46152] XML: Add DecimalType support in XML schema inference.
[SPARK-46602] Propagate allowExisting in view creation when the view/table does not exists.
[SPARK-45814] Make ArrowConverters.createEmptyArrowBatch call close() to avoid memory leak.
[SPARK-46058] Add separate flag for privateKeyPassword.
[SPARK-46132] Support key password for JKS keys for RPC SSL.
[SPARK-46600] Move shared code between SqlConf and SqlApiConf to SqlApiConfHelper.
[SPARK-46478] Revert SPARK-43049 to use oracle varchar(255) for string.
[SPARK-46417] Do not fail when calling hive.getTable and throwException is false.
[SPARK-46153] XML: Add TimestampNTZType support.
[SPARK-46056][BACKPORT] Fix Parquet vectorized read NPE with byteArrayDecimalType default value.
[SPARK-46466] Vectorized parquet reader should never do rebase for timestamp ntz.
[SPARK-46260] DataFrame.withColumnsRenamed should respect the dict ordering.
[SPARK-46036] Removing error-class from raise_error function.
[SPARK-46294] Clean up semantics of init vs zero value.
[SPARK-46173] Skipping trimAll call during date parsing.
[SPARK-46250] Deflake test_parity_listener.
[SPARK-46587] XML: Fix XSD big integer conversion.
[SPARK-46396] Timestamp inference should not throw exception.
[SPARK-46241] Fix error handling routine so it wouldn’t fall into infinite recursion.
[SPARK-46355] XML: Close InputStreamReader on read completion.
[SPARK-46370] Fix bug when querying from table after changing column defaults.
[SPARK-46265] Assertions in AddArtifact RPC make the connect client incompatible with older clusters.
[SPARK-46308] Forbid recursive error handling.
[SPARK-46337] Make CTESubstitution retain the PLAN_ID_TAG.
December 14, 2023
[SPARK-46141] Change default for spark.sql.legacy.ctePrecedencePolicy to CORRECTED.
[SPARK-45730] Make ReloadingX509TrustManagerSuite less flaky.
[SPARK-45852] Gracefully deal with recursion error during logging.
[SPARK-45808] Better error handling for SQL Exceptions.
[SPARK-45920] group by ordinal should be idempotent.
Revert “[SPARK-45649] Unify the prepare framework for OffsetWindowFunctionFrame”.
[SPARK-45733] Support multiple retry policies.
[SPARK-45509] Fix df column reference behavior for Spark Connect.
[SPARK-45655] Allow non-deterministic expressions inside AggregateFunctions in CollectMetrics.
[SPARK-45905] Least common type between decimal types should retain integral digits first.
[SPARK-45136] Enhance ClosureCleaner with Ammonite support.
[SPARK-46255] Support complex type -> string conversion.
[SPARK-45859] Make UDF objects in ml.functions lazy.
[SPARK-46028] Make Column.__getitem__ accept input column.
[SPARK-45798] Assert server-side session ID.
[SPARK-45892] Refactor optimizer plan validation to decouple validateSchemaOutput and validateExprIdUniqueness.
[SPARK-45844] Implement case-insensitivity for XML.
[SPARK-45770] Introduce plan DataFrameDropColumns for Dataframe.drop.
[SPARK-44790] XML: to_xml implementation and bindings for python, connect and SQL.
[SPARK-45851] Support multiple policies in scala client.
Operating system security updates.
November 29, 2023
Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.
Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.
[SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.
[SPARK-45852] The Python client for Spark Connect now catches recursion errors during text conversion.
[SPARK-45808] Improved error handling for SQL exceptions.
[SPARK-45920] GROUP BY ordinal doesn’t replace the ordinal.
Revert [SPARK-45649].
[SPARK-45733] Added support for multiple retry policies.
[SPARK-45509] Fixed df column reference behavior for Spark Connect.
[SPARK-45655] Allow non-deterministic expressions inside AggregateFunctions in CollectMetrics.
[SPARK-45905] The least common type between decimal types now retains integral digits first.
[SPARK-45136] Enhance ClosureCleaner with Ammonite support.
[SPARK-45859] Made UDF objects in ml.functions lazy.
[SPARK-46028] Column.__getitem__ accepts input columns.
[SPARK-45798] Assert server-side session ID.
[SPARK-45892] Refactor optimizer plan validation to decouple validateSchemaOutput and validateExprIdUniqueness.
[SPARK-45844] Implement case-insensitivity for XML.
[SPARK-45770] Fixed column resolution with DataFrameDropColumns for Dataframe.drop.
[SPARK-44790] Added to_xml implementation and bindings for Python, Spark Connect, and SQL.
[SPARK-45851] Added support for multiple policies in the Scala client.
Operating system security updates.
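The getColumns fix in the November 29, 2023 update is about metadata search patterns, where an unescaped _ matches any single character and % matches any sequence of characters. The following Python sketch shows that matching rule, assuming backslash as the escape character and a hypothetical helper named pattern_to_regex. It illustrates the semantics only and is not the driver or server code.

```python
import re


def pattern_to_regex(pattern: str, escape: str = "\\") -> re.Pattern:
    """Translate a metadata search pattern into a compiled regex.

    An escaped character is matched literally; a bare `_` matches one
    character and a bare `%` matches any run of characters.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # Escaped character: take the next character literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "_":
            out.append(".")
        elif ch == "%":
            out.append(".*")
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


columns = ["user_id", "userXid"]
escaped = pattern_to_regex(r"user\_id")
wildcard = pattern_to_regex("user_id")
print([c for c in columns if escaped.fullmatch(c)])   # ['user_id']
print([c for c in columns if wildcard.fullmatch(c)])  # ['user_id', 'userXid']
```

With the escape honored, a client asking for user\_id gets only the column with a literal underscore, which is the behavior the fix restores.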
Databricks Runtime 14.1
October 22, 2024
[SPARK-49782][SQL] ResolveDataFrameDropColumns rule resolves UnresolvedAttribute with child output
[SPARK-49905] Use dedicated ShuffleOrigin for stateful operator to prevent the shuffle to be modified from AQE
October 10, 2024
[BACKPORT][SPARK-49474][SS] Classify Error class for FlatMapGroupsWithState user function error
[SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields
Operating system security updates.
September 25, 2024
[SPARK-49628][SQL] ConstantFolding should copy stateful expression before evaluating
[SPARK-43242][CORE] Fix throw ‘Unexpected type of BlockId’ in shuffle corruption diagnose
[SPARK-48719][SQL] Fix the calculation bug of `RegrS…
[SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
[SPARK-46601] [CORE] Fix log error in handleStatusMessage
Operating system security updates.
September 17, 2024
[SPARK-49526][CONNECT] Support Windows-style paths in ArtifactManager
Operating system security updates.
August 29, 2024
[SPARK-49263][CONNECT] Spark Connect python client: Consistently handle boolean Dataframe reader options
[SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly
August 14, 2024
[SPARK-48706][PYTHON] Python UDF in higher order functions should not throw internal error
[SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan
[SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
[SPARK-48050][SS] Log logical plan at query start
August 1, 2024
This release includes a bug fix for the ColumnVector and ColumnarArray classes in the Spark Java interface. Previous to this fix, an ArrayIndexOutOfBoundsException might be thrown or incorrect data returned when an instance of one of these classes contained null values.
The output from a SHOW CREATE TABLE statement now includes any row filters or column masks defined on a materialized view or streaming table. See SHOW CREATE TABLE. To learn about row filters and column masks, see Filter sensitive table data using row filters and column masks.
[SPARK-48705][PYTHON] Explicitly use worker_main when it starts with pyspark
[SPARK-47202][PYTHON] Fix typo breaking datetimes with tzinfo
Operating system security updates.
July 11, 2024
(Behavior change) DataFrames cached against Delta table sources are now invalidated if the source table is overwritten. This change means that all state changes to Delta tables now invalidate cached results. Use .checkpoint() to persist a table state throughout the lifetime of a DataFrame.
This release includes a fix to an issue that prevented the Spark UI Environment tab from displaying correctly when running in Databricks Container Services.
[SPARK-48475][PYTHON] Optimize _get_jvm_function in PySpark.
[SPARK-48445][SQL] Don’t inline UDFs with expensive children
[SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset
[SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
[SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed
[SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier
[SPARK-48100][SQL] Fix issues in skipping nested structure fields not selected in schema
[SPARK-48383][SS] Throw better error for mismatched partitions in startOffset option in Kafka
Operating system security updates.
June 17, 2024
Fixes a bug where the rank-window optimization using Photon TopK incorrectly handled partitions with structs.
[SPARK-48276][PYTHON][CONNECT] Add the missing __repr__ method for SQLExpression
[SPARK-48277] Improve error message for ErrorClassesJsonReader.getErrorMessage
Operating system security updates.
May 21, 2024
(Behavior change) dbutils.widgets.getAll() is now supported to get all widget values in a notebook.
[SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer
[SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting
[SPARK-48173][SQL] CheckAnalysis should see the entire query plan
Operating system security updates.
May 9, 2024
[SPARK-47371] [SQL] XML: Ignore row tags found in CDATA
[SPARK-47895][SQL] group by all should be idempotent
[SPARK-47956][SQL] Sanity check for unresolved LCA reference
[SPARK-48044][PYTHON][CONNECT] Cache DataFrame.isStreaming
[SPARK-47973][CORE] Log call site in SparkContext.stop() and later in SparkContext.assertNotStopped()
Operating system security updates.
April 25, 2024
[SPARK-47704][SQL] JSON parsing fails with “java.lang.ClassCastException” when spark.sql.json.enablePartialResults is enabled
[SPARK-47828][CONNECT][PYTHON] DataFrameWriterV2.overwrite fails with invalid plan
Operating system security updates.
April 11, 2024
[SPARK-47638][PS][CONNECT] Skip column name validation in PS
[SPARK-38708][SQL] Upgrade Hive Metastore Client to the 3.1.3 for Hive 3.1
[SPARK-47309][SQL][XML] Add schema inference unit tests
[SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions
[SPARK-46990][SQL] Fix loading empty Avro files emitted by event-hubs
Operating system security updates.
April 1, 2024
[SPARK-47305][SQL] Fix PruneFilters to tag the isStreaming flag of LocalRelation correctly when the plan has both batch and streaming
[SPARK-47218][SQL] XML: Ignore commented row tags in XML tokenizer
[SPARK-47300][SQL] quoteIfNeeded should quote identifier starts with digits
[SPARK-47368][SQL] Remove inferTimestampNTZ config check in ParquetRowConverter
[SPARK-47070] Fix invalid aggregation after subquery rewrite
[SPARK-47322][PYTHON][CONNECT] Make withColumnsRenamed column names duplication handling consistent with withColumnRenamed
[SPARK-47300] Fix for DecomposerSuite
[SPARK-47218] [SQL] XML: Changed SchemaOfXml to fail on DROPMALFORMED mode
[SPARK-47385] Fix tuple encoders with Option inputs.
Operating system security updates.
March 14, 2024
[SPARK-47176][SQL] Have a ResolveAllExpressionsUpWithPruning helper function
[SPARK-47145][SQL] Pass table identifier to row data source scan exec for V2 strategy.
[SPARK-47167][SQL] Add concrete class for JDBC anonymous relation
[SPARK-47129][CONNECT][SQL] Make ResolveRelations cache connect plan properly
Revert “[SPARK-46861][CORE] Avoid Deadlock in DAGScheduler”
[SPARK-47044][SQL] Add executed query for JDBC external datasources to explain output
Operating system security updates.
February 29, 2024
Fixed an issue where using a local collection as source in a MERGE command could result in the operation metric numSourceRows reporting double the correct number of rows.
Creating a schema with a defined location now requires the user to have SELECT and MODIFY privileges on ANY FILE.
You can now ingest XML files using Auto Loader, read_files, COPY INTO, DLT, and DBSQL. XML file support can automatically infer and evolve schema, rescue data with type mismatches, validate XML using XSD, and support SQL expressions like from_xml, schema_of_xml, and to_xml. See XML file support for more details. If you previously used the external spark-xml package, see the migration guidance.
[SPARK-46248][SQL] XML: Support for ignoreCorruptFiles and ignoreMissingFiles options
[SPARK-47059][SQL] Attach error context for ALTER COLUMN v1 command
[SPARK-46954][SQL] XML: Wrap InputStreamReader with BufferedReader
[SPARK-46954][SQL] XML: Optimize schema index lookup
[SPARK-46630][SQL] XML: Validate XML element name on write
Operating system security updates.
February 8, 2024
Change data feed (CDF) queries on Unity Catalog materialized views are not supported, and attempting to run a CDF query with a Unity Catalog materialized view returns an error. Unity Catalog streaming tables support CDF queries on non-APPLY CHANGES tables in Databricks Runtime 14.1 and later. CDF queries are not supported with Unity Catalog streaming tables in Databricks Runtime 14.0 and earlier.
[SPARK-46952] XML: Limit size of corrupt record.
[SPARK-45182] Ignore task completion from old stage after retrying parent-indeterminate stage as determined by checksum.
[SPARK-46794] Remove subqueries from LogicalRDD constraints.
[SPARK-46933] Add query execution time metric to connectors which use JDBCRDD.
[SPARK-46861] Avoid Deadlock in DAGScheduler.
[SPARK-45582] Ensure that store instance is not used after calling commit within output mode streaming aggregation.
[SPARK-46930] Add support for a custom prefix for Union type fields in Avro.
[SPARK-46941] Can’t insert window group limit node for top-k computation if contains SizeBasedWindowFunction.
[SPARK-46396] Timestamp inference should not throw exception.
[SPARK-46822] Respect spark.sql.legacy.charVarcharAsString when casting jdbc type to catalyst type in jdbc.
[SPARK-45957] Avoid generating execution plan for non-executable commands.
Operating system security updates.
January 31, 2024
[SPARK-46684] Fix CoGroup.applyInPandas/Arrow to pass arguments properly.
[SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.
[SPARK-45498] Followup: Ignore task completion from old stage attempts.
[SPARK-46382] XML: Update doc for ignoreSurroundingSpaces.
[SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of TaskInfo.accumulables().
[SPARK-46382] XML: Default ignoreSurroundingSpaces to true.
[SPARK-46677] Fix dataframe["*"] resolution.
[SPARK-46676] dropDuplicatesWithinWatermark should not fail on canonicalization of the plan.
[SPARK-46633] Fix Avro reader to handle zero-length blocks.
[SPARK-45912] Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility.
[SPARK-46599] XML: Use TypeCoercion.findTightestCommonType for compatibility check.
[SPARK-46382] XML: Capture values interspersed between elements.
[SPARK-46769] Refine timestamp related schema inference.
[SPARK-46610] Create table should throw exception when no value for a key in options.
[SPARK-45964] Remove private sql accessor in XML and JSON package under catalyst package.
Revert [SPARK-46769] Refine timestamp related schema inference.
[SPARK-45962] Remove treatEmptyValuesAsNulls and use nullValue option instead in XML.
[SPARK-46541] Fix the ambiguous column reference in self join.
Operating system security updates.
January 17, 2024
The shuffle node of the explain plan returned by a Photon query is updated to add the causedBroadcastJoinBuildOOM=true flag when an out-of-memory error occurs during a shuffle that is part of a broadcast join.
To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.
[SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.
[SPARK-46417] Do not fail when calling hive.getTable and throwException is false.
[SPARK-46484] Make resolveOperators helper functions keep the plan id.
[SPARK-46153] XML: Add TimestampNTZType support.
[SPARK-46152] XML: Add DecimalType support in XML schema inference.
[SPARK-46145] spark.catalog.listTables does not throw exception when the table or view is not found.
[SPARK-46478] Revert SPARK-43049 to use oracle varchar(255) for string.
[SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when spark.sql.legacy.keepCommandOutputSchema set to true.
[SPARK-46337] Make CTESubstitution retain the PLAN_ID_TAG.
[SPARK-46466] Vectorized parquet reader should never do rebase for timestamp ntz.
[SPARK-46587] XML: Fix XSD big integer conversion.
[SPARK-45814] Make ArrowConverters.createEmptyArrowBatch call close() to avoid memory leak.
[SPARK-46132] Support key password for JKS keys for RPC SSL.
[SPARK-46602] Propagate allowExisting in view creation when the view/table does not exist.
[SPARK-46173] Skipping trimAll call during date parsing.
[SPARK-46355] XML: Close InputStreamReader on read completion.
[SPARK-46600] Move shared code between SqlConf and SqlApiConf to SqlApiConfHelper.
[SPARK-46261] DataFrame.withColumnsRenamed should keep the dict/map ordering.
[SPARK-46056] Fix Parquet vectorized read NPE with byteArrayDecimalType default value.
[SPARK-46260] DataFrame.withColumnsRenamed should respect the dict ordering.
[SPARK-46250] Deflake test_parity_listener.
[SPARK-46370] Fix bug when querying from table after changing column defaults.
[SPARK-46609] Avoid exponential explosion in PartitioningPreservingUnaryExecNode.
[SPARK-46058] Add separate flag for privateKeyPassword.
December 14, 2023
Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were handled incorrectly and interpreted as wildcards.
[SPARK-45509] Fix df column reference behavior for Spark Connect.
[SPARK-45844] Implement case-insensitivity for XML.
[SPARK-46141] Change default for spark.sql.legacy.ctePrecedencePolicy to CORRECTED.
[SPARK-46028] Make Column.__getitem__ accept input column.
[SPARK-46255] Support complex type -> string conversion.
[SPARK-45655] Allow non-deterministic expressions inside AggregateFunctions in CollectMetrics.
[SPARK-45433] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat.
[SPARK-45316] Add new parameters ignoreCorruptFiles/ignoreMissingFiles to HadoopRDD and NewHadoopRDD.
[SPARK-45852] Gracefully deal with recursion error during logging.
[SPARK-45920] group by ordinal should be idempotent.
Operating system security updates.
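The getColumns fix in this update concerns how metadata search patterns treat underscores. As a rough illustration only (not the Databricks driver code), the sketch below translates a JDBC-style search pattern into a regular expression. It assumes a backslash escape character, and the function name jdbc_pattern_to_regex and the column names are hypothetical.

```python
import re


def jdbc_pattern_to_regex(pattern: str, escape: str = "\\") -> "re.Pattern[str]":
    """Translate a JDBC-style metadata search pattern into a compiled regex.

    An unescaped '_' matches exactly one character and an unescaped '%'
    matches any run of characters. A character that follows the escape
    character is treated as a literal. (Assumption: backslash is the escape.)
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # Escaped character: match it literally, never as a wildcard.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


# Hypothetical column names used only for illustration.
columns = ["user_id", "userXid"]
wildcard = jdbc_pattern_to_regex("user_id")    # '_' acts as a wildcard
literal = jdbc_pattern_to_regex(r"user\_id")   # '\_' is a literal underscore
matched_by_wildcard = [c for c in columns if wildcard.fullmatch(c)]
matched_by_literal = [c for c in columns if literal.fullmatch(c)]
```

With the escaped pattern, only the column whose name contains a real underscore matches, which is the behavior a client expects when it escapes the character.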
November 29, 2023
Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.
Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.
When ingesting CSV data using Auto Loader or Streaming Tables, large CSV files are now splittable and can be processed in parallel during both schema inference and data processing.
[SPARK-45892] Refactor optimizer plan validation to decouple validateSchemaOutput and validateExprIdUniqueness.
[SPARK-45620] APIs related to Python UDF now use camelCase.
[SPARK-44790] Added to_xml implementation and bindings for Python, Spark Connect, and SQL.
[SPARK-45770] Fixed column resolution with DataFrameDropColumns for Dataframe.drop.
[SPARK-45859] Made UDF objects in ml.functions lazy.
[SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.
[SPARK-44784] Made SBT testing hermetic.
Operating system security updates.
November 10, 2023
[SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.
[SPARK-45250] Added support for stage-level task resource profile for yarn clusters when dynamic allocation is turned off.
[SPARK-44753] Added XML DataFrame reader and writer for PySpark SQL.
[SPARK-45396] Added a doc entry for PySpark.ml.connect module.
[SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.
[SPARK-45541] Added SSLFactory.
[SPARK-45577] Fixed UserDefinedPythonTableFunctionAnalyzeRunner to pass folded values from named arguments.
[SPARK-45562] Made ‘rowTag’ a required option.
[SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.
[SPARK-43380] Fixed slowdown in Avro read.
[SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.
[SPARK-45429] Added helper classes for SSL RPC communication.
[SPARK-45386] Fixed an issue where StorageLevel.NONE would incorrectly return 0.
[SPARK-44219] Added per-rule validation checks for optimization rewrites.
[SPARK-45543] Fixed an issue where InferWindowGroupLimit caused an issue if the other window functions didn’t have the same window frame as the rank-like functions.
Operating system security updates.
September 27, 2023
[SPARK-44823] Updated black to 23.9.1 and fixed erroneous check.
[SPARK-45339] PySpark now logs errors it retries.
Revert [SPARK-42946] Redacted sensitive data nested under variable substitutions.
[SPARK-44551] Edited comments to sync with OSS.
[SPARK-45360] Spark session builder supports initialization from SPARK_REMOTE.
[SPARK-45279] Attached plan_id to all logical plans.
[SPARK-45425] Mapped TINYINT to ShortType for MsSqlServerDialect.
[SPARK-45419] Removed file version map entry of larger versions to avoid reusing rocksdb sst file IDs.
[SPARK-45488] Added support for value in rowTag element.
[SPARK-42205] Removed logging of Accumulables in Task/Stage start events in JsonProtocol event logs.
[SPARK-45426] Added support for ReloadingX509TrustManager.
[SPARK-45256] DurationWriter fails when writing more values than the initial capacity.
[SPARK-43380] Fixed Avro data type conversion issues without causing performance regression.
[SPARK-45182] Added support for rolling back shuffle map stage so all stage tasks can be retried when the stage output is indeterminate.
[SPARK-45399] Added XML Options using newOption.
Operating system security updates.
Databricks Runtime 13.3 LTS
See Databricks Runtime 13.3 LTS.
October 22, 2024
[SPARK-48843] Prevent infinite loop with BindParameters
[BACKPORT][SPARK-49326][SS] Classify Error class for Foreach sink user function error
[SPARK-49905] Use dedicated ShuffleOrigin for stateful operator to prevent the shuffle to be modified from AQE
Operating system security updates.
October 10, 2024
[SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields
September 25, 2024
[SPARK-46601] [CORE] Fix log error in handleStatusMessage
[SPARK-48719][SQL] Fix the calculation bug of RegrSlope & RegrIntercept when the first parameter is null
[SPARK-43242][CORE] Fix throw ‘Unexpected type of BlockId’ in shuffle corruption diagnose
[SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
Operating system security updates.
September 17, 2024
[SPARK-49526][CONNECT] Support Windows-style paths in ArtifactManager
[SPARK-48463][ML] Make Binarizer, Bucketizer, VectorAssembler, FeatureHasher, QuantileDiscretizer, OneHotEncoder, StopWordsRemover, Imputer, Interactor support nested input columns
Operating system security updates.
August 29, 2024
August 14, 2024
[SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly
[SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
[SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan
August 1, 2024
This release includes a bug fix for the ColumnVector and ColumnarArray classes in the Spark Java interface. Previous to this fix, an ArrayIndexOutOfBoundsException might be thrown or incorrect data returned when an instance of one of these classes contained null values.
[SPARK-47202][PYTHON] Fix typo breaking datetimes with tzinfo
[SPARK-48896] [SPARK-48909] [SPARK-48883] Backport spark ML writer fixes
[SPARK-48463] Make StringIndexer supporting nested input columns
Operating system security updates.
July 11, 2024
(Behavior change) DataFrames cached against Delta table sources are now invalidated if the source table is overwritten. This change means that all state changes to Delta tables now invalidate cached results. Use .checkpoint() to persist a table state throughout the lifetime of a DataFrame.
This release includes a fix to an issue that prevented the Spark UI Environment tab from displaying correctly when running in Databricks Container Services.
[SPARK-48383][SS] Throw better error for mismatched partitions in startOffset option in Kafka
[SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
[SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed
[SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset
[SPARK-48475][PYTHON] Optimize _get_jvm_function in PySpark.
[SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier
[SPARK-48445][SQL] Don’t inline UDFs with expensive children
Operating system security updates.
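The behavior change in this update recommends .checkpoint() for holding on to a table state. A minimal sketch of that pattern follows, assuming a SparkSession that exposes sparkContext (some compute configurations might not). The helper name pin_table_state, the table name, and the checkpoint directory are hypothetical.

```python
def pin_table_state(spark, table_name, checkpoint_dir):
    """Return a checkpointed DataFrame that captures the table's current state.

    spark is expected to be a SparkSession. table_name and checkpoint_dir are
    placeholders chosen for illustration, not values from the release notes.
    """
    # Reliable checkpoints need a checkpoint directory to be configured first.
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    df = spark.read.table(table_name)
    # checkpoint() is eager by default: it materializes the data and returns
    # a new DataFrame backed by the checkpoint files rather than the source.
    return df.checkpoint()
```

A call such as pin_table_state(spark, "main.sales.orders", "/tmp/checkpoints") would then return a DataFrame to use in place of a cached read of the table.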
June 17, 2024
[SPARK-48277] Improve error message for ErrorClassesJsonReader.getErrorMessage
Operating system security updates.
May 21, 2024
(Behavior change) dbutils.widgets.getAll() is now supported to get all widget values in a notebook.
[SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting
[SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer
Operating system security updates.
May 9, 2024
[SPARK-47956][SQL] Sanity check for unresolved LCA reference
[SPARK-46822][SQL] Respect spark.sql.legacy.charVarcharAsString when casting jdbc type to catalyst type in jdbc
[SPARK-47895][SQL] group by all should be idempotent
[SPARK-48018][SS] Fix null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange
[SPARK-47973][CORE] Log call site in SparkContext.stop() and later in SparkContext.assertNotStopped()
Operating system security updates.
April 25, 2024
[SPARK-44653][SQL] Non-trivial DataFrame unions should not break caching
Miscellaneous bug fixes.
April 11, 2024
[SPARK-47509][SQL] Block subquery expressions in lambda and higher-order functions
Operating system security updates.
April 1, 2024
[SPARK-47385] Fix tuple encoders with Option inputs.
[SPARK-38708][SQL] Upgrade Hive Metastore Client to the 3.1.3 for Hive 3.1
[SPARK-47200][SS] Error class for Foreach batch sink user function error
[SPARK-47368][SQL] Remove inferTimestampNTZ config check in ParquetRowConverter
[SPARK-44252][SS] Define a new error class and apply for the case where loading state from DFS fails
[SPARK-47135][SS] Implement error classes for Kafka data loss exceptions
[SPARK-47300][SQL] quoteIfNeeded should quote identifier starts with digits
[SPARK-47305][SQL] Fix PruneFilters to tag the isStreaming flag of LocalRelation correctly when the plan has both batch and streaming
[SPARK-47070] Fix invalid aggregation after subquery rewrite
Operating system security updates.
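To illustrate the kind of rule that SPARK-47300 in this update describes, the sketch below quotes an identifier with backticks unless it looks like a plain name that does not start with a digit. This is an assumed simplification written for illustration, not Spark's quoteIfNeeded implementation. The function name quote_if_needed and the regular expression are hypothetical.

```python
import re

# Assumption: a "plain" identifier starts with a letter or underscore and
# continues with letters, digits, or underscores.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_if_needed(part: str) -> str:
    """Return part unchanged if it is a plain identifier, else backtick-quote it."""
    if _PLAIN_IDENTIFIER.fullmatch(part):
        return part
    # Double any embedded backticks so the quoted form stays unambiguous.
    return "`" + part.replace("`", "``") + "`"


plain = quote_if_needed("col1")          # left as-is
leading_digit = quote_if_needed("1col")  # quoted because it starts with a digit
```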
March 14, 2024
[SPARK-47145][SQL] Pass table identifier to row data source scan exec for V2 strategy.
[SPARK-47167][SQL] Add concrete class for JDBC anonymous relation
[SPARK-47176][SQL] Have a ResolveAllExpressionsUpWithPruning helper function
[SPARK-47044][SQL] Add executed query for JDBC external datasources to explain output
[SPARK-47125][SQL] Return null if Univocity never triggers parsing
Operating system security updates.
February 29, 2024
Fixed an issue where using a local collection as source in a MERGE command could result in the operation metric numSourceRows reporting double the correct number of rows.
Creating a schema with a defined location now requires the user to have SELECT and MODIFY privileges on ANY FILE.
Operating system security updates.
February 8, 2024
Change data feed (CDF) queries on Unity Catalog materialized views are not supported, and attempting to run a CDF query with a Unity Catalog materialized view returns an error. Unity Catalog streaming tables support CDF queries on non-APPLY CHANGES tables in Databricks Runtime 14.1 and later. CDF queries are not supported with Unity Catalog streaming tables in Databricks Runtime 14.0 and earlier.
[SPARK-46794] Remove subqueries from LogicalRDD constraints.
[SPARK-46933] Add query execution time metric to connectors which use JDBCRDD.
[SPARK-45582] Ensure that store instance is not used after calling commit within output mode streaming aggregation.
[SPARK-46396] Timestamp inference should not throw exception.
[SPARK-46861] Avoid Deadlock in DAGScheduler.
[SPARK-46941] Can’t insert window group limit node for top-k computation if contains SizeBasedWindowFunction.
Operating system security updates.
January 31, 2024
[SPARK-46610] Create table should throw exception when no value for a key in options.
[SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of TaskInfo.accumulables().
[SPARK-46600] Move shared code between SqlConf and SqlApiConf to SqlApiConfHelper.
[SPARK-46676] dropDuplicatesWithinWatermark should not fail on canonicalization of the plan.
[SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.
Operating system security updates.
January 17, 2024
The shuffle node of the explain plan returned by a Photon query is updated to add the causedBroadcastJoinBuildOOM=true flag when an out-of-memory error occurs during a shuffle that is part of a broadcast join.
To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.
[SPARK-46058] Add separate flag for privateKeyPassword.
[SPARK-46173] Skipping trimAll call during date parsing.
[SPARK-46370] Fix bug when querying from table after changing column defaults.
[SPARK-46609] Avoid exponential explosion in PartitioningPreservingUnaryExecNode.
[SPARK-46132] Support key password for JKS keys for RPC SSL.
[SPARK-46602] Propagate allowExisting in view creation when the view/table does not exist.
[SPARK-46249] Require instance lock for acquiring RocksDB metrics to prevent race with background operations.
[SPARK-46417] Do not fail when calling hive.getTable and throwException is false.
[SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.
[SPARK-46478] Revert SPARK-43049 to use oracle varchar(255) for string.
[SPARK-46250] Deflake test_parity_listener.
[SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when spark.sql.legacy.keepCommandOutputSchema set to true.
[SPARK-46056] Fix Parquet vectorized read NPE with byteArrayDecimalType default value.
[SPARK-46145] spark.catalog.listTables does not throw exception when the table or view is not found.
[SPARK-46466] Vectorized parquet reader should never do rebase for timestamp ntz.
December 14, 2023
Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were handled incorrectly and interpreted as wildcards.
[SPARK-45920] group by ordinal should be idempotent.
[SPARK-44582] Skip iterator on SMJ if it was cleaned up.
[SPARK-45433] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat.
[SPARK-45655] Allow non-deterministic expressions inside AggregateFunctions in CollectMetrics.
Operating system security updates.
November 29, 2023
Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.
Spark-snowflake connector is upgraded to 2.12.0.
[SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.
[SPARK-45544] Integrated SSL support into TransportContext.
[SPARK-45892] Refactor optimizer plan validation to decouple validateSchemaOutput and validateExprIdUniqueness.
[SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.
[SPARK-45859] Made UDF objects in ml.functions lazy.
Operating system security updates.
November 10, 2023
Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.
Change data feed queries on Unity Catalog streaming tables and materialized views now display error messages.
[SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.
[SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.
[SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.
[SPARK-45541] Added SSLFactory.
[SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.
[SPARK-45429] Added helper classes for SSL RPC communication.
[SPARK-44219] Added extra per-rule validations for optimization rewrites.
[SPARK-45543] Fixed an issue where InferWindowGroupLimit caused an issue if the other window functions didn’t have the same window frame as the rank-like functions.
Operating system security updates.
October 23, 2023
[SPARK-45256] Fixed an issue where DurationWriter failed when writing more values than initial capacity.
[SPARK-45419] Avoid reusing rocksdb sst files in a different rocksdb instance by removing file version map entries of larger versions.
[SPARK-45426] Added support for ReloadingX509TrustManager.
Miscellaneous fixes.
October 13, 2023
Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.
The array_insert function is 1-based for positive and negative indexes, while before, it was 0-based for negative indexes. It now inserts a new element at the end of input arrays for the index -1. To restore the previous behavior, set spark.sql.legacy.negativeIndexInArrayInsert to true.
Fixed an issue where corrupt files were not ignored when ignoreCorruptFiles is enabled during CSV schema inference with Auto Loader.
Revert [SPARK-42946].
[SPARK-42205] Updated the JSON protocol to remove Accumulables logging in a task or stage start events.
[SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.
[SPARK-45316] Add new parameters ignoreCorruptFiles and ignoreMissingFiles to HadoopRDD and NewHadoopRDD.
[SPARK-44740] Fixed metadata values for Artifacts.
[SPARK-45360] Initialized Spark session builder configuration from SPARK_REMOTE.
[SPARK-44551] Edited comments to sync with OSS.
[SPARK-45346] Parquet schema inference now respects case-sensitive flags when merging schema.
[SPARK-44658] ShuffleStatus.getMapStatus now returns None instead of Some(null).
[SPARK-44840] Made array_insert() 1-based for negative indexes.
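The array_insert change in this update is easiest to see with a small model. The pure-Python sketch below mimics the described index semantics for in-range positions only. It is an illustration rather than Spark's implementation, it does not model positions beyond the array bounds, and the legacy_negative_index parameter is a hypothetical stand-in for setting spark.sql.legacy.negativeIndexInArrayInsert to true.

```python
def array_insert(arr, pos, value, legacy_negative_index=False):
    """Model array_insert for in-range positions on a Python list.

    Positive positions are 1-based. With the current behavior, -1 inserts at
    the end of the array. With the legacy behavior, -1 inserts before the
    last element.
    """
    if pos == 0:
        raise ValueError("position 0 is not valid; indexes are 1-based")
    n = len(arr)
    if pos > 0:
        idx = pos - 1
    elif legacy_negative_index:
        idx = n + pos        # legacy: 0-based counting from the end
    else:
        idx = n + pos + 1    # current: 1-based counting from the end
    if not 0 <= idx <= n:
        raise ValueError("this sketch does not model out-of-range positions")
    return arr[:idx] + [value] + arr[idx:]


current = array_insert([5, 4, 3, 2], -1, 1)
legacy = array_insert([5, 4, 3, 2], -1, 1, legacy_negative_index=True)
```

Here current places the new element after the last one, while legacy places it before the last one, which matches the difference the note describes.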
September 14, 2023
[SPARK-44873] Added support for alter view with nested columns in Hive client.
[SPARK-44878] Turned off strict limit for RocksDB write manager to avoid insertion exception on cache complete.
August 30, 2023
The dbutils cp command (dbutils.fs.cp) has been optimized for faster copying. With this improvement, copy operations can take up to 100x less time, depending on the file size. The feature is available across all clouds and file systems accessible in Databricks, including for Unity Catalog Volumes and DBFS mounts.
[SPARK-44455] Quote identifiers with backticks in the SHOW CREATE TABLE result.
[SPARK-44763] Fixed an issue that showed a string as a double in binary arithmetic with interval.
[SPARK-44871] Fixed percentile_disc behavior.
[SPARK-44714] Ease restriction of LCA resolution regarding queries.
[SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.
[SPARK-44505] Added override for columnar support in Scan for DSv2.
[SPARK-44479] Fixed protobuf conversion from an empty struct type.
[SPARK-44718] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value.
[SPARK-42941] Added support for StreamingQueryListener in Python.
[SPARK-44558] Export PySpark’s Spark Connect Log Level.
[SPARK-44464] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as the first column value.
[SPARK-44643] Fixed Row.__repr__ when the field is an empty row.
Operating system security updates.
Databricks Runtime 12.2 LTS
See Databricks Runtime 12.2 LTS.
October 10, 2024
[SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields
September 25, 2024
[SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
[SPARK-46601] [CORE] Fix log error in handleStatusMessage
Miscellaneous bug fixes.
September 17, 2024
Operating system security updates.
August 29, 2024
Miscellaneous bug fixes.
August 14, 2024
[SPARK-48941][SPARK-48970] Backport ML writer / reader fixes
[SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
[SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly
[SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan
[SPARK-48463][ML] Make StringIndexer supporting nested input columns
Operating system security updates.
August 1, 2024
[SPARK-48896] [SPARK-48909] [SPARK-48883] Backport spark ML writer fixes
August 1, 2024
To apply required security patches, the Python version in Databricks Runtime 12.2 LTS is upgraded from 3.9.5 to 3.9.19.
July 11, 2024
(Behavior change) DataFrames cached against Delta table sources are now invalidated if the source table is overwritten. This change means that all state changes to Delta tables now invalidate cached results. Use .checkpoint() to persist a table state throughout the lifetime of a DataFrame.
[SPARK-48481][SQL][SS] Do not apply OptimizeOneRowPlan against streaming Dataset
[SPARK-47070] Fix invalid aggregation after subquery rewrite
[SPARK-42741][SQL] Do not unwrap casts in binary comparison when literal is null
[SPARK-48445][SQL] Don’t inline UDFs with expensive children
[SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed
[SPARK-48383][SS] Throw better error for mismatched partitions in startOffset option in Kafka
Operating system security updates.
June 17, 2024
[SPARK-48277] Improve error message for ErrorClassesJsonReader.getErrorMessage
Miscellaneous bug fixes.
May 21, 2024
[SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting
Operating system security updates.
May 9, 2024
[SPARK-44251][SQL] Set nullable correctly on coalesced join key in full outer USING join
[SPARK-47973][CORE] Log call site in SparkContext.stop() and later in SparkContext.assertNotStopped()
[SPARK-47956][SQL] Sanity check for unresolved LCA reference
[SPARK-48018][SS] Fix null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange
Operating system security updates.
April 25, 2024
Operating system security updates.
April 11, 2024
Operating system security updates.
April 1, 2024
[SPARK-47305][SQL] Fix PruneFilters to tag the isStreaming flag of LocalRelation correctly when the plan has both batch and streaming
[SPARK-44252][SS] Define a new error class and apply for the case where loading state from DFS fails
[SPARK-47135][SS] Implement error classes for Kafka data loss exceptions
[SPARK-47200][SS] Error class for Foreach batch sink user function error
Operating system security updates.
March 14, 2024
[SPARK-47176][SQL] Have a ResolveAllExpressionsUpWithPruning helper function
Revert “[SPARK-46861][CORE] Avoid Deadlock in DAGScheduler”
[SPARK-47125][SQL] Return null if Univocity never triggers parsing
[SPARK-47167][SQL] Add concrete class for JDBC anonymous relation
Operating system security updates.
February 29, 2024
Fixed an issue where using a local collection as source in a MERGE command could result in the operation metric numSourceRows reporting double the correct number of rows.
Creating a schema with a defined location now requires the user to have SELECT and MODIFY privileges on ANY FILE.
[SPARK-45582][SS] Ensure that store instance is not used after calling commit within output mode streaming aggregation
Operating system security updates.
February 13, 2024
[SPARK-46861] Avoid Deadlock in DAGScheduler.
[SPARK-46794] Remove subqueries from LogicalRDD constraints.
Operating system security updates.
January 31, 2024
[SPARK-46763] Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.
Operating system security updates.
December 25, 2023
To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.
[SPARK-39440] Add a config to disable event timeline.
[SPARK-46132] Support key password for JKS keys for RPC SSL.
[SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when spark.sql.legacy.keepCommandOutputSchema set to true.
[SPARK-46417] Do not fail when calling hive.getTable and throwException is false.
[SPARK-43067] Correct the location of error class resource file in Kafka connector.
[SPARK-46249] Require instance lock for acquiring RocksDB metrics to prevent race with background operations.
[SPARK-46602] Propagate allowExisting in view creation when the view/table does not exist.
[SPARK-46058] Add separate flag for privateKeyPassword.
[SPARK-46145] spark.catalog.listTables does not throw exception when the table or view is not found.
[SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.
[SPARK-42852] Revert NamedLambdaVariable related changes from EquivalentExpressions.
December 14, 2023
Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were handled incorrectly and interpreted as wildcards.
[SPARK-44582] Skip iterator on SMJ if it was cleaned up.
[SPARK-45920] group by ordinal should be idempotent.
[SPARK-45655] Allow non-deterministic expressions inside AggregateFunctions in CollectMetrics.
Operating system security updates.
November 29, 2023
Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.
Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.
[SPARK-42205] Removed logging accumulables in Stage and Task start events.
[SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.
[SPARK-43718] Fixed nullability for keys in USING joins.
[SPARK-45544] Integrated SSL support into TransportContext.
[SPARK-43973] Structured Streaming UI now displays failed queries correctly.
[SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.
[SPARK-45859] Made UDF objects in ml.functions lazy.
Operating system security updates.
November 14, 2023
Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.
[SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.
[SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.
[SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.
[SPARK-45541] Added SSLFactory.
[SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.
[SPARK-45429] Added helper classes for SSL RPC communication.
Operating system security updates.
October 24, 2023
[SPARK-45426] Added support for ReloadingX509TrustManager.
Miscellaneous fixes.
October 13, 2023
Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.
[SPARK-42553] Ensure at least one time unit after interval.
[SPARK-45346] Parquet schema inference respects case sensitive flag when merging schema.
[SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.
[SPARK-45084] StateOperatorProgress to use an accurate, adequate shuffle partition number.
September 12, 2023
[SPARK-44873] Added support for alter view with nested columns in the Hive client.
[SPARK-44718] Match ColumnVector memory-mode config default to OffHeapMemoryMode config value.
[SPARK-43799] Added descriptor binary option to PySpark Protobuf API.
Miscellaneous fixes.
August 30, 2023
[SPARK-44485] Optimized TreeNode.generateTreeString.
[SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.
[SPARK-44871][11.3-13.0] Fixed percentile_disc behavior.
[SPARK-44714] Eased restriction of LCA resolution regarding queries.
Operating system security updates.
August 15, 2023
[SPARK-44504] Maintenance task cleans up loaded providers on stop error.
[SPARK-44464] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as the first column value.
Operating system security updates.
July 29, 2023
Fixed an issue where dbutils.fs.ls() returned INVALID_PARAMETER_VALUE.LOCATION_OVERLAP when called for a storage location path that clashed with another external or managed storage location.
[SPARK-44199] CacheManager no longer refreshes the fileIndex unnecessarily.
Operating system security updates.
July 24, 2023
[SPARK-44337] Fixed an issue where any field set to Any.getDefaultInstance caused parse errors.
[SPARK-44136] Fixed an issue where StateManager would get materialized in an executor instead of the driver in FlatMapGroupsWithStateExec.
Operating system security updates.
June 23, 2023
Operating system security updates.
June 15, 2023
Photonized approx_count_distinct.
Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.
[SPARK-43779] ParseToDate now loads EvalMode in the main thread.
[SPARK-43156][SPARK-43098] Extended scalar subquery count error test with decorrelateInnerQuery turned off.
Operating system security updates.
June 2, 2023
The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.
Improve the performance of incremental updates with SHALLOW CLONE Iceberg and Parquet.
Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.
[SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.
[SPARK-43413][11.3-13.0] Fixed IN subquery ListQuery nullability.
[SPARK-43522] Fixed creating struct column name with index of array.
[SPARK-43541] Propagate all Project tags in resolving of expressions and missing columns.
[SPARK-43527] Fixed catalog.listCatalogs in PySpark.
[SPARK-43123] Internal field metadata no longer leaks to catalogs.
[SPARK-43340] Fixed missing stack trace field in eventlogs.
[SPARK-42444] DataFrame.drop now handles duplicated columns correctly.
[SPARK-42937] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.
[SPARK-43286] Updated aes_encrypt CBC mode to generate random IVs.
[SPARK-43378] Properly close stream objects in deserializeFromChunkedBuffer.
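The failOnUnknownFields note in this update distinguishes two parser modes. As a simplified model of that distinction (not the Spark JSON parser itself), the sketch below treats any record that carries a field outside a known set as malformed, then either drops it or raises. The function name parse_records and the sample data are hypothetical.

```python
import json


def parse_records(lines, known_fields, mode="DROPMALFORMED"):
    """Parse JSON lines, treating records with unknown fields as malformed.

    DROPMALFORMED skips malformed records; FAILFAST raises on the first one.
    """
    if mode not in ("DROPMALFORMED", "FAILFAST"):
        raise ValueError(f"unsupported mode: {mode}")
    rows = []
    for line in lines:
        record = json.loads(line)
        unknown = set(record) - set(known_fields)
        if unknown:
            if mode == "FAILFAST":
                raise ValueError(f"unknown fields: {sorted(unknown)}")
            continue  # DROPMALFORMED: drop the record and keep going
        rows.append(record)
    return rows


# Hypothetical input: the second record has a field the schema does not know.
sample = ['{"id": 1}', '{"id": 2, "extra": true}']
kept = parse_records(sample, {"id"})
```

Calling parse_records(sample, {"id"}, mode="FAILFAST") raises instead of returning the remaining rows.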
May 17, 2023
Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.
If an Avro file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that have different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend that users use the rescuedDataColumn option.
Auto Loader now does the following.
Correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided, but the Avro file suggests one of the other two types.
Prevents reading interval types as date or time stamp types to avoid getting corrupt dates.
Prevents reading Decimal types with lower precision.
[SPARK-43172] Exposes host and token from Spark Connect client.
[SPARK-43293] __qualified_access_only is ignored in normal columns.
[SPARK-43098] Fixed correctness COUNT bug when scalar subquery has a group by clause.
[SPARK-43085] Support for column DEFAULT assignment for multi-part table names.
[SPARK-43190] ListQuery.childOutput is now consistent with secondary output.
[SPARK-43192] Removed user agent charset validation.
Operating system security updates.
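The adaptive Parquet batch sizing described in the May 17, 2023 update can be pictured with the following sketch. It is a plain-Python model, not the Databricks implementation: the function names, the default of 4096 rows, and the halving policy on retry are all assumptions made for illustration.

```python
from typing import Callable, List


def estimate_batch_size(avg_row_bytes: int, memory_budget_bytes: int,
                        default_rows: int = 4096) -> int:
    """Preemptively lower the batch size when metadata suggests very wide rows.

    avg_row_bytes stands in for whatever the file metadata reveals; the
    4096-row default is an assumed value, not a documented one.
    """
    if avg_row_bytes <= 0:
        return default_rows
    return max(1, min(default_rows, memory_budget_bytes // avg_row_bytes))


def scan_with_retries(read_batch: Callable[[int], List[dict]],
                      initial_rows: int, max_attempts: int = 4) -> List[dict]:
    """Retry a scan, halving the batch size after each out-of-memory failure.

    Halving is an assumed policy that models the "final safety net" on retries.
    """
    rows = initial_rows
    for _ in range(max_attempts):
        try:
            return read_batch(rows)
        except MemoryError:
            rows = max(1, rows // 2)
    raise MemoryError("scan failed even after lowering the batch size")
```

The two functions mirror the two stages in the note: a metadata-based estimate before the scan starts, and a further reduction each time a task attempt fails.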
April 25, 2023
If a Parquet file was read with just the failOnUnknownFields option or with Auto Loader in the failOnNewColumns schema evolution mode, columns that had different data types would be read as null instead of throwing an error stating that the file cannot be read. These reads now fail and recommend that users use the rescuedDataColumn option.
Auto Loader now correctly reads and no longer rescues Integer, Short, and Byte types if one of these data types is provided, but the Parquet file suggests one of the other two types. When the rescued data column was previously enabled, the data type mismatch would cause columns to be saved even though they were readable.
[SPARK-43009] Parameterized sql() with Any constants
[SPARK-42406] Terminate Protobuf recursive fields by dropping the field
[SPARK-43038] Support the CBC mode by aes_encrypt()/aes_decrypt()
[SPARK-42971] Change to print workdir if appDirs is null when worker handle WorkDirCleanup event
[SPARK-43018] Fix bug for INSERT commands with timestamp literals
Operating system security updates.
April 11, 2023
Support legacy data source formats in the SYNC command.
Fixes an issue in the %autoreload behavior in notebooks outside of a repo.
Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.
[SPARK-42928] Makes resolvePersistentFunction synchronized.
[SPARK-42936] Fixes an LCA issue when the clause can be resolved directly by its child aggregate.
[SPARK-42967] Fixes SparkListenerTaskStart.stageAttemptId when a task starts after the stage is canceled.
Operating system security updates.
March 29, 2023
Databricks SQL now supports specifying default values for columns of Delta Lake tables, either at table creation time or afterward. Subsequent INSERT, UPDATE, DELETE, and MERGE commands can refer to any column’s default value using the explicit DEFAULT keyword. In addition, if any INSERT assignment has an explicit list of fewer columns than the target table, corresponding column default values are substituted for the remaining columns (or NULL if no default is specified).
For example:
CREATE TABLE t (first INT, second DATE DEFAULT CURRENT_DATE());
INSERT INTO t VALUES (0, DEFAULT);
INSERT INTO t VALUES (1, DEFAULT);
SELECT first, second FROM t;
> 0, 2023-03-28
  1, 2023-03-28
Auto Loader now initiates at least one synchronous RocksDB log cleanup for Trigger.AvailableNow streams to check that the checkpoint can get regularly cleaned up for fast-running Auto Loader streams. This can cause some streams to take longer before they shut down, but it will save you storage costs and improve the Auto Loader experience in future runs.
You can now modify a Delta table to add support to table features using DeltaTable.addFeatureSupport(feature_name).
[SPARK-42794] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structured Streaming
[SPARK-42521] Add NULLs for INSERTs with user-specified lists of fewer columns than the target table
[SPARK-42702][SPARK-42623] Support parameterized query in subquery and CTE
[SPARK-42668] Catch exception while trying to close the compressed stream in HDFSStateStoreProvider stop
[SPARK-42403] JsonProtocol should handle null JSON strings
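The column-default rules described in the March 29, 2023 update (an explicit DEFAULT keyword, and substitution of defaults or NULL for columns omitted from an INSERT column list) can be modeled in a few lines of plain Python. This is only an illustrative sketch: the Table class, the DEFAULT sentinel, and the insert method are hypothetical names and are not part of any Databricks or Delta Lake API.

```python
from datetime import date
from typing import Any, Callable, Dict, List, Optional

# Sentinel standing in for the SQL DEFAULT keyword (hypothetical helper).
DEFAULT = object()


class Table:
    """Toy in-memory table that models column-default substitution."""

    def __init__(self, columns: List[str],
                 defaults: Optional[Dict[str, Callable[[], Any]]] = None):
        self.columns = columns
        self.defaults = defaults or {}
        self.rows: List[Dict[str, Any]] = []

    def _default_for(self, column: str) -> Any:
        factory = self.defaults.get(column)
        # None models SQL NULL when the column declares no default.
        return factory() if factory else None

    def insert(self, values: List[Any],
               columns: Optional[List[str]] = None) -> None:
        target = columns if columns is not None else self.columns
        if len(values) != len(target):
            raise ValueError("value count does not match column list")
        provided = dict(zip(target, values))
        row = {}
        for column in self.columns:
            value = provided.get(column, DEFAULT)
            # Omitted columns and explicit DEFAULT both take the column default.
            row[column] = self._default_for(column) if value is DEFAULT else value
        self.rows.append(row)


# Mirrors the SQL example: the second column defaults to the current date.
t = Table(["first", "second"], {"second": date.today})
t.insert([0, DEFAULT])            # explicit DEFAULT keyword
t.insert([1], columns=["first"])  # shorter column list, default substituted
```

Both inserted rows end up with today's date in the second column, which is the behavior the SQL example shows for the two INSERT statements.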
March 8, 2023
The error message “Failure to initialize configuration” has been improved to provide more context for the customer.
There is a terminology change for adding features to a Delta table using the table property. The preferred syntax is now 'delta.feature.featureName'='supported' instead of 'delta.feature.featureName'='enabled'. For backward compatibility, using 'delta.feature.featureName'='enabled' still works and will continue to work.
Starting from this release, it is possible to create/replace a table with an additional table property delta.ignoreProtocolDefaults to ignore protocol-related Spark configs, which includes default reader and writer versions and table features supported by default.
[SPARK-42070] Change the default value of the argument of the Mask function from -1 to NULL
[SPARK-41793] Incorrect result for window frames defined by a range clause on significant decimals
[SPARK-42484] UnsafeRowUtils better error message
[SPARK-42516] Always capture the session time zone config while creating views
[SPARK-42635] Fix the TimestampAdd expression.
[SPARK-42622] Turned off substitution in values
[SPARK-42534] Fix DB2Dialect Limit clause
[SPARK-42121] Add built-in table-valued functions posexplode, posexplode_outer, json_tuple and stack
[SPARK-42045] ANSI SQL mode: Round/Bround should return an error on tiny/small/significant integer overflow
Operating system security updates.
Databricks Runtime 11.3 LTS
See Databricks Runtime 11.3 LTS.
October 10, 2024
Miscellaneous bug fixes.
September 25, 2024
[SPARK-46601] [CORE] Fix log error in handleStatusMessage
[SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
Miscellaneous bug fixes.
September 17, 2024
Operating system security updates.
August 29, 2024
August 14, 2024
[SPARK-48941][SPARK-48970] Backport ML writer / reader fixes
[SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
[SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan
[SPARK-48463][ML] Make StringIndexer supporting nested input columns
Operating system security updates.
August 1, 2024
[SPARK-48896] [SPARK-48909] [SPARK-48883] Backport spark ML writer fixes
August 1, 2024
To apply required security patches, the Python version in Databricks Runtime 11.3 LTS is upgraded from 3.9.5 to 3.9.19.
July 11, 2024
[SPARK-48383][SS] Throw better error for mismatched partitions in startOffset option in Kafka
[SPARK-47070] Fix invalid aggregation after subquery rewrite
Operating system security updates.
June 17, 2024
Operating system security updates.
May 21, 2024
[SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting
Operating system security updates.
May 9, 2024
[SPARK-48018][SS] Fix null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange
[SPARK-47973][CORE] Log call site in SparkContext.stop() and later in SparkContext.assertNotStopped()
[SPARK-44251][SQL] Set nullable correctly on coalesced join key in full outer USING join
Operating system security updates.
April 25, 2024
Operating system security updates.
April 11, 2024
Operating system security updates.
April 1, 2024
[SPARK-44252][SS] Define a new error class and apply for the case where loading state from DFS fails
[SPARK-47135][SS] Implement error classes for Kafka data loss exceptions
Revert “[SPARK-46861][CORE] Avoid Deadlock in DAGScheduler”
[SPARK-47200][SS] Error class for Foreach batch sink user function error
Operating system security updates.
March 14, 2024
[SPARK-47167][SQL] Add concrete class for JDBC anonymous relation
[SPARK-47125][SQL] Return null if Univocity never triggers parsing
Operating system security updates.
February 29, 2024
Fixed an issue where using a local collection as source in a MERGE command could result in the operation metric numSourceRows reporting double the correct number of rows.
[SPARK-45582][SS] Ensure that store instance is not used after calling commit within output mode streaming aggregation
February 13, 2024
[SPARK-46794] Remove subqueries from LogicalRDD constraints.
[SPARK-46861] Avoid Deadlock in DAGScheduler.
Operating system security updates.
January 31, 2024
Operating system security updates.
December 25, 2023
To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.
[SPARK-46058] Add separate flag for privateKeyPassword.
[SPARK-46602] Propagate allowExisting in view creation when the view/table does not exist.
[SPARK-46394] Fix spark.catalog.listDatabases() issues on schemas with special characters when spark.sql.legacy.keepCommandOutputSchema is set to true.
[SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.
[SPARK-39440] Add a config to disable event timeline.
[SPARK-46249] Require instance lock for acquiring RocksDB metrics to prevent race with background operations.
[SPARK-46132] Support key password for JKS keys for RPC SSL.
December 14, 2023
Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were handled incorrectly and interpreted as wildcards.
Operating system security updates.
November 29, 2023
Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.
Fixed an issue where escaped underscores in getColumns operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.
[SPARK-43973] Structured Streaming UI now displays failed queries correctly.
[SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.
[SPARK-45544] Integrated SSL support into TransportContext.
[SPARK-45859] Made UDF objects in ml.functions lazy.
[SPARK-43718] Fixed nullability for keys in USING joins.
[SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.
Operating system security updates.
November 14, 2023
Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.
[SPARK-42205] Removed logging accumulables in Stage and Task start events.
[SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.
Revert [SPARK-33861].
[SPARK-45541] Added SSLFactory.
[SPARK-45429] Added helper classes for SSL RPC communication.
[SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.
[SPARK-45430] FramelessOffsetWindowFunction no longer fails when IGNORE NULLS and offset > rowCount.
[SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.
Operating system security updates.
October 24, 2023
[SPARK-45426] Added support for ReloadingX509TrustManager.
Miscellaneous fixes.
October 13, 2023
Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.
[SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.
[SPARK-45084] StateOperatorProgress to use an accurate, adequate shuffle partition number.
[SPARK-45346] Parquet schema inference now respects case-sensitive flag when merging a schema.
Operating system security updates.
September 10, 2023
Miscellaneous fixes.
August 30, 2023
[SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.
[SPARK-44871][11.3-13.0] Fixed percentile_disc behavior.
Operating system security updates.
August 15, 2023
[SPARK-44485] Optimized TreeNode.generateTreeString.
[SPARK-44504] Maintenance task cleans up loaded providers on stop error.
[SPARK-44464] Fixed applyInPandasWithStatePythonRunner to output rows that have Null as the first column value.
Operating system security updates.
July 27, 2023
Fixed an issue where dbutils.fs.ls() returned INVALID_PARAMETER_VALUE.LOCATION_OVERLAP when called for a storage location path which clashed with another external or managed storage location.
[SPARK-44199] CacheManager no longer refreshes the fileIndex unnecessarily.
Operating system security updates.
July 24, 2023
[SPARK-44136] Fixed an issue that StateManager can get materialized in executor instead of driver in FlatMapGroupsWithStateExec.
Operating system security updates.
June 23, 2023
Operating system security updates.
June 15, 2023
Photonized approx_count_distinct.
Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.
[SPARK-43779] ParseToDate now loads EvalMode in the main thread.
[SPARK-40862] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery
[SPARK-43156][SPARK-43098] Extended scalar subquery count bug test with decorrelateInnerQuery turned off.
[SPARK-43098] Fix correctness COUNT bug when scalar subquery has a group by clause
Operating system security updates.
June 2, 2023
The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.
Improve the performance of incremental updates with SHALLOW CLONE Iceberg and Parquet.
Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.
[SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.
[SPARK-43527] Fixed catalog.listCatalogs in PySpark.
[SPARK-43413][11.3-13.0] Fixed IN subquery ListQuery nullability.
[SPARK-43340] Fixed missing stack trace field in eventlogs.
Databricks Runtime 10.4 LTS
See Databricks Runtime 10.4 LTS.
October 22, 2024
Operating system security updates.
October 10, 2024
Operating system security updates.
September 25, 2024
[SPARK-46601] [CORE] Fix log error in handleStatusMessage
[SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
Operating system security updates.
September 17, 2024
Operating system security updates.
August 29, 2024
[SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
August 14, 2024
[SPARK-48597][SQL] Introduce a marker for isStreaming property in text representation of logical plan
[SPARK-48941][SPARK-48970] Backport ML writer / reader fixes
[SPARK-48463][ML] Make StringIndexer supporting nested input columns
August 1, 2024
[SPARK-48896] [SPARK-48909] [SPARK-48883] Backport spark ML writer fixes
Operating system security updates.
July 11, 2024
[SPARK-48383][SS] Throw better error for mismatched partitions in startOffset option in Kafka
Operating system security updates.
June 17, 2024
Operating system security updates.
May 21, 2024
[SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting
Operating system security updates.
May 9, 2024
[SPARK-48018][SS] Fix null groupId causing missing param error when throwing KafkaException.couldNotReadOffsetRange
[SPARK-47973][CORE] Log call site in SparkContext.stop() and later in SparkContext.assertNotStopped()
[SPARK-44251][SQL] Set nullable correctly on coalesced join key in full outer USING join
Operating system security updates.
April 25, 2024
Operating system security updates.
April 11, 2024
Operating system security updates.
April 1, 2024
[SPARK-47135][SS] Implement error classes for Kafka data loss exceptions
[SPARK-44252][SS] Define a new error class and apply for the case where loading state from DFS fails
[SPARK-47200][SS] Error class for Foreach batch sink user function error
Revert “[SPARK-46861][CORE] Avoid Deadlock in DAGScheduler”
Operating system security updates.
March 14, 2024
[SPARK-47125][SQL] Return null if Univocity never triggers parsing
Operating system security updates.
February 29, 2024
Fixed an issue where using a local collection as source in a MERGE command could result in the operation metric numSourceRows reporting double the correct number of rows.
[SPARK-45582][SS] Ensure that store instance is not used after calling commit within output mode streaming aggregation
Operating system security updates.
February 13, 2024
[SPARK-46861] Avoid Deadlock in DAGScheduler.
Operating system security updates.
January 31, 2024
Operating system security updates.
December 25, 2023
To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.
[SPARK-46058] Add separate flag for privateKeyPassword.
[SPARK-46538] Fix the ambiguous column reference issue in ALSModel.transform.
[SPARK-39440] Add a config to disable event timeline.
[SPARK-46132] Support key password for JKS keys for RPC SSL.
December 14, 2023
Operating system security updates.
November 29, 2023
Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.
[SPARK-45544] Integrated SSL support into TransportContext.
[SPARK-45859] Made UDF objects in ml.functions lazy.
[SPARK-43718] Fixed nullability for keys in USING joins.
[SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.
[SPARK-42205] Removed logging accumulables in Stage and Task start events.
[SPARK-44846] Removed complex grouping expressions after RemoveRedundantAggregates.
Operating system security updates.
November 14, 2023
[SPARK-45541] Added SSLFactory.
[SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.
[SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.
[SPARK-45429] Added helper classes for SSL RPC communication.
[SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.
Revert [SPARK-33861].
Operating system security updates.
October 24, 2023
[SPARK-45426] Added support for ReloadingX509TrustManager.
Operating system security updates.
October 13, 2023
[SPARK-45084] StateOperatorProgress to use an accurate, adequate shuffle partition number.
[SPARK-45178] Fallback to running a single batch for Trigger.AvailableNow with unsupported sources rather than using the wrapper.
Operating system security updates.
September 10, 2023
Miscellaneous fixes.
August 30, 2023
[SPARK-44818] Fixed race for pending task interrupt issued before taskThread is initialized.
Operating system security updates.
August 15, 2023
[SPARK-44504] Maintenance task cleans up loaded providers on stop error.
[SPARK-43973] Structured Streaming UI now displays failed queries correctly.
Operating system security updates.
June 23, 2023
Operating system security updates.
June 15, 2023
Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.
[SPARK-43098] Fix correctness COUNT bug when scalar subquery has a group by clause
[SPARK-40862] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery
[SPARK-43156][SPARK-43098] Extended scalar subquery count test with decorrelateInnerQuery turned off.
Operating system security updates.
June 2, 2023
The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.
Fixed an issue in JSON rescued data parsing to prevent UnknownFieldException.
Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.
[SPARK-43404] Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.
[SPARK-43413] Fixed IN subquery ListQuery nullability.
Operating system security updates.
May 17, 2023
Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.
[SPARK-41520] Split AND_OR tree pattern to separate AND and OR.
[SPARK-43190] ListQuery.childOutput is now consistent with secondary output.
Operating system security updates.
April 25, 2023
[SPARK-42928] Make resolvePersistentFunction synchronized.
Operating system security updates.
April 11, 2023
Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.
[SPARK-42937] PlanSubqueries now sets InSubqueryExec#shouldBroadcast to true.
[SPARK-42967] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is canceled.
March 29, 2023
[SPARK-42668] Catch exception while trying to close the compressed stream in HDFSStateStoreProvider stop
[SPARK-42635] Fix the …
Operating system security updates.
March 14, 2023
[SPARK-41162] Fix anti- and semi-join for self-join with aggregations
[SPARK-33206] Fix shuffle index cache weight calculation for small index files
[SPARK-42484] Improved the UnsafeRowUtils error message
Miscellaneous fixes.
February 28, 2023
Support generated column for yyyy-MM-dd date_format. This change supports partition pruning for yyyy-MM-dd as a date_format in generated columns.
Users can now read and write specific Delta tables requiring Reader version 3 and Writer version 7, using Databricks Runtime 9.1 LTS or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.
Operating system security updates.
February 16, 2023
[SPARK-30220] Enable using Exists/In subqueries outside of the Filter node
Operating system security updates.
January 31, 2023
Table types of JDBC tables are now EXTERNAL by default.
January 18, 2023
Azure Synapse connector returns a more descriptive error message when a column name contains not valid characters such as whitespaces or semicolons. In such cases, the following message will be returned:
Azure Synapse Analytics failed to run the JDBC query produced by the connector. Check column names do not include not valid characters such as ';' or white space.
[SPARK-38277] Clear write batch after RocksDB state store’s commit
[SPARK-41199] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used
[SPARK-41198] Fix metrics in streaming query having CTE and DSv1 streaming source.
[SPARK-41339] Close and recreate RocksDB write batch instead of just clearing.
[SPARK-41732] Apply tree-pattern based pruning for the rule SessionWindowing.
Operating system security updates.
November 29, 2022
Users can configure leading and trailing whitespaces’ behavior when writing data using the Redshift connector. The following options have been added to control white space handling:
csvignoreleadingwhitespace, when set to true, removes leading white space from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.
csvignoretrailingwhitespace, when set to true, removes trailing white space from values during writes when tempformat is set to CSV or CSV GZIP. Whitespaces are retained when the config is set to false. By default, the value is true.
Fixed an issue with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.
Operating system security updates.
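The effect of the two Redshift connector options in the November 29, 2022 update can be modeled with a short plain-Python sketch. The option names come from the note; the function name is hypothetical, and treating "white space" as whatever str.lstrip and str.rstrip remove is an assumption, since the note does not define it.

```python
def apply_whitespace_options(value: str,
                             csvignoreleadingwhitespace: bool = True,
                             csvignoretrailingwhitespace: bool = True) -> str:
    """Model how a string value might be written under the two options.

    Both options default to true, matching the defaults stated in the note.
    """
    if csvignoreleadingwhitespace:
        value = value.lstrip()
    if csvignoretrailingwhitespace:
        value = value.rstrip()
    return value


# Options a writer could pass to keep white space intact. Only the keys named
# in the note are shown; connection settings are deliberately omitted.
retain_whitespace_options = {
    "tempformat": "CSV",
    "csvignoreleadingwhitespace": "false",
    "csvignoretrailingwhitespace": "false",
}
```

A dictionary like this could be handed to a DataFrame writer through its options method, alongside the connector's usual connection settings.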
November 15, 2022
Upgraded Apache commons-text to 1.10.0.
[SPARK-40646] JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set spark.sql.json.enablePartialResults to true. The flag is turned off by default to preserve the original behavior.
[SPARK-40292] Fix column names in arrays_zip function when arrays are referenced from nested structs
Operating system security updates.
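The difference that spark.sql.json.enablePartialResults makes, as described in the [SPARK-40646] item, can be illustrated with a toy parser in plain Python. This is a simplified model with hypothetical names (parse_record, enable_partial_results), not Spark's JSON reader: it checks only top-level field types against a dictionary of Python types.

```python
import json
from typing import Any, Dict


def parse_record(text: str, schema: Dict[str, type],
                 enable_partial_results: bool = False) -> Dict[str, Any]:
    """Parse one JSON record against a flat schema of expected Python types.

    With enable_partial_results off, any mismatched field nulls the whole
    record (the original behavior in this model). With it on, only the
    mismatched field becomes None and the rest of the record is kept.
    """
    raw = json.loads(text)
    parsed: Dict[str, Any] = {}
    mismatch = False
    for name, expected in schema.items():
        value = raw.get(name)
        if value is None or isinstance(value, expected):
            parsed[name] = value
        else:
            parsed[name] = None
            mismatch = True
    if mismatch and not enable_partial_results:
        return {name: None for name in schema}
    return parsed


record = '{"id": 1, "tags": {"a": 1}, "name": "x"}'
schema = {"id": int, "tags": list, "name": str}  # "tags" arrives as an object

legacy = parse_record(record, schema)
partial = parse_record(record, schema, enable_partial_results=True)
```

In this model, legacy holds None for every field, while partial keeps id and name and sets only tags to None. On a cluster, the corresponding switch is the configuration named in the note, set with spark.conf.set("spark.sql.json.enablePartialResults", "true").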
November 1, 2022
Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was turned off on that table, data in that column would incorrectly fill with NULL values when running MERGE.
Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled
[SPARK-40697] Add read-side char padding to cover external data files
[SPARK-40596] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo
Operating system security updates.
October 18, 2022
Operating system security updates.
October 5, 2022
[SPARK-40468] Fix column pruning in CSV when _corrupt_record is selected.
Operating system security updates.
September 22, 2022
Users can set spark.conf.set(spark.databricks.io.listKeysWithPrefix.azure.enabled, true) to re-enable the built-in listing for Auto Loader on ADLS Gen2. Built-in listing was previously turned off due to performance issues but might have led to increased storage costs for customers.
[SPARK-40315] Add hashCode() for Literal of ArrayBasedMapData
[SPARK-40213] Support ASCII value conversion for Latin-1 characters
[SPARK-40380] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan
[SPARK-38404] Improve CTE resolution when a nested CTE references an outer CTE
[SPARK-40089] Fix sorting for some Decimal types
[SPARK-39887] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique
September 6, 2022
[SPARK-40235] Use interruptible lock instead of synchronized in Executor.updateDependencies().
[SPARK-40218] GROUPING SETS should preserve the grouping columns.
[SPARK-39976] ArrayIntersect should handle null in left expression correctly.
[SPARK-40053] Add assume to dynamic cancel cases which require Python runtime environment.
[SPARK-35542] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it.
[SPARK-40079] Add Imputer inputCols validation for empty input case.
August 24, 2022
[SPARK-39983] Do not cache unserialized broadcast relations on the driver.
[SPARK-39775] Disable validate default values when parsing Avro schemas.
[SPARK-39962] Apply projection when group attributes are empty
[SPARK-37643] when charVarcharAsString is true, for char datatype predicate query should skip rpadding rule.
Operating system security updates.
August 9, 2022
[SPARK-39847] Fix race condition in RocksDBLoader.loadLibrary() if the caller thread is interrupted
[SPARK-39731] Fix issue in CSV and JSON data sources when parsing dates in “yyyyMMdd” format with CORRECTED time parser policy
Operating system security updates.
July 27, 2022
[SPARK-39625] Add Dataset.as(StructType).
[SPARK-39689] Support 2-chars lineSep in CSV data source.
[SPARK-39104] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe.
[SPARK-39570] Inline table should allow expressions with alias.
[SPARK-39702] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel.
[SPARK-39575] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer.
[SPARK-39476] Disable Unwrap cast optimize when casting from Long to Float/ Double or from Integer to Float.
[SPARK-38868] Don’t propagate exceptions from filter predicate when optimizing outer joins.
Operating system security updates.
July 20, 2022
Make Delta MERGE operation results consistent when the source is non-deterministic.
[SPARK-39355] Single column uses quoted to construct UnresolvedAttribute.
[SPARK-39548] CreateView Command with a window clause query hits a wrong window definition not found issue.
[SPARK-39419] Fix ArraySort to throw an exception when the comparator returns null.
Turned off Auto Loader’s use of built-in cloud APIs for directory listing on Azure.
Operating system security updates.
July 5, 2022
[SPARK-39376] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN
Operating system security updates.
June 15, 2022
[SPARK-39283] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator.
[SPARK-39285] Spark should not check field names when reading files.
[SPARK-34096] Improve performance for nth_value ignore nulls over offset window.
[SPARK-36718] Fix the isExtractOnly check in CollapseProject.
June 2, 2022
[SPARK-39093] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral.
[SPARK-38990] Avoid NullPointerException when evaluating date_trunc/trunc format as a bound reference.
Operating system security updates.
May 18, 2022
Fixes a potential built-in memory leak in Auto Loader.
[SPARK-38918] Nested column pruning should filter out attributes that do not belong to the current relation.
[SPARK-37593] Reduce default page size by LONG_ARRAY_OFFSET if G1GC and ON_HEAP are used.
[SPARK-39084] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion.
[SPARK-32268] Add ColumnPruning in injectBloomFilter.
[SPARK-38974] Filter registered functions with a given database name in list functions.
[SPARK-38931] Create root dfs directory for RocksDBFileManager with an unknown number of keys on 1st checkpoint.
Operating system security updates.
April 19, 2022
Upgraded Java AWS SDK from version 1.11.655 to 1.12.1899.
Fixed an issue with notebook-scoped libraries not working in batch streaming jobs.
[SPARK-38616] Keep track of SQL query text in Catalyst TreeNode
Operating system security updates.
April 6, 2022
The following Spark SQL functions are now available with this release:
timestampadd() and dateadd(): Add a time duration in a specified unit to a time stamp expression.
timestampdiff() and datediff(): Calculate the time difference between two time stamp expressions in a specified unit.
Parquet-MR has been upgraded to 1.12.2
Improved support for comprehensive schemas in parquet files
[SPARK-38631] Uses Java-based implementation for un-tarring at Utils.unpack.
[SPARK-38509][SPARK-38481] Cherry-pick three timestampadd/diff changes.
[SPARK-38523] Fix referring to the corrupt record column from CSV.
[SPARK-38237] Allow ClusteredDistribution to require full clustering keys.
[SPARK-38437] Lenient serialization of datetime from datasource.
[SPARK-38180] Allow safe up-cast expressions in correlated equality predicates.
[SPARK-38155] Disallow distinct aggregate in lateral subqueries with unsupported predicates.
Operating system security updates.
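The timestampadd() and timestampdiff() functions listed in the April 6, 2022 update both work "in a specified unit". The plain-Python sketch below models that idea for fixed-length units only. The helper names add_duration and diff_in_unit are hypothetical, calendar units such as MONTH and YEAR are left out, and truncating a partial unit toward zero is an assumption of this sketch rather than documented behavior.

```python
from datetime import datetime, timedelta

# Fixed-length units only; calendar units are outside this sketch.
_UNIT_SECONDS = {
    "SECOND": 1,
    "MINUTE": 60,
    "HOUR": 3600,
    "DAY": 86400,
    "WEEK": 604800,
}


def add_duration(unit: str, value: int, ts: datetime) -> datetime:
    """Add `value` units to a timestamp, in the spirit of timestampadd()."""
    return ts + timedelta(seconds=value * _UNIT_SECONDS[unit.upper()])


def diff_in_unit(unit: str, start: datetime, end: datetime) -> int:
    """Whole units between two timestamps, in the spirit of timestampdiff().

    A partial unit is truncated toward zero (an assumption of this sketch).
    """
    seconds = (end - start).total_seconds()
    return int(seconds / _UNIT_SECONDS[unit.upper()])


later = add_duration("HOUR", 2, datetime(2022, 4, 6, 10, 0))
days = diff_in_unit("DAY", datetime(2022, 4, 1), datetime(2022, 4, 6, 12, 0))
```

Here later is 12:00 on the same day, and days is 5 because the extra half day is truncated.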
Databricks Runtime 9.1 LTS
See Databricks Runtime 9.1 LTS.
October 22, 2024
Operating system security updates.
October 10, 2024
Operating system security updates.
September 25, 2024
[SPARK-49000][SQL] Fix “select count(distinct 1) from t” where t is empty table by expanding RewriteDistinctAggregates
Operating system security updates.
September 6, 2024
Operating system security updates.
August 29, 2024
[SPARK-49065][SQL] Rebasing in legacy formatters/parsers must support non JVM default time zones
August 14, 2024
August 1, 2024
Operating system security updates.
July 11, 2024
Operating system security updates.
June 17, 2024
Operating system security updates.
May 21, 2024
[SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting
Operating system security updates.
May 9, 2024
[SPARK-47973][CORE] Log call site in SparkContext.stop() and later in SparkContext.assertNotStopped()
[SPARK-44251][SQL] Set nullable correctly on coalesced join key in full outer USING join
Operating system security updates.
April 25, 2024
Miscellaneous bug fixes.
April 11, 2024
Operating system security updates.
April 1, 2024
Revert “[SPARK-46861][CORE] Avoid Deadlock in DAGScheduler”
Operating system security updates.
March 14, 2024
Operating system security updates.
February 29, 2024
Fixed an issue where using a local collection as source in a MERGE command could result in the operation metric numSourceRows reporting double the correct number of rows.
Operating system security updates.
February 13, 2024
[SPARK-46861] Avoid Deadlock in DAGScheduler.
Operating system security updates.
January 31, 2024
Operating system security updates.
December 25, 2023
To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.
[SPARK-46058] Add separate flag for privateKeyPassword.
[SPARK-39440] Add a config to disable event timeline.
[SPARK-46132] Support key password for JKS keys for RPC SSL.
December 14, 2023
Operating system security updates.
November 29, 2023
Installed a new package, pyarrow-hotfix, to remediate a PyArrow RCE vulnerability.
[SPARK-45859] Made UDF objects in ml.functions lazy.
[SPARK-45544] Integrated SSL support into TransportContext.
[SPARK-45730] Improved time constraints for ReloadingX509TrustManagerSuite.
Operating system security updates.
November 14, 2023
[SPARK-45545] SparkTransportConf inherits SSLOptions upon creation.
[SPARK-45429] Added helper classes for SSL RPC communication.
[SPARK-45427] Added RPC SSL settings to SSLOptions and SparkTransportConf.
[SPARK-45584] Fixed subquery run failure with TakeOrderedAndProjectExec.
[SPARK-45541] Added SSLFactory.
[SPARK-42205] Removed logging accumulables in Stage and Task start events.
Operating system security updates.
October 24, 2023
[SPARK-45426] Added support for ReloadingX509TrustManager.
Operating system security updates.
October 13, 2023
Operating system security updates.
September 10, 2023
Miscellaneous fixes.
August 30, 2023
Operating system security updates.
August 15, 2023
Operating system security updates.
June 23, 2023
Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.
Operating system security updates.
June 15, 2023
[SPARK-43098] Fix correctness COUNT bug when scalar subquery has a group by clause.
[SPARK-43156][SPARK-43098] Extend scalar subquery count bug test with decorrelateInnerQuery turned off.
[SPARK-40862] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery.
Operating system security updates.
June 2, 2023
The JSON parser in failOnUnknownFields mode drops a record in DROPMALFORMED mode and fails directly in FAILFAST mode.
Fixed an issue in JSON rescued data parsing to prevent UnknownFieldException.
Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.
[SPARK-37520] Add the startswith() and endswith() string functions.
[SPARK-43413] Fixed IN subquery ListQuery nullability.
Operating system security updates.
May 17, 2023
Operating system security updates.
April 25, 2023
Operating system security updates.
April 11, 2023
Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.
[SPARK-42967] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is canceled.
March 29, 2023
Operating system security updates.
March 14, 2023
[SPARK-42484] Improved error message for UnsafeRowUtils.
Miscellaneous fixes.
February 28, 2023
Users can now read and write specific Delta tables requiring Reader version 3 and Writer version 7, using Databricks Runtime 9.1 LTS or later. To succeed, table features listed in the tables’ protocol must be supported by the current version of Databricks Runtime.
Operating system security updates.
February 16, 2023
Operating system security updates.
January 31, 2023
Table types of JDBC tables are now EXTERNAL by default.
January 18, 2023
Operating system security updates.
November 29, 2022
Fixed an issue with JSON parsing in Auto Loader when all columns were left as strings (cloudFiles.inferColumnTypes was not set or set to false) and the JSON contained nested objects.
Operating system security updates.
November 15, 2022
Upgraded Apache commons-text to 1.10.0.
Operating system security updates.
Miscellaneous fixes.
November 1, 2022
Fixed an issue where if a Delta table had a user-defined column named _change_type, but Change data feed was turned off on that table, data in that column would incorrectly fill with NULL values when running MERGE.
Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when allowOverwrites is enabled.
[SPARK-40596] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo
Operating system security updates.
October 18, 2022
Operating system security updates.
October 5, 2022
Miscellaneous fixes.
Operating system security updates.
September 22, 2022
Users can run spark.conf.set("spark.databricks.io.listKeysWithPrefix.azure.enabled", "true") to re-enable the built-in listing for Auto Loader on ADLS Gen2. Built-in listing was previously turned off due to performance issues, which may have led to increased storage costs for customers.
[SPARK-40315] Add hashCode() for Literal of ArrayBasedMapData
[SPARK-40089] Fix sorting for some Decimal types
[SPARK-39887] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique
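The Auto Loader listing setting described in the September 22, 2022 entry could be applied as in the following minimal sketch. It assumes that spark is an active SparkSession, as provided in a Databricks notebook. The helper name enable_builtin_listing and the constant name are hypothetical; only the configuration key and value come from the entry above.

```python
# Configuration key quoted from the release note entry.
LIST_KEYS_CONF = "spark.databricks.io.listKeysWithPrefix.azure.enabled"


def enable_builtin_listing(spark) -> str:
    """Re-enable built-in listing for Auto Loader on ADLS Gen2.

    `spark` is assumed to be an active SparkSession (hypothetical helper).
    Returns the value read back from the session configuration.
    """
    spark.conf.set(LIST_KEYS_CONF, "true")
    return spark.conf.get(LIST_KEYS_CONF)
```

In a notebook, calling enable_builtin_listing(spark) before starting the Auto Loader stream sets the option for the current session only; a cluster-level Spark configuration would be needed to apply it more broadly.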
September 6, 2022
[SPARK-40235] Use interruptible lock instead of synchronized in Executor.updateDependencies()
[SPARK-35542] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it
[SPARK-40079] Add Imputer inputCols validation for empty input case
August 24, 2022
[SPARK-39666] Use UnsafeProjection.create to respect spark.sql.codegen.factoryMode in ExpressionEncoder
[SPARK-39962] Apply projection when group attributes are empty
Operating system security updates.
August 9, 2022
Operating system security updates.
July 27, 2022
Make Delta MERGE operation results consistent when the source is non-deterministic.
[SPARK-39689] Support for 2-chars lineSep in CSV data source
[SPARK-39575] Added ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer.
[SPARK-37392] Fixed the performance error for catalyst optimizer.
Operating system security updates.
July 13, 2022
[SPARK-39419] ArraySort throws an exception when the comparator returns null.
Turned off Auto Loader’s use of built-in cloud APIs for directory listing on Azure.
Operating system security updates.
July 5, 2022
Operating system security updates.
Miscellaneous fixes.
June 15, 2022
[SPARK-39283] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator.
June 2, 2022
[SPARK-34554] Implement the copy() method in ColumnarMap.
Operating system security updates.
May 18, 2022
Fixed a potential built-in memory leak in Auto Loader.
Upgrade AWS SDK version from 1.11.655 to 1.11.678.
[SPARK-38918] Nested column pruning should filter out attributes that do not belong to the current relation
[SPARK-39084] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion
Operating system security updates.
April 19, 2022
Operating system security updates.
Miscellaneous fixes.
April 6, 2022
[SPARK-38631] Uses Java-based implementation for un-tarring at Utils.unpack.
Operating system security updates.
March 22, 2022
Changed the current working directory of notebooks on High Concurrency clusters with either table access control or credential passthrough enabled to the user’s home directory. Previously, the active directory was /databricks/driver.
[SPARK-38437] Lenient serialization of datetime from datasource
[SPARK-38180] Allow safe up-cast expressions in correlated equality predicates
[SPARK-38155] Disallow distinct aggregate in lateral subqueries with unsupported predicates
[SPARK-27442] Removed a check field when reading or writing data in a parquet.
March 14, 2022
[SPARK-38236] Absolute file paths specified in the create/alter table are treated as relative
[SPARK-34069] Interrupt task thread if local property SPARK_JOB_INTERRUPT_ON_CANCEL is set to true.
February 23, 2022
[SPARK-37859] SQL tables created with JDBC with Spark 3.1 are not readable with Spark 3.2.
February 8, 2022
[SPARK-27442] Removed a check field when reading or writing data in a parquet.
Operating system security updates.
February 1, 2022
Operating system security updates.
January 26, 2022
Fixed an issue where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.
Fixed an issue where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.
January 19, 2022
Minor fixes and security enhancements.
Operating system security updates.
November 4, 2021
Fixed an issue that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException.
Fixed a race condition that might cause a query failure with an IOException like java.io.IOException: No FileSystem for scheme or that might cause modifications to sparkContext.hadoopConfiguration to not take effect in queries.
The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.
October 20, 2021
Upgraded BigQuery connector from 0.18.1 to 0.22.2. This adds support for the BigNumeric type.