Applies to: Databricks SQL Databricks Runtime
Databricks uses several rules to resolve conflicts among data types:
Promotion safely expands a type to a wider type.
Implicit downcasting narrows a type. The opposite of promotion.
Implicit crosscasting transforms a type into a type of another type family.
You can also explicitly cast between many types:
cast function casts between most types, and returns errors if it cannot.
Other builtin functions cast between types using provided format directives.
Type promotion is the process of casting a type into another type of the same type family which contains all possible values of the original type.
Therefore type promotion is a safe operation. For example
TINYINT has a range from
127. All its possible values can be safely promoted to
The type precedence list defines whether values of a given data type can be implicitly promoted to another data type.
Precedence list (from narrowest to widest)
TINYINT -> SMALLINT -> INT -> BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE
SMALLINT -> INT -> BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE
INT -> BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE
BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE
DECIMAL -> FLOAT (1) -> DOUBLE
FLOAT (1) -> DOUBLE
DATE -> TIMESTAMP
(1) For least common type resolution FLOAT is skipped to avoid loss of precision.
(2) For a complex type the precedence rule applies recursively to its component elements.
Special rules apply for
STRING and untyped
NULLcan be promoted to any other type.
STRINGcan be promoted to
TIMESTAMP. If the actual string value cannot be cast to least common type Databricks raises a runtime error. When promoting to
INTERVALthe string value must match the intervals units.
The least common type from a set of types is the narrowest type reachable from the type precedence graph by all elements of the set of types.
The least common type resolution is used to:
Decide whether a function that expects a parameter of a given type can be invoked using an argument of a narrower type.
Derive the operand types for operators such as arithmetic operations or comparisons.
Derive the result type for expressions such as the case expression.
Derive the result type of UNION, INTERSECT, or EXCEPT set operators.
Special rules are applied if the least common type resolves to
FLOAT. If any of the contributing types is an exact numeric type (
DECIMAL) the least common type is pushed to
DOUBLE to avoid potential loss of digits.
Databricks employs these forms of implicit casting only on function and operator invocation, and only where it can unambiguously determine the intent.
Implicit downcasting automatically casts a wider type to a narrower type without requiring you to specify the cast explicitly. Downcasting is convenient, but it carries the risk of unexpected runtime errors if the actual value fails to be representable in the narrow type.
Downcasting applies the type precedence list in reverse order.
Implicit crosscasting casts a value from one type family to another without requiring you to specify the cast explicitly.
Databricks supports implicit crosscasting from:
Any simple type, except
STRINGto any simple type.
Given a resolved function or operator, the following rules apply, in the order they are listed, for each parameter and argument pair:
In most cases the function description explicitly states the supported types or chain, such as “any numeric type”.
For example, sin(expr) operates on
DOUBLEbut will accept any numeric.
If the expected parameter type is a
STRINGand the argument is a simple type Databricks crosscasts the argument to the string parameter type.
For example, substr(str, start, len) expects
strto be a
STRING. Instead, you can pass a numeric or datetime type.
If the argument type is a
STRINGand the expected parameter type is a simple type, Databricks crosscasts the string argument to the widest supported parameter type.
For example, date_add(date, days) expects a
If you invoke
STRINGs, Databricks crosscasts the first
DATEand the second
If the function expects a numeric type, such as an
INTEGER, or a
DATEtype, but the argument is a more general type, such as a
TIMESTAMP, Databricks implicitly downcasts the argument to that parameter type.
For example, a date_add(date, days) expects a
If you invoke
BIGINT, Databricks downcasts the
DATEby removing the time component and the
Otherwise, Databricks raises an error.
The result type is the least common type of the arguments.
-- The least common type of TINYINT and BIGINT is BIGINT > SELECT typeof(coalesce(1Y, 1L, NULL)); BIGINT -- INTEGER and DATE do not share a precedence chain or support crosscasting in either direction. > SELECT typeof(coalesce(1, DATE'2020-01-01')); Error: Incompatible types [INT, DATE] -- Both are ARRAYs and the elements have a least common type > SELECT typeof(coalesce(ARRAY(1Y), ARRAY(1L))) ARRAY<BIGINT> -- The least common type of INT and FLOAT is DOUBLE > SELECT typeof(coalesce(1, 1F)) DOUBLE > SELECT typeof(coalesce(1L, 1F)) DOUBLE > SELECT typeof(coalesce(1BD, 1F)) DOUBLE -- The least common type between an INT and STRING is BIGINT > SELECT typeof(coalesce(5, '6')); BIGINT -- The least common type is a BIGINT, but the value is not BIGINT. > SELECT coalesce('6.1', 5); Error: 6.1 is not a BIGINT -- The least common type between a DECIMAL and a STRING is a DOUBLE > SELECT typeof(coalesce(1BD, '6')); DOUBLE
The substring function expects arguments of type
STRING for the string and
INTEGER for the start and length parameters.
-- Promotion of TINYINT to INTEGER > SELECT substring('hello', 1Y, 2); he -- No casting > SELECT substring('hello', 1, 2); he -- Casting of a literal string > SELECT substring('hello', '1', 2); he -- Downcasting of a BIGINT to an INT > SELECT substring('hello', 1L, 2); he -- Crosscasting from STRING to INTEGER > SELECT substring('hello', str, 2) FROM VALUES(CAST('1' AS STRING)) AS T(str); he -- Crosscasting from INTEGER to STRING > SELECT substring(12345, 2, 2); 23
|| (CONCAT) allows implicit crosscasting to string.
-- A numeric is cast to STRING > SELECT 'This is a numeric: ' || 5.4E10; This is a numeric: 5.4E10 -- A date is cast to STRING > SELECT 'This is a date: ' || DATE'2021-11-30'; This is a date: 2021-11-30
date_add can be invoked with a
BIGINT due to implicit downcasting.
> SELECT date_add(TIMESTAMP'2011-11-30 08:30:00', 5L); 2011-12-05
date_add can be invoked with
STRINGs due to implicit crosscasting.
> SELECT date_add('2011-11-30 08:30:00', '5'); 2011-12-05