Data types
Applies to: Databricks SQL, Databricks Runtime
For rules governing how conflicts between data types are resolved, see SQL data type rules.
Supported data types
Databricks supports the following data types:
| Data Type | Description |
|---|---|
| BIGINT | Represents 8-byte signed integer numbers. |
| BINARY | Represents byte sequence values. |
| BOOLEAN | Represents Boolean values. |
| DATE | Represents values comprising the fields year, month, and day, without a time zone. |
| DECIMAL(p,s) | Represents numbers with maximum precision p and fixed scale s. |
| DOUBLE | Represents 8-byte double-precision floating point numbers. |
| FLOAT | Represents 4-byte single-precision floating point numbers. |
| INT | Represents 4-byte signed integer numbers. |
| INTERVAL intervalQualifier | Represents intervals of time either on a scale of seconds or months. |
| VOID | Represents the untyped NULL. |
| SMALLINT | Represents 2-byte signed integer numbers. |
| STRING | Represents character string values. |
| TIMESTAMP | Represents values comprising the fields year, month, day, hour, minute, and second, with the session local time zone. |
| TIMESTAMP_NTZ | Represents values comprising the fields year, month, day, hour, minute, and second. All operations are performed without taking any time zone into account. |
| TINYINT | Represents 1-byte signed integer numbers. |
| ARRAY < elementType > | Represents values comprising a sequence of elements with the type of elementType. |
| MAP < keyType,valueType > | Represents values comprising a set of key-value pairs. |
| STRUCT < [fieldName : fieldType [NOT NULL][COMMENT str][, …]] > | Represents values with the structure described by a sequence of fields. |
| VARIANT | Represents semi-structured data. |
|  | Represents values in a |
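A DECIMAL type is parameterized by a precision (total number of digits) and a scale (digits to the right of the decimal point). The following is a minimal sketch in plain Python, using only the standard `decimal` module, of what those two parameters imply for the values that fit. `fits_decimal` is a hypothetical helper written for this illustration and is not a Databricks or Spark API. It only tests exact representability; how the engine treats a value that does not fit (rounding, NULL, or an error) is not modelled here.

```python
from decimal import Decimal


def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Return True if `value` is exactly representable as DECIMAL(precision, scale).

    Hypothetical helper for illustration; not a Databricks or Spark API.
    """
    _sign, digits, exponent = Decimal(value).as_tuple()
    if not isinstance(exponent, int):
        # NaN and Infinity carry a non-numeric exponent marker.
        return False
    coefficient = int("".join(map(str, digits)))
    # Shift needed to express the value in units of 10**-scale.
    shift = exponent + scale
    if shift >= 0:
        unscaled = coefficient * 10 ** shift
    else:
        unscaled, remainder = divmod(coefficient, 10 ** -shift)
        if remainder:
            # More significant fractional digits than `scale` allows.
            return False
    # The unscaled integer may use at most `precision` digits.
    return unscaled < 10 ** precision


print(fits_decimal("999.99", 5, 2))   # True: 5 digits, 2 after the point
print(fits_decimal("1000.00", 5, 2))  # False: needs 6 digits in total
print(fits_decimal("1.234", 5, 2))    # False: needs 3 fractional digits
```

With precision 5 and scale 2, three digits remain for the integer part, so the largest magnitude that fits is 999.99.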
Important
Delta Lake does not support the VOID type.
Data type classification
Data types are grouped into the following classes:
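Integral numeric types represent whole numbers: TINYINT, SMALLINT, INT, BIGINT.
Exact numeric types represent base-10 numbers: the integral numeric types and DECIMAL.
Binary floating point types use exponents and a binary representation to cover a large range of numbers: FLOAT, DOUBLE.
Numeric types represent all numeric data types: the exact numeric types and the binary floating point types.
Date-time types represent date and time components: DATE, TIMESTAMP, TIMESTAMP_NTZ.
Simple types are types defined by holding singleton values: the numeric types, the date-time types, BINARY, BOOLEAN, INTERVAL, STRING.
Complex types are composed of multiple components of complex or simple types, such as ARRAY, MAP, and STRUCT.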
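The byte widths of the integral and binary floating point types determine which values they can hold. The sketch below, in plain Python with only the standard library, derives the signed range from a byte width and shows the rounding that occurs when a Python `float` (a double) is narrowed to 4-byte single precision. The helper names (`signed_range`, `fits_integral`, `as_float32`) are hypothetical and exist only for this illustration; the type names and widths are those of the 1-, 2-, 4-, and 8-byte signed integer types.

```python
import struct

# Byte widths of the signed integer types.
INTEGRAL_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes: int) -> tuple:
    """Return (min, max) for a two's-complement signed integer of this width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits_integral(value: int, type_name: str) -> bool:
    """Check whether `value` lies inside the range of the named integral type."""
    low, high = signed_range(INTEGRAL_WIDTHS[type_name])
    return low <= value <= high


def as_float32(x: float) -> float:
    """Round-trip a Python float through a 4-byte IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


for name, width in INTEGRAL_WIDTHS.items():
    print(name, signed_range(width))

print(fits_integral(128, "TINYINT"))  # False: TINYINT tops out at 127
print(as_float32(0.1))                # 0.10000000149011612
```

Values such as 0.5 that are exact in binary survive the narrowing unchanged; most decimal fractions do not.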
Language mappings
Applies to: Databricks Runtime
Scala
Spark SQL data types are defined in the package org.apache.spark.sql.types. You access them by importing the package:
import org.apache.spark.sql.types._
| SQL type | Data type | Value type | API to access or create data type |
|---|---|---|---|
| TINYINT | ByteType | Byte | ByteType |
| SMALLINT | ShortType | Short | ShortType |
| INT | IntegerType | Int | IntegerType |
| BIGINT | LongType | Long | LongType |
| FLOAT | FloatType | Float | FloatType |
| DOUBLE | DoubleType | Double | DoubleType |
| DECIMAL(p,s) | DecimalType | java.math.BigDecimal | DecimalType |
| STRING | StringType | String | StringType |
| BINARY | BinaryType | Array[Byte] | BinaryType |
| BOOLEAN | BooleanType | Boolean | BooleanType |
| TIMESTAMP | TimestampType | java.sql.Timestamp | TimestampType |
| TIMESTAMP_NTZ | TimestampNTZType | java.time.LocalDateTime | TimestampNTZType |
| DATE | DateType | java.sql.Date | DateType |
| year-month interval | YearMonthIntervalType | java.time.Period | YearMonthIntervalType (3) |
| day-time interval | DayTimeIntervalType | java.time.Duration | DayTimeIntervalType (3) |
| ARRAY | ArrayType | scala.collection.Seq | ArrayType(elementType [, containsNull]) (2) |
| MAP | MapType | scala.collection.Map | MapType(keyType, valueType [, valueContainsNull]) (2) |
| STRUCT | StructType | org.apache.spark.sql.Row | StructType(fields). fields is a Seq of StructField. (4) |
|  | StructField | The value type of the data type of this field (for example, Int for a StructField with the data type IntegerType) | StructField(name, dataType [, nullable]) (4) |
| VARIANT | VariantType | org.apache.spark.unsafe.type.VariantVal | VariantType |
|  | Not supported | Not supported | Not supported |
Java
Spark SQL data types are defined in the package org.apache.spark.sql.types. To access or create a data type, use the factory methods provided in org.apache.spark.sql.types.DataTypes.
| SQL type | Data type | Value type | API to access or create data type |
|---|---|---|---|
| TINYINT | ByteType | byte or Byte | DataTypes.ByteType |
| SMALLINT | ShortType | short or Short | DataTypes.ShortType |
| INT | IntegerType | int or Integer | DataTypes.IntegerType |
| BIGINT | LongType | long or Long | DataTypes.LongType |
| FLOAT | FloatType | float or Float | DataTypes.FloatType |
| DOUBLE | DoubleType | double or Double | DataTypes.DoubleType |
| DECIMAL(p,s) | DecimalType | java.math.BigDecimal | DataTypes.createDecimalType() or DataTypes.createDecimalType(precision, scale) |
| STRING | StringType | String | DataTypes.StringType |
| BINARY | BinaryType | byte[] | DataTypes.BinaryType |
| BOOLEAN | BooleanType | boolean or Boolean | DataTypes.BooleanType |
| TIMESTAMP | TimestampType | java.sql.Timestamp | DataTypes.TimestampType |
| TIMESTAMP_NTZ | TimestampNTZType | java.time.LocalDateTime | DataTypes.TimestampNTZType |
| DATE | DateType | java.sql.Date | DataTypes.DateType |
| year-month interval | YearMonthIntervalType | java.time.Period | YearMonthIntervalType (3) |
| day-time interval | DayTimeIntervalType | java.time.Duration | DayTimeIntervalType (3) |
| ARRAY | ArrayType | java.util.List | DataTypes.createArrayType(elementType [, containsNull]) (2) |
| MAP | MapType | java.util.Map | DataTypes.createMapType(keyType, valueType [, valueContainsNull]) (2) |
| STRUCT | StructType | org.apache.spark.sql.Row | DataTypes.createStructType(fields). fields is a List or array of StructField. (4) |
|  | StructField | The value type of the data type of this field (for example, int for a StructField with the data type IntegerType) | DataTypes.createStructField(name, dataType, nullable) (4) |
| VARIANT | VariantType | org.apache.spark.unsafe.type.VariantVal | VariantType |
|  | Not supported | Not supported | Not supported |
Python
Spark SQL data types are defined in the package pyspark.sql.types. You access them by importing the package:
from pyspark.sql.types import *
| SQL type | Data type | Value type | API to access or create data type |
|---|---|---|---|
| TINYINT | ByteType | int or long (1) | ByteType() |
| SMALLINT | ShortType | int or long (1) | ShortType() |
| INT | IntegerType | int or long | IntegerType() |
| BIGINT | LongType | long (1) | LongType() |
| FLOAT | FloatType | float (1) | FloatType() |
| DOUBLE | DoubleType | float | DoubleType() |
| DECIMAL(p,s) | DecimalType | decimal.Decimal | DecimalType() |
| STRING | StringType | string | StringType() |
| BINARY | BinaryType | bytearray | BinaryType() |
| BOOLEAN | BooleanType | bool | BooleanType() |
| TIMESTAMP | TimestampType | datetime.datetime | TimestampType() |
| TIMESTAMP_NTZ | TimestampNTZType | datetime.datetime | TimestampNTZType() |
| DATE | DateType | datetime.date | DateType() |
| year-month interval | YearMonthIntervalType | Not supported | Not supported |
| day-time interval | DayTimeIntervalType | datetime.timedelta | DayTimeIntervalType (3) |
| ARRAY | ArrayType | list, tuple, or array | ArrayType(elementType, [containsNull]) (2) |
| MAP | MapType | dict | MapType(keyType, valueType, [valueContainsNull]) (2) |
| STRUCT | StructType | list or tuple | StructType(fields). fields is a list of StructField. (4) |
|  | StructField | The value type of the data type of this field (for example, int for a StructField with the data type IntegerType) | StructField(name, dataType, [nullable]) (4) |
| VARIANT | VariantType | VariantVal | VariantType() |
|  | Not supported | Not supported | Not supported |
R
| SQL type | Data type | Value type | API to access or create data type |
|---|---|---|---|
| TINYINT | ByteType | integer (1) | "byte" |
| SMALLINT | ShortType | integer (1) | "short" |
| INT | IntegerType | integer | "integer" |
| BIGINT | LongType | integer (1) | "long" |
| FLOAT | FloatType | numeric (1) | "float" |
| DOUBLE | DoubleType | numeric | "double" |
| DECIMAL(p,s) | DecimalType | Not supported | Not supported |
| STRING | StringType | character | "string" |
| BINARY | BinaryType | raw | "binary" |
| BOOLEAN | BooleanType | logical | "bool" |
| TIMESTAMP | TimestampType | POSIXct | "timestamp" |
| TIMESTAMP_NTZ | TimestampNTZType | datetime.datetime | TimestampNTZType() |
| DATE | DateType | Date | "date" |
| year-month interval | YearMonthIntervalType | Not supported | Not supported |
| day-time interval | DayTimeIntervalType | Not supported | Not supported |
| ARRAY | ArrayType | vector or list | list(type="array", elementType=elementType, containsNull=[containsNull]) (2) |
| MAP | MapType | environment | list(type="map", keyType=keyType, valueType=valueType, valueContainsNull=[valueContainsNull]) (2) |
| STRUCT | StructType | named list | list(type="struct", fields=fields). fields is a list of StructField. (4) |
|  | StructField | The value type of the data type of this field (for example, integer for a StructField with the data type IntegerType) | list(name=name, type=dataType, nullable=[nullable]) (4) |
|  | Not supported | Not supported | Not supported |
|  | Not supported | Not supported | Not supported |
(1) Numbers are converted to the domain at runtime. Make sure that numbers are within range.
(2) The optional value defaults to TRUE.
(3) Interval types
YearMonthIntervalType([startField,] endField): Represents a year-month interval which is made up of a contiguous subset of the following fields: startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (YEAR) and 1 (MONTH).
DayTimeIntervalType([startField,] endField): Represents a day-time interval which is made up of a contiguous subset of the following fields: startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (DAY), 1 (HOUR), 2 (MINUTE), and 3 (SECOND).
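To make the startField and endField indices concrete, here is a minimal plain-Python sketch for the day-time case. It maps the indices 0 (DAY) through 3 (SECOND) to the contiguous run of fields they select and renders the result in the SQL interval qualifier style (for example, DAY TO SECOND). The names `DAY_TIME_FIELDS`, `covered_fields`, and `day_time_qualifier` are hypothetical helpers for this illustration, not part of any Spark API.

```python
# Field indices as used for day-time intervals: 0 = DAY ... 3 = SECOND.
DAY_TIME_FIELDS = ("DAY", "HOUR", "MINUTE", "SECOND")


def covered_fields(start_field: int, end_field: int) -> tuple:
    """Return the contiguous run of fields from start_field to end_field inclusive."""
    if not (0 <= start_field <= end_field < len(DAY_TIME_FIELDS)):
        raise ValueError(
            f"startField must not exceed endField and both must be in 0..3, "
            f"got {start_field}, {end_field}"
        )
    return DAY_TIME_FIELDS[start_field:end_field + 1]


def day_time_qualifier(start_field: int, end_field: int) -> str:
    """Render the selected fields as an interval qualifier string."""
    fields = covered_fields(start_field, end_field)
    # A single-field interval is named by that field alone.
    return fields[0] if len(fields) == 1 else f"{fields[0]} TO {fields[-1]}"


print(day_time_qualifier(0, 3))  # DAY TO SECOND
print(day_time_qualifier(1, 2))  # HOUR TO MINUTE
print(day_time_qualifier(1, 1))  # HOUR
```

Because startField is the leftmost field, it can never have a larger index than endField; the sketch rejects such a pair.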
(4) StructType
StructType(fields): Represents values with the structure described by a sequence, list, or array of StructFields (fields). Two fields with the same name are not allowed.
StructField(name, dataType, nullable): Represents a field in a StructType. The name of a field is indicated by name. The data type of a field is indicated by dataType. nullable indicates whether values of this field can be null; by default, they can.
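The two rules in this note (field names must be unique, and a field accepts null unless declared otherwise) can be sketched with plain Python data classes. This is a stand-in model for illustration only: `Field`, `make_struct`, and `missing_required` are hypothetical names, they are not the pyspark.sql.types classes, and the data types are represented as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Stand-in for a struct field: a name, a data type, and a nullable flag."""
    name: str
    data_type: str
    nullable: bool = True  # fields accept null unless declared otherwise


def make_struct(fields) -> tuple:
    """Build a struct definition, rejecting duplicate field names."""
    names = [f.name for f in fields]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate field names: {duplicates}")
    return tuple(fields)


def missing_required(struct, row: dict) -> list:
    """Return names of non-nullable fields that are null or absent in `row`."""
    return [f.name for f in struct if not f.nullable and row.get(f.name) is None]


person = make_struct([
    Field("id", "BIGINT", nullable=False),
    Field("name", "STRING"),  # nullable by default
])

print(missing_required(person, {"id": 1, "name": None}))  # []
print(missing_required(person, {"name": "Ada"}))          # ['id']
```

Checking names once at construction time keeps later lookups by field name unambiguous.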