Hive table
This article shows how to import a Hive table from cloud storage into Databricks using an external table.
Step 1: Show the CREATE TABLE
statement
Issue a SHOW CREATE TABLE <tablename>
command on your Hive command line to see the statement that created the table.
hive> SHOW CREATE TABLE wikicc;
OK
CREATE TABLE `wikicc`(
`country` string,
`count` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'<path-to-table>'
TBLPROPERTIES (
'totalSize'='2335',
'numRows'='240',
'rawDataSize'='2095',
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'transient_lastDdlTime'='1418173653')
Step 2: Issue a CREATE EXTERNAL TABLE
statement
If the statement that is returned uses a CREATE TABLE
command, copy the statement and replace CREATE TABLE
with CREATE EXTERNAL TABLE
.
EXTERNAL
ensures that Spark SQL does not delete your data if you drop the table.You can omit the
TBLPROPERTIES
field.
DROP TABLE wikicc
CREATE EXTERNAL TABLE `wikicc`(
`country` string,
`count` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'<path-to-table>'