Parameter markers
Parameter markers are named or unnamed typed placeholder variables used to supply values from the API invoking the SQL statement.
Using parameter markers protects your code from SQL injection attacks because they clearly separate the provided values from the SQL statement.
You cannot mix named and unnamed parameter markers in the same SQL statement.
You must not reference a parameter marker in a DDL statement, such as a generated column or DEFAULT definition, a view, or a SQL function.
The exception is a reference to a parameter marker in the IDENTIFIER clause, which can be used to parameterize table or column names in certain DDL statements. See IDENTIFIER clause.
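For example, a named parameter marker inside the IDENTIFIER clause can supply a table name (a sketch; my_table is a hypothetical table name):

> DECLARE stmtStr = 'SELECT * FROM IDENTIFIER(:tab)';
> EXECUTE IMMEDIATE stmtStr USING 'my_table' AS tab;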
Parameter markers can be provided by:
Python using its pyspark.sql.SparkSession.sql() API.
Scala using its org.apache.spark.sql.SparkSession.sql() API.
Java using its org.apache.spark.sql.SparkSession.sql() API.
Named parameter markers
Applies to: Databricks Runtime 12.1 and above
Named parameter markers are typed placeholder variables. The API invoking the SQL statement must supply name-value pairs to associate each parameter marker with a value.
Notes
You can reference the same parameter marker multiple times within the same SQL statement. If no value has been bound to the parameter marker, an UNBOUND_SQL_PARAMETER error is raised. You are not required to reference all supplied parameter markers.
The mandatory preceding : (colon) differentiates the namespace of named parameter markers from that of column names and SQL parameters.
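As a sketch of this separation, the following statement references both a column c1 and a named parameter marker :c1; the colon resolves :c1 to the supplied value rather than to the column:

> EXECUTE IMMEDIATE 'SELECT c1, :c1 FROM VALUES(5) AS t(c1)' USING 7 AS c1;
 5 7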
Examples
The following example defines two parameter markers:

later: An INTERVAL HOUR with value 3.
x: A DOUBLE with value 15.0.

x is referenced multiple times, while later is referenced once.
> DECLARE stmtStr = 'SELECT current_timestamp() + :later, :x * :x AS square';
> EXECUTE IMMEDIATE stmtStr USING INTERVAL '3' HOURS AS later, 15.0 AS x;
2024-01-19 16:17:16.692303 225.00
import org.apache.spark.sql.SparkSession
val spark = SparkSession
.builder()
.appName("Spark named parameter marker example")
.getOrCreate()
val argMap = Map("later" -> java.time.Duration.ofHours(3), "x" -> 15.0)
spark.sql(
sqlText = "SELECT current_timestamp() + :later, :x * :x AS square",
args = argMap).show()
// +----------------------------------------+------+
// |current_timestamp() + INTERVAL '03' HOUR|square|
// +----------------------------------------+------+
// | 2023-02-27 17:48:...|225.00|
// +----------------------------------------+------+
import org.apache.spark.sql.*;
import static java.util.Map.entry;
SparkSession spark = SparkSession
.builder()
.appName("Java Spark named parameter marker example")
.getOrCreate();
Map<String, Object> argMap = Map.ofEntries(
entry("later", java.time.Duration.ofHours(3)),
entry("x", 15.0)
);
spark.sql(
"SELECT current_timestamp() + :later, :x * :x AS square",
argMap).show();
// +----------------------------------------+------+
// |current_timestamp() + INTERVAL '03' HOUR|square|
// +----------------------------------------+------+
// | 2023-02-27 17:48:...|225.00|
// +----------------------------------------+------+
spark.sql("SELECT :x * :y * :z AS volume", args = { "x" : 3, "y" : 4, "z" : 5 }).show()
// +------+
// |volume|
// +------+
// | 60|
// +------+
Unnamed parameter markers
Applies to: Databricks Runtime 13.3 and above
Unnamed parameter markers are typed placeholder variables. The API invoking the SQL statement must supply an array of arguments to associate each parameter marker with a value in the order in which they appear.
Notes
Each occurrence of an unnamed parameter marker consumes, in order, a value provided by the API invoking the SQL statement. If no value has been bound to the parameter marker, an UNBOUND_SQL_PARAMETER error is raised. You are not required to consume all provided values.
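Both notes can be sketched in one statement: the first ? consumes 10, the second consumes 2, and the trailing value 99 is never consumed:

> EXECUTE IMMEDIATE 'SELECT ? - ?' USING 10, 2, 99;
 8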
Examples
The following example defines three parameter markers:

An INTERVAL HOUR with value 3.
Two DOUBLEs with value 15.0 each.

Since the parameters are unnamed, each provided value is consumed by at most one parameter.
> DECLARE stmtStr = 'SELECT current_timestamp() + ?, ? * ? AS square';
> EXECUTE IMMEDIATE stmtStr USING INTERVAL '3' HOURS, 15.0, 15.0;
2024-01-19 16:17:16.692303 225.00
import org.apache.spark.sql.SparkSession
val spark = SparkSession
.builder()
.appName("Spark unnamed parameter marker example")
.getOrCreate()
val argArr = Array(java.time.Duration.ofHours(3), 15.0, 15.0)
spark.sql(
sqlText = "SELECT current_timestamp() + ?, ? * ? AS square", args = argArr).show()
// +----------------------------------------+------+
// |current_timestamp() + INTERVAL '03' HOUR|square|
// +----------------------------------------+------+
// | 2023-02-27 17:48:...|225.00|
// +----------------------------------------+------+
import org.apache.spark.sql.*;
SparkSession spark = SparkSession
.builder()
.appName("Java Spark unnamed parameter marker example")
.getOrCreate();
Object[] argArr = new Object[] { java.time.Duration.ofHours(3), 15.0, 15.0 };
spark.sql(
"SELECT current_timestamp() + ?, ? * ? AS square",
argArr).show();
// +----------------------------------------+------+
// |current_timestamp() + INTERVAL '03' HOUR|square|
// +----------------------------------------+------+
// | 2023-02-27 17:48:...|225.00|
// +----------------------------------------+------+
spark.sql("SELECT ? * ? * ? AS volume", args = [3, 4, 5]).show()
// +------+
// |volume|
// +------+
// | 60|
// +------+