When running a report that executes against a Databricks database, the following syntax error appears:
JdbcSQLException occur. Error Message: [Strategy][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: 42601, Query: SELECT `a1***, Error message from Server: org.apache.hive.service.cli.HiveSQLException: Error running query: [PARSE_SYNTAX_ERROR] org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'JOIN'. SQLSTATE: 42601 (line 1, pos 2537)
The error message also includes the SQL query that fails which is similar to the following one:
SELECT col1,
col2,
col3
FROM `table1` `a11`
JOIN `table2` `a12`
ON ( `a11`.`unidad`=`a12`.`unidad` )
JOIN `table3` `a13`
ON ( `a11`.`flag_act_total`=`a13`.`flag_act_total` )
JOIN `table4` `a14`
ON ( `a11`.`flag_ec_reg`=`a14`.`flag_ec_reg` )
JOIN `table5` `a15`
ON ( `a11`.`flag_mes_ytd_hco`=`a15`.`flag_mes_ytd_hco` )
CROSS JOIN (SELECT unidad, idagregado, descagregad
FROM table6
WHERE (idagregado <> '9999')) `a16`
WHERE ( ( `a11`.`idagregado`=`a16`.`idagregado`
AND `a11`.`unidad`=`a16`.`unidad` ) )
JOIN `table7` `a17`
ON ( `a11`.`flag_publicado`=`a17`.`flag_publicado` )
...
The above query is syntactically not correct because the WHERE clause in the main query appears before a JOIN statement.
When checking the report SQL, the query differs from the above and has a correct syntax:
SELECT col1,
col2,
col3
FROM `table1` `a11`
JOIN `table2` `a12`
ON ( `a11`.`unidad`=`a12`.`unidad` )
JOIN `table3` `a13`
ON ( `a11`.`flag_act_total`=`a13`.`flag_act_total` )
JOIN `table4` `a14`
ON ( `a11`.`flag_ec_reg`=`a14`.`flag_ec_reg` )
JOIN `table5` `a15`
ON ( `a11`.`flag_mes_ytd_hco`=`a15`.`flag_mes_ytd_hco` )
JOIN (SELECT unidad, idagregado, descagregad
FROM table6
WHERE (idagregado <> '9999')) `a16`
ON ( ( `a11`.`idagregado`=`a16`.`idagregado`
AND `a11`.`unidad`=`a16`.`unidad` ) )
JOIN `table7` `a17`
ON ( `a11`.`flag_publicado`=`a17`.`flag_publicado` )
...
This issue occurs because of the parameter UseNativeQuery in the Databricks connection string. When its value is set to 0, the connector transforms the queries emitted by Strategy and converts them into an equivalent form in HiveQL.
By enabling UseNativeQuery, the system directly sends the original SQL to the database without transforming or rewriting it.
However, this parameter can lead to certain behaviors and limitations, especially concerning queries with parameters and bulk operations. Setting EnableNativeParameterizedQuery=0 while using
UseNativeQuery=1can resolve this issue. It ensures that bulk insert operations are preserved even when
UseNativeQueryis enabled.
In short, in order to resolve this syntax issue, add the following parameters to the Databricks connection string: UseNativeQuery=1;EnableNativeParameterizedQuery=0