The Cloudera ODBC drivers allow users to create connections to secured CDH clusters (Hive and Impala) using MIT Kerberos. This is a fixed-credential authentication option (not to be confused with the delegated, end-to-end Kerberos credential authentication available for some databases with Strategy).
NOTE:
- In Strategy Intelligence Server 9.3.x, users must use the Cloudera v2 ODBC drivers to connect to the secure cluster using Kerberos. In Strategy Intelligence Server 9.4.x, users can use either the Cloudera v2 or v2.5 ODBC drivers to connect to the secure cluster.
- The Cloudera ODBC drivers may require additional supporting libraries such as cyrus-sasl-gssapi on Linux to work correctly. Users should contact Cloudera support with any questions about package dependencies.
- The documentation below assumes that the user has already set up the secure CDH cluster and created a Kerberos user principal account with which to create the ODBC connection.
STEPS:
- On the Intelligence Server machine, users should create the krb5.conf file needed to point the client to the KDC and to provide information about the realm used. An example krb5.conf file is shown below; users must configure it according to their own network and realm names:
[libdefaults]
    default_realm = CDH41-SECURE-LOCAL
    dns_lookup_realm = false
    dns_lookup_kdc = false
    ticket_lifetime = 24h
    renew_lifetime = 7d
    forwardable = true

[realms]
    CDH41-SECURE-LOCAL = {
        kdc = cdh41-secure
        admin_server = cdh41-secure
    }

[domain_realm]
    .cdh41-secure = CDH41-SECURE-LOCAL
    cdh41-secure = CDH41-SECURE-LOCAL
- The krb5.conf file should be placed in a location that is accessible to the account that the Intelligence Server is run under, and the file permissions should allow access.
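As a sketch, the file can be created and its permissions set from the shell. The directory, realm, and KDC names below are placeholders taken from the example above; adjust them to a location the Intelligence Server account can read.

```shell
# Example only: create the krb5.conf in a directory readable by the
# Intelligence Server account. Directory, realm, and KDC names are placeholders.
KRB_DIR="$HOME/Strategy/Kerberos"
mkdir -p "$KRB_DIR"
cat > "$KRB_DIR/krb5.conf" <<'EOF'
[libdefaults]
    default_realm = CDH41-SECURE-LOCAL
[realms]
    CDH41-SECURE-LOCAL = {
        kdc = cdh41-secure
        admin_server = cdh41-secure
    }
EOF
chmod 644 "$KRB_DIR/krb5.conf"   # readable by all; tighten as your policy requires
```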
- Set an environment variable 'KRB5_CONFIG' pointing to the location of the krb5.conf file. The export command can be run from the bash shell as given below:
export KRB5_CONFIG=/var/opt/Strategy/Kerberos/krb5.conf
- Users should make sure that the Intelligence Server machine can correctly resolve the names of the KDC and realm hosts specified in the krb5.conf file, for example by adding these names to the /etc/hosts file if they cannot be resolved by DNS.
- Users need to generate, or use an existing, keytab file containing the credentials of the Kerberos user principal with which they wish to connect to the secure CDH cluster. The location of the keytab file is then specified using the environment variable 'KRB5_KTNAME' (example below using the bash shell):
export KRB5_KTNAME=/var/opt/Strategy/Kerberos/hive.keytab
Note: Use of a keytab file is optional. If a keytab file is not used, the kinit command prompts for the password of the user principal for which it is run.
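A quick pre-flight check before running kinit is to confirm that the keytab path is actually readable by the current account. The fallback path below is the example used in this document, not a required location.

```shell
# Sketch: verify the keytab referenced by KRB5_KTNAME exists and is readable.
# The fallback path is the example path used in this document.
KEYTAB="${KRB5_KTNAME:-/var/opt/Strategy/Kerberos/hive.keytab}"
if [ -r "$KEYTAB" ]; then
    echo "keytab readable: $KEYTAB"
else
    echo "keytab missing or unreadable: $KEYTAB"
fi
```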
- With the environment variables set, test that Kerberos tickets can be obtained (in this example the user principal is hive/cdh41-secure@CDH41-SECURE-LOCAL):
$ kinit -r 1d -k hive/cdh41-secure@CDH41-SECURE-LOCAL
$ kvno hive/cdh41-secure@CDH41-SECURE-LOCAL
If this succeeds, the command '/usr/kerberos/bin/klist -ef' should display two tickets: one for the krbtgt and one for the hive/cdh41-secure service principal.
$ klist -ef
Ticket cache: FILE:/tmp/krb5cc_10256
Default principal: hive/cdh41-secure@CDH41-SECURE-LOCAL
Valid starting Expires Service principal
03/29/13 11:49:28 03/30/13 11:49:27 krbtgt/CDH41-SECURE-LOCAL@CDH41-SECURE-LOCAL
renew until 03/29/13 11:49:28, Flags: FRI
Etype (skey, tkt): Triple DES cbc mode with HMAC/sha1, Triple DES cbc mode with HMAC/sha1
03/29/13 11:52:10 03/30/13 11:49:27 hive/cdh41-secure@CDH41-SECURE-LOCAL
renew until 03/29/13 11:49:28, Flags: FRT
Etype (skey, tkt): Triple DES cbc mode with HMAC/sha1, Triple DES cbc mode with HMAC/sha1
Kerberos 4 ticket cache: /tmp/tkt0
klist: You have no tickets cached
- The DSN configuration in the odbc.ini and other files is dependent on the ODBC driver version being used (v2.0 or v2.5).
Cloudera driver v2.0
1. The DSN needs to be added to the odbc.ini file as shown below. The full path to the Cloudera ODBC driver, and the host and database entries, need to be filled in according to the user's environment.
[ODBC Data Sources]
HIVE_KERBEROS=Cloudera Driver
[ODBC]
Trace=1
TraceFile=/home/MSTR/LINUX/94_94l/cloudera1odbctrace.out
TraceDll=/home/MSTR/LINUX/94_94l/install/lib32/MYtrc32.so
InstallDir=/home/MSTR/LINUX/94_94l/install
IANAAppCodePage=106
UseCursorLib=0
[HIVE_KERBEROS]
Driver=<Full_PATH_TO_CLOUDERA_ODBC_DRIVER>/lib/libhiveodbc.so.1
Description=Cloudera Driver
DATABASE=DB_NAME
HOST=HOST_NAME
PORT=21050
FRAMED=0
#Type=
PRINCIPAL=hive/cdh41-secure@CDH41-SECURE-LOCAL
2. The following entry should be added to the odbcinst.ini file, which is found in the same location as the odbc.ini file.
[Cloudera ODBC Driver for Apache Hive]
Driver=<Full_PATH_TO_CLOUDERA_ODBC_DRIVER>/lib/libhiveodbc.so.1
Description=Hive Driver
Setup=<Full_PATH_TO_CLOUDERA_ODBC_DRIVER>/lib/libhiveodbc.so.1
APILevel=2
ConnectFunctions=YYY
DriverODBCVer=1.0
FileUsage=0
SQLLevel=1
If an entry for the Cloudera ODBC Driver for Apache Hive is not listed in the [ODBC Drivers] section at the top of the file, the following entry should be added to that section:
[ODBC Drivers]
Cloudera ODBC Driver for Apache Hive=Installed
3.a For Hive, edit the ODBC.sh file found in the env directory of the installation location and add the following section to the end of the file (or edit it if the section is already present). Replace the placeholder path according to the location where the driver is installed.
#
# ODBC Driver for Hive
#
HIVE_CONFIG='Full_PATH_TO_CLOUDERA_ODBC_DRIVER/cloudera/20v2'
if [ "${HIVE_CONFIG}" != '<HIVE_CONFIG>' ]; then
export HIVE_CONFIG
mstr_append_path LD_LIBRARY_PATH "${HIVE_CONFIG:?}"/lib
export LD_LIBRARY_PATH
fi
3.b For Impala, edit the ODBC.sh file found in the env directory of the installation location and add the following section to the end of the file (or edit it if the section is already present). Replace the placeholder path according to the location where the driver is installed.
#
# ODBC Driver for odbc20v2 Impala
#
IMPALA_CONFIG='FULL_PATH_TO_DRIVER/lib'
if [ "${IMPALA_CONFIG}" != '<IMPALA_CONFIG>' ]; then
export IMPALA_CONFIG
mstr_append_path LD_LIBRARY_PATH "${IMPALA_CONFIG:?}"/
export LD_LIBRARY_PATH
fi
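The mstr_append_path helper used above is defined by Strategy's own ODBC.sh script. As a rough sketch (the real implementation may differ), such a helper appends a directory to a colon-separated path variable, skipping it if it is already present:

```shell
# Hypothetical stand-in illustrating what a helper like mstr_append_path does:
# append a directory to a colon-separated variable, avoiding duplicates.
# (The real helper is defined inside Strategy's ODBC.sh; this is only a sketch.)
append_path() {
    _var=$1
    _dir=$2
    eval "_cur=\"\${$_var}\""
    case ":${_cur}:" in
        *":${_dir}:"*) ;;                                  # already on the path
        *) eval "$_var=\"\${$_var:+\${$_var}:}\$_dir\"" ;; # append with ':' separator
    esac
}
append_path LD_LIBRARY_PATH /opt/cloudera/hiveodbc/lib   # example directory
append_path LD_LIBRARY_PATH /opt/cloudera/hiveodbc/lib   # second call adds no duplicate
echo "$LD_LIBRARY_PATH"
```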
Cloudera driver v2.5
Refer to the following Strategy Knowledge Base documents for information on how to set up connectivity to Hive and Impala using the Cloudera v2.5 ODBC drivers.
KB46929: In Strategy Analytics Enterprise 9.4.1, how to establish connectivity to Hive using the Cloudera v2.5 ODBC driver
KB46931: In Strategy Analytics Enterprise 9.4.1, how to establish connectivity to Impala using the Cloudera v2.5 ODBC driver
A sample odbc.ini for connecting to Impala is provided below.
[Sample Cloudera Impala DSN 32]
Description=Cloudera ODBC Driver for Apache Impala
Driver=/usr0/cloudera/2513impala/impalaodbc/lib/32/libclouderaimpalaodbc32.so
DriverUnicodeEncoding=2
HOST=cdh-secure
PORT=21050
Database=testdb1
AuthMech=1
KrbFQDN=cdh-secure
KrbRealm=CDH-SECURE-LOCAL
KrbServiceName=impala
TSaslTransportBufSize=1000
RowsFetchedPerBlock=1000
SocketTimeout=0
StringColumnLength=32767
UseNativeQuery=0
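A common failure mode is a Driver= entry in odbc.ini that points at a library path that does not exist on disk. A quick sanity check can be scripted as below; the sample file written here reuses the example Driver path from this document, which will normally be reported as missing unless the driver is actually installed at that path.

```shell
# Sketch: check that every Driver= entry in a DSN file points at a library
# that exists on disk. File name and Driver path are examples from this document.
INI=sample_odbc.ini
cat > "$INI" <<'EOF'
[Sample Cloudera Impala DSN 32]
Driver=/usr0/cloudera/2513impala/impalaodbc/lib/32/libclouderaimpalaodbc32.so
EOF
grep '^Driver=' "$INI" | cut -d= -f2- | while read -r lib; do
    if [ -f "$lib" ]; then
        echo "found:   $lib"
    else
        echo "missing: $lib"
    fi
done
```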
- Users should now be able to connect to the secure CDH cluster using the Strategy query tools as usual. Users may need to set the KRB5_CONFIG and KRB5_KTNAME environment variables permanently in their profile to avoid having to set them each time they use Strategy.
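To make the variables permanent for the account that runs the Intelligence Server, the export lines can be appended to that account's shell profile. The profile file name and the two paths below are examples; use the profile file your shell actually reads and the paths from your own environment.

```shell
# Example only: persist the Kerberos variables in the account's shell profile.
# Profile file name and both paths are placeholders for your environment.
PROFILE="$HOME/.bash_profile"
cat >> "$PROFILE" <<'EOF'
export KRB5_CONFIG=/var/opt/Strategy/Kerberos/krb5.conf
export KRB5_KTNAME=/var/opt/Strategy/Kerberos/hive.keytab
EOF
```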
Refer to the required Strategy Knowledge Base document listed below, depending on the distribution being used:
KB37931: How to configure the Cloudera Connector on Linux for connectivity to a Hive 0.7 database
KB43595: How to setup connectivity to Cloudera Impala 1.0 in Strategy 9.3.1
KB43744: How to setup connectivity to Cloudera Hive 0.10 in Strategy 9.3.1
KB43150