Friday 16 December 2016

Connectivity issue with Tableau 10 and MapR Drill


If you are hitting a connectivity issue between Tableau and Drill with the following symptoms, you are probably hitting a bug in Tableau 10.0.1/10.0.2.

Symptoms:

[1] You will see a connection error when you try to connect Tableau 10.0.1/10.0.2 to a Drill instance that has authentication enabled.
[2] There is no issue connecting Tableau to a non-authenticated Drill instance.
[3] There is no issue with Drill Explorer connecting to either authenticated or non-authenticated drillbits.

Fix:

The issue is fixed in Tableau 10.0.3.

Workaround:

If you cannot upgrade, the following workarounds are available:

[1] Try running Tableau from the command line with "tableau.exe -DProtocolServerReconnect=1" and see if you are able to connect to the driver. The easiest way on Windows is to edit the properties of a desktop shortcut and add -DProtocolServerReconnect=1 after the double quotes around the path to tableau.exe, as shown below.
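
For example, assuming Tableau is installed in the default location (adjust the path to your version and install directory), the shortcut's Target field would read:

"C:\Program Files\Tableau\Tableau 10.0\bin\tableau.exe" -DProtocolServerReconnect=1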




[2] The second workaround is to make sure that when you connect to your driver DSN from Tableau, you are not prompted with a connection dialog, i.e. you set values for all the driver keys in the Windows registry.

The issue is observed when the user is presented with a connection dialog while connecting to the driver from Tableau, so the workaround is to suppress this dialog. When connecting to an ODBC data source from Tableau, we can choose to connect using either a DSN or a driver. If "connect using driver" is selected, the connection dialog is always presented and the connection fails for some drivers.
If we connect using a DSN, we can suppress the connection dialog by setting all the connection properties in the Windows registry, under the registry path HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\<DSN name>, as in the sketch below.
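
A minimal sketch of what such a DSN's registry entries might look like, saved as a .reg file. The key names mirror the odbc.ini sample in the Linux post below; the exact set of required keys depends on your driver version, and all values here are illustrative:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\Sample MapR Drill DSN 64]
"ConnectionType"="ZooKeeper"
"ZKQuorum"="<hostname>:5181"
"ZKClusterID"="<cluster-id>"
"AuthenticationType"="Basic Authentication"
"UID"="<userid>"
"PWD"="<password>"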

Wednesday 9 November 2016


Connecting to Drill using ODBC on Linux - Using the unixODBC Driver Manager
My environment:
cat /etc/redhat-release
CentOS release 6.6 (Final)
NOTE : Make sure the hostnames and IPs of the drillbit nodes are specified in '/etc/hosts'

Step 1 : Installing unixODBC
yum install unixODBC

Step 2 : Download MapR Drill ODBC Driver
cd /home/alwin
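
The download command itself is not shown; a typical fetch into this directory looks like the following, where the URL placeholder must be replaced with the current link from the MapR Drill ODBC driver download page:

wget <driver-download-URL>/MapRDrillODBC-1.2.1.x86_64.rpm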

Step 3 : Install MapR Drill ODBC Driver
cd /home/alwin
yum localinstall --nogpgcheck MapRDrillODBC-1.2.1.x86_64.rpm

Step 4 : Test the installation
[a] rpm -qa | grep -i mapr
MapRDrillODBC-1.2.1-1.x86_64
[b] rpm -qa | grep -i unixodbc
unixODBC-2.2.14-14.el6.x86_64
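
Optionally, unixODBC's own 'odbcinst -j' utility confirms the driver manager version and shows which ini file paths it will read:

odbcinst -j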

Step 5 : Copy the following files from '/opt/mapr/drillodbc/Setup' to the HOME directory
- mapr.drillodbc.ini
- odbc.ini
- odbcinst.ini

echo $HOME
/root
cd /opt/mapr/drillodbc/Setup
cp * /root/

Step 6 : Rename the files as hidden files
cd /root
mv mapr.drillodbc.ini .mapr.drillodbc.ini
mv odbc.ini .odbc.ini
mv odbcinst.ini .odbcinst.ini

Step 7 : Set the environment variables
export ODBCINI=~/.odbc.ini
export MAPRDRILLINI=~/.mapr.drillodbc.ini
export LD_LIBRARY_PATH=/usr/local/lib:/opt/mapr/drillodbc/lib/64
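
These exports only last for the current shell session. To make them persistent (an optional convenience, assuming a bash login shell), append them to ~/.bashrc:

cat >> ~/.bashrc <<'EOF'
export ODBCINI=~/.odbc.ini
export MAPRDRILLINI=~/.mapr.drillodbc.ini
export LD_LIBRARY_PATH=/usr/local/lib:/opt/mapr/drillodbc/lib/64
EOF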

Step 8 : Define the ODBC sources in .odbc.ini
The sample I used is given below:
[ODBC]
Trace=no

[ODBC Data Sources]
Sample MapR Drill DSN 64=MapR Drill ODBC Driver 64-bit

[Sample MapR Drill DSN 64]
# This key is not necessary and is only to give a description of the data source.
Description=MapR Drill ODBC Driver (64-bit) DSN

# Driver: The location where the ODBC driver is installed to.
Driver=/opt/mapr/drillodbc/lib/64/libmaprdrillodbc64.so

# The DriverUnicodeEncoding setting is only used for SimbaDM
# When set to 1, SimbaDM runs in UTF-16 mode.
# When set to 2, SimbaDM runs in UTF-8 mode.
#DriverUnicodeEncoding=2

# Values for ConnectionType, AdvancedProperties, Catalog, Schema should be set here.
# If ConnectionType is Direct, include Host and Port. If ConnectionType is ZooKeeper, include ZKQuorum and ZKClusterID
# They can also be specified on the connection string.
# AuthenticationType: No authentication; Basic Authentication
ConnectionType=Zookeeper
#HOST=[HOST]
#PORT=[PORT]
ZKQuorum=<hostname>:5181
ZKClusterID=ajames-drillbits
AuthenticationType=No Authentication
UID=[USERNAME]
PWD=[PASSWORD]
AdvancedProperties=CastAnyToVarchar=true;HandshakeTimeout=5;QueryTimeout=180;TimestampTZDisplayTimezone=utc;ExcludedSchemas=sys,INFORMATION_SCHEMA;NumberOfPrefetchBuffers=5;
Catalog=DRILL
Schema=

Make sure the following:
[1] /opt/mapr/drillodbc/lib/64/libmaprdrillodbc64.so is present
[2] I used the ZooKeeper mode of connection, so please make sure that
               a. ConnectionType=Zookeeper
               b. ZKQuorum=<same as the value of 'zk.connect' in /opt/mapr/drill/drill-<version>/conf/drill-override.conf>
               c. ZKClusterID=<same as the value of 'cluster-id' in /opt/mapr/drill/drill-<version>/conf/drill-override.conf>
(see the sketch of drill-override.conf below)
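
For reference, the corresponding section of drill-override.conf looks like this sketch (the values match the DSN sample above; the hostname is a placeholder):

drill.exec: {
  cluster-id: "ajames-drillbits",
  zk.connect: "<hostname>:5181"
}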
NOTE: I connected to a drillbit that is not authenticated.
For authentication, set the following:
               AuthenticationType=Basic Authentication
               UID=<userid>
               PWD=<password>

Step 9 : Configure the MapR Drill ODBC Driver
A sample of my .mapr.drillodbc.ini is given below:

## - Note that this default DriverManagerEncoding of UTF-32 is for iODBC.
## - unixODBC uses UTF-16 by default.
## - If unixODBC was compiled with -DSQL_WCHART_CONVERT, then UTF-32 is the correct value.
##   Execute 'odbc_config --cflags' to determine if you need UTF-32 or UTF-16 on unixODBC
## - SimbaDM can be used with UTF-8 or UTF-16.
##   The DriverUnicodeEncoding setting will cause SimbaDM to run in UTF-8 when set to 2 or UTF-16 when set to 1.

[Driver]
DisableAsync=0
DriverManagerEncoding=UTF-32
ErrorMessagesPath=/opt/mapr/drillodbc/ErrorMessages
LogLevel=0
LogPath=[LogPath]
SwapFilePath=/tmp

## - Uncomment the ODBCInstLib corresponding to the Driver Manager being used.
## - Note that the path to your ODBC Driver Manager must be specified in LD_LIBRARY_PATH.

# Generic ODBCInstLib
#   iODBC
# ODBCInstLib=libiodbcinst.so

#   SimbaDM / unixODBC
ODBCInstLib=libodbcinst.so

# AIX specific ODBCInstLib
#   iODBC
#ODBCInstLib=libiodbcinst.a(libiodbcinst.so.2)

#   SimbaDM
#ODBCInstLib=libodbcinst.a(odbcinst.so)

#   unixODBC
#ODBCInstLib=libodbcinst.a(libodbcinst.so.1)

NOTE : Since we are using unixODBC, leave the generic iODBC entry commented out:
#ODBCInstLib=libiodbcinst.so
and keep the SimbaDM / unixODBC entry uncommented:
ODBCInstLib=libodbcinst.so

Step 10 : Testing the connection
We can use 'isql' to test the connection:
isql "<DSN>"
From my system:
isql "Sample MapR Drill DSN 64"
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL>

For a drillbit node with authentication, use:
isql "<DSN>" "<username>" "<password>"


Thursday 27 October 2016


HBase Table Replication


How HBase Replication Works


HBase replication is based on a source-push methodology: the master cluster pushes data asynchronously to the slave. This asynchronous approach results in eventual consistency between the two tables; in a busy cluster the slave might lag behind the master by minutes.

The underlying principle of replication is the replaying of WALEntries. A WALEdit is an object representing one transaction; it can contain more than one mutation operation (puts/deletes), but always for only one row. Please note that writing to the WAL is optional for normal HBase operations, but for replication to work the WAL must be enabled.

Before trying out replication, make sure to review the following requirements:
[1] ZooKeeper should be managed by you, not by HBase, and must always be available during the deployment.
[2] Every machine in both clusters should be able to reach every other machine, since replication goes from any region server to any other one on the slave cluster. That also includes the ZooKeeper clusters.
[3] Both clusters should have the same HBase and Hadoop major revision. For example, 0.90.1 on the master and 0.90.0 on the slave is fine, but 0.90.1 and 0.89.20100725 is not.
[4] Every table that contains families scoped for replication should exist on every cluster with exactly the same name, and likewise for those replicated families.
[5] For multiple slaves, Master/Master, or cyclic replication, version 0.92 or greater is needed.
Also, if the source and destination clusters use the same ZooKeeper quorum, make sure that they use different 'zookeeper.znode.parent' znodes, as in the sketch below.
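
A minimal sketch of the hbase-site.xml entry that separates the two clusters under one quorum (the value '/hbase-slave' is illustrative):

<property>
   <name>zookeeper.znode.parent</name>
   <value>/hbase-slave</value>
</property>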

Different modes of replication:


[1] Master - Slave (Single direction)
[2] Master - Master (Bi-directional)
[3] Cyclic - more than two clusters involved; can be various combinations of the above two.

Enabling replication:


MODE: Master - Slave


Changes to be made on the Master: (Make sure the HBase table to be replicated exists on both clusters)
[1] Edit ${HBASE_HOME}/conf/hbase-site.xml on both clusters and add the following:
<property>
   <name>hbase.replication</name>
   <value>true</value>
</property>
[2] Push hbase-site.xml to all nodes.
[3] Restart hbase
[4] Run the following command in the HBase shell on the master cluster while it is running (a concrete example follows this list):
add_peer '<n>', "slave.zookeeper.quorum:zookeeper.clientport:zookeeper.znode.parent"
[5] Once you have a peer, enable replication on your column families. One way to do this is to alter the table and set the scope like this:
disable 'your_table'
alter 'your_table', {NAME => 'family_name', REPLICATION_SCOPE => '1'}
enable 'your_table'
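
For example, for a slave whose ZooKeeper quorum runs on three hosts with MapR's default ZooKeeper port 5181 (as in the Drill post above) and the default '/hbase' parent znode (hostnames and peer id are illustrative):

add_peer '1', "zkhost1,zkhost2,zkhost3:5181:/hbase"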

MODE: Master - Master


Make the above changes on both clusters.

Verifying whether data is replicated or not:


The VerifyReplication MR job runs on the master cluster, and the peer id must be specified:
$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` "${HADOOP_HOME}/bin/hadoop" jar "${HBASE_HOME}/hbase-server-VERSION.jar" verifyrep --starttime=<timestamp> --stoptime=<timestamp> --families=<myFam> <ID> <tableName>
The VerifyReplication command prints out GOODROWS and BADROWS counters to indicate rows that did and did not replicate correctly.

Example:
hadoop jar /opt/mapr/hbase/hbase-1.1.1/lib/hbase-server-1.1.1-mapr-1602.jar verifyrep --starttime=1477346187024 --stoptime=1478346187024 --families=c 1 hreplica

Also, the status of replication can be viewed from the HBase shell using the 'status' command.

hbase(main):001:0> status 'replication'
version 1.1.1-mapr-1602
1 live servers
    m51-d16-2:
       SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=0, TimeStampsOfLastShippedOp=Mon Oct 24 22:53:04 EDT 2016, Replication Lag=0
       SINK  : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Oct 26 19:16:57 EDT 2016
MapR cluster status can be viewed using the 'maprcli dashboard info' command or the UI.
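
The configured peers can also be listed from the HBase shell:

hbase(main):002:0> list_peers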

Common errors and their reasons:


[A] ERROR when the slave cluster's hostnames are not resolvable from the source cluster (e.g. missing /etc/hosts entries):

2016-10-24 17:56:32,632 WARN  [main-EventThread.replicationSource,1] regionserver.HBaseInterClusterReplicationEndpoint: Can't replicate because of a local or network error:
java.net.UnknownHostException: unknown host: m51-d16-2
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.<init>(RpcClientImpl.java:301)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.createConnection(RpcClientImpl.java:131)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.getConnection(RpcClientImpl.java:1286)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1164)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.replicateWALEntry(AdminProtos.java:23209)
        at org.apache.hadoop.hbase.protobuf.ReplicationProtbufUtil.replicateWALEntry(ReplicationProtbufUtil.java:65)
        at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:161)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:694)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:406)

[B] ERROR when the destination cluster is down:

2016-10-24 17:59:53,040 WARN  [main-EventThread.replicationSource,1] regionserver.HBaseInterClusterReplicationEndpoint: Can't replicate because of a local or network error:
java.io.IOException: No replication sinks are available
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:115)
        at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:155)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:694)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:406)

[C] ERROR when a table that exists on the source cluster does not exist on the destination cluster:
2016-10-24 17:59:16,036 WARN  [main-EventThread.replicationSource,1] regionserver.HBaseInterClusterReplicationEndpoint: Can't replicate because of an error on the remote cluster:
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException): org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: Table 'hreplica' was not found, got: hbase:namespace.: 1 time,
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:229)
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:209)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1595)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:1185)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:1202)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.batch(ReplicationSink.java:236)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.replicateEntries(ReplicationSink.java:160)
        at org.apache.hadoop.hbase.replication.regionserver.Replication.replicateLogEntries(Replication.java:198)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.replicateWALEntry(RSRpcServices.java:1708)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22253)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1208)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.replicateWALEntry(AdminProtos.java:23209)
        at org.apache.hadoop.hbase.protobuf.ReplicationProtbufUtil.replicateWALEntry(ReplicationProtbufUtil.java:65)
        at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:161)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:694)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:406)

 

[D] ERROR when the region server is down or the table is disabled:

2016-10-24 18:19:28,877 WARN  [main-EventThread.replicationSource,1] regionserver.HBaseInterClusterReplicationEndpoint: Can't replicate because of an error on the remote cluster:
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException): org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: NotServingRegionException: 1 time,
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:229)
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:209)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1595)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:1185)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:1202)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.batch(ReplicationSink.java:236)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.replicateEntries(ReplicationSink.java:160)
        at org.apache.hadoop.hbase.replication.regionserver.Replication.replicateLogEntries(Replication.java:198)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.replicateWALEntry(RSRpcServices.java:1708)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22253)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)

        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1208)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.replicateWALEntry(AdminProtos.java:23209)
        at org.apache.hadoop.hbase.protobuf.ReplicationProtbufUtil.replicateWALEntry(ReplicationProtbufUtil.java:65)
        at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:161)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:694)

        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:406)