Monday 26 February 2018

How do I enable HBase audit logs to monitor who modifies my HBase table data?


Aim:

This article discusses how to enable HBase audit logging, which helps you track who made changes to your HBase table data.

Solution:

Follow the steps below:

[1] Add the following properties to the 'hbase-site.xml' file on all HBase RegionServer nodes. (The HBase Master reads 'hbase.coprocessor.master.classes', so update the Master node as well if it runs on a separate host.)
File location (specific to MapR clusters):  /opt/mapr/hbase/hbase-<version>/conf/

<property>
     <name>hbase.security.authorization</name>
     <value>true</value>
</property>
<property>
     <name>hbase.coprocessor.master.classes</name>
     <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
     <name>hbase.coprocessor.region.classes</name>
     <value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
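
Note that setting 'hbase.security.authorization' to true enables permission checking, not just auditing. As a hedged reminder, existing users may then need explicit grants before they can keep accessing their tables; a sketch from the HBase shell (the user and table names are examples):

grant 'mapr', 'RWXCA', 'tb'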


[2] Uncomment or add the following line in the 'log4j.properties' file on all HBase RegionServer nodes.
File location (specific to MapR clusters):  /opt/mapr/hbase/hbase-<version>/conf/

log4j.logger.SecurityLogger.org.apache.hadoop.hbase.security.access.AccessController=TRACE

[3] Restart the HBase RegionServer service (and the HBase Master, if you updated its configuration).

You will see the audit details in the 'SecurityAuth.audit' file.
Default location of the 'SecurityAuth.audit' file (specific to MapR clusters): /opt/mapr/hbase/hbase-<version>/logs/

Example TRACE entry for user 'mapr' inserting data into table 'tb':

2018-02-26 14:18:17,220 TRACE SecurityLogger.org.apache.hadoop.hbase.security.access.AccessController: Access allowed for user mapr; reason: Table permission granted; remote address: ; request: put; context: (user=mapr, scope=default:tb, family=cf:v, params=[table=default:tb,family=cf:v],action=WRITE)
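
To generate an entry like the one above yourself, you can issue a simple put from the HBase shell; this sketch assumes a table 'tb' with column family 'cf', matching the audit record shown:

hbase shell
put 'tb', 'row1', 'cf:v', 'somevalue'

Each such write produces a corresponding "Access allowed ... action=WRITE" TRACE line in 'SecurityAuth.audit'.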

Thursday 15 February 2018

Sqoop export job getting stuck and failing on task timeout


AIM:

Diagnose an issue where a user's Sqoop export job was stuck for a long time and eventually failed.

Details:

In this particular case, the user was trying to export data from a Hive table to Oracle using Sqoop.
The job was getting stuck and failing after 40+ minutes.

Analyze the application log for the failed application. In this case, the following exception was reported in the log:

2018-02-14 22:02:03,731 ERROR [Thread-7] org.apache.sqoop.mapreduce.AsyncSqlOutputFormat: Got exception in update thread: java.sql.SQLIntegrityConstraintViolationException: ORA-00001: unique constraint (ALWIN.SYS_C009282) violated

        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:440)
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
        at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:837)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:445)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523)
        at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
        at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:1010)
        at oracle.jdbc.driver.OraclePreparedStatement.executeForRowsWithTimeout(OraclePreparedStatement.java:10096)
        at oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:10200)
        at oracle.jdbc.driver.OracleStatementWrapper.executeBatch(OracleStatementWrapper.java:216)
        at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:231)

The export failed because the source data contained duplicate values for a column that has a UNIQUE constraint in Oracle.
In general, an export will fail if the data violates any constraint in the destination table.

The task is supposed to fail as soon as the above exception is encountered. However, the task went into an unresponsive state, so it was killed only after the task timeout was reached ('mapreduce.task.timeout', 10 minutes by default). After the first attempt was killed, the ApplicationMaster retried the task 3 more times before reporting the application as failed. Each attempt took 10 minutes (by default) to time out, adding up to a total of about 40 minutes.
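
As a side note, these timeouts are configurable. A sketch of shortening the hang window by lowering the task timeout at submission time (the connection string and table names are placeholders, and 180000 ms = 3 minutes is just an illustrative value):

sqoop export -Dmapreduce.task.timeout=180000 \
    --connect jdbc:oracle:thin:@//<host>:<port>/<service> \
    --username <user> --password <password> \
    --table <ORACLE_TABLE> \
    --hcatalog-table <hive_table>

The generic '-D' options must appear immediately after 'sqoop export'. The number of attempts is similarly controlled by 'mapreduce.map.maxattempts' (default 4).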

Root Cause:

The root cause is a bug in Sqoop. In the class AsyncSqlRecordWriter, if an exception is thrown from its close method, the Hadoop MapTask calls the close method once more (in case the writer hasn't been closed yet), and the writer gets stuck. The following JIRA contains more information:
https://issues.apache.org/jira/browse/SQOOP-2343

Solution:

Although this is a bug in Sqoop (fixed in Sqoop 1.4.7), ultimately the source data needs to be cleansed to satisfy the constraints defined in the destination database. In this particular case, duplicate values for the specific column had to be filtered out of the source table.
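
As a sketch of the cleansing step, run from the Hive CLI or Beeline and assuming a source table 'src' in which column 'id' maps to the Oracle column with the UNIQUE constraint (all names here are hypothetical), you could first detect the duplicates and then keep one row per key:

-- Find duplicate keys
SELECT id, COUNT(*) AS cnt FROM src GROUP BY id HAVING COUNT(*) > 1;

-- Create a deduplicated copy, keeping one row per id
CREATE TABLE src_dedup AS
SELECT id, col1, col2
FROM (
  SELECT id, col1, col2,
         row_number() OVER (PARTITION BY id ORDER BY id) AS rn
  FROM src
) t
WHERE rn = 1;

Export 'src_dedup' instead of the original table.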

Hive - How to increase memory for the Hive CLI, HiveServer2, and the Hive Metastore

You might encounter various situations where the Hive CLI, HiveServer2, or the Hive Metastore runs out of memory. You will see the following exception in the metastore/HiveServer2 logs when they run out of memory:

java.lang.OutOfMemoryError: Java heap space

As part of cluster planning, you might need to increase the memory for one or more of these services. If you want to increase memory for all of them, specify the following property, with the required memory, in the 'hive-env.sh' file on all nodes where HiveServer2 or the metastore is installed and from which the Hive CLI will be invoked.

export HADOOP_CLIENT_OPTS="-Xmx<memory in MB>m"
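
For example, to give the services a 4 GB heap (an illustrative value; size it according to your workload and available RAM):

export HADOOP_CLIENT_OPTS="-Xmx4096m"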

This requires a restart of the HiveServer2 and Hive Metastore services.

However, if you want to increase memory for a specific service only, you need to edit the corresponding section of the 'hive-env.sh' file.

Case 1: Increase memory for Hive CLI


Add the following to the 'hive-env.sh' file on the nodes where the Hive CLI will be used. The next time you start the Hive CLI, it will pick up the new value.

if [ "$SERVICE" = "cli" ]; then
  if [ -z "$DEBUG" ]; then
    export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx<memory in MB>m -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
  else
    export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx<memory in MB>m -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
  fi
fi

Case 2: Increase memory for HiveServer2


Add the following to the 'hive-env.sh' file on all nodes where the HiveServer2 service is installed. This requires a restart of the HiveServer2 service.

if [ "$SERVICE" = "hiveserver2" ]; then
 if [ -z "$DEBUG" ]; then
    export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx<memory in MB>m -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
  else
    export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx<memory in MB>m -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
  fi
fi

Case 3: Increase memory for HiveMetastore


Add the following to the 'hive-env.sh' file on all nodes where the Hive Metastore service is installed. This requires a restart of the metastore service.


if [ "$SERVICE" = "metastore" ]; then
 if [ -z "$DEBUG" ]; then
    export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx<memory in MB>m -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
  else
    export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xmx<memory in MB>m -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
  fi
fi
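
To confirm that a restarted service picked up the new heap size, one quick sanity check (a sketch; process names can vary by distribution) is to look for the -Xmx flag on the running JVM:

ps -ef | grep -i hiveserver2 | grep -o -- "-Xmx[0-9]*m"

If the expected value does not appear, double-check which 'hive-env.sh' file the service actually sources.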

DRILL ON YARN - How to collect Drillbit log files

This blog discusses how to collect Drillbit logs for troubleshooting. We will focus on collecting logs specifically for DoY (Drill on YARN).

There are two scenarios in which you may end up collecting logs:
[1] After the ApplicationMaster has stopped.
[2] While the ApplicationMaster is still up and running (that is, the Drillbit cluster is up and running).

In scenario 1, collecting the Drillbit logs is as simple as collecting the YARN aggregated logs; the Drillbit logs will be present in the application logs. For this to work, make sure that YARN log aggregation is enabled in your cluster ('yarn.log-aggregation-enable' set to true in 'yarn-site.xml'). You can execute the following command to collect the logs:

yarn logs -applicationId <application Id> [options] > <destination>
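
For example, with a hypothetical application ID, redirecting the aggregated logs to a local file:

yarn logs -applicationId application_1519600000000_0001 > /tmp/doy-application.log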

In scenario 2, since the application has not finished yet, the above command will not work. In this case, we need to collect the 'drillbit.log' file from each container launched for the application. This is not a straightforward process, since it involves identifying the container IDs and then copying the file to the desired location from every node where Drillbits were launched.

The following script will help in such a scenario. Copy the script and save it as 'collect-doy-drillbitlog.sh'. You can view the help by executing the following command:

collect-doy-drillbitlog.sh -h

The script is also available in my GitHub repository: https://github.com/jamealwi2/doy-log-collector

#!/bin/bash

usage(){
echo "Please see usage below:"
echo "collect-doy-drillbitlog.sh -a <application id> -d <copy destination>"
echo "collect-doy-drillbitlog.sh -a <application id> (Default copy destination is '/tmp'.)"
}

while getopts 'ha:d:' option; do
        case "$option" in
                h) usage
                        exit
                        ;;
                a) application_id=$OPTARG
                        ;;
                d) copy_location=$OPTARG
                        ;;
        esac
done;

if [ -z "$application_id" ]; then
        echo "Please specify the application ID."
        usage
        exit 1
fi

if [ -z "$copy_location" ]; then
        echo "Copy destination location not specified. The log will be copied to '/tmp'."
        echo -n "Please confirm: (y/n)? "
        read answer
        if echo "$answer" | grep -iq "^y" ;then
                copy_location="/tmp"
        else
                exit
        fi
fi

# Locate every drillbit.log under this application's container directories
# (the path below is specific to MapR clusters) and copy each one to the
# destination, prefixed with the hostname and container ID.
ls /opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/${application_id}/*/* | grep drillbit.log | while read line
do
        container_id=$(echo "$line" | awk -F"/" '{print $(NF-1)}')
        cp "$line" "${copy_location}/$(hostname)-${container_id}-drillbit.log"
done
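
A sketch of typical usage with a hypothetical application ID:

./collect-doy-drillbitlog.sh -a application_1519600000000_0001 -d /tmp/drill-logs

Since the script reads the local container log directories, run it on every node where Drillbit containers were launched.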

Monday 5 February 2018

Hue - Hive Issue: User hitting 'org.apache.hadoop.security.AccessControlException'



Diagnostics:


ERROR thrown in the Hue UI:

java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: User USERONE(user id USERONE-ID)  has been denied access to create 65d91fd0-8bf4-4901-926b-11efd107725a'.

Since the issue is related to the Hue Hive editor, the next step is to try connecting to HiveServer2 from Beeline. This was also unsuccessful.

ERROR observed in Beeline:

Error: Could not open client transport with JDBC Uri: jdbc:hive2://<HS2_HOSTNAME>:10000/default: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: User USERONE(user id USERONE-ID)  has been denied access to create 7ed6cddc-0606-49ca-81a6-06901e7efc9a (state=08S01,code=0)
java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://<HS2_HOSTNAME>:10000/default: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: User USERONE(user id USERONE-ID)  has been denied access to create 7ed6cddc-0606-49ca-81a6-06901e7efc9a
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:210)
        at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:208)
        at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:145)
        at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:209)
        at org.apache.hive.beeline.Commands.connect(Commands.java:1469)
        at org.apache.hive.beeline.Commands.connect(Commands.java:1364)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
        at org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1104)
        at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1143)
        at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:976)
        at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:886)
        at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:502)
        at org.apache.hive.beeline.BeeLine.main(BeeLine.java:485)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hive.service.cli.HiveSQLException: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: User USERONE(user id USERONE-ID)  has been denied access to create 7ed6cddc-0606-49ca-81a6-06901e7efc9a
        at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:267)
        at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:258)
        at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:583)
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:187)
        ... 24 more
Caused by: org.apache.hive.service.cli.HiveSQLException: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: User USERONE(user id USERONE-ID)  has been denied access to create 7ed6cddc-0606-49ca-81a6-06901e7efc9a
        at org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:328)
        at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:271)
        at org.apache.hive.service.cli.CLIService.openSessionWithImpersonation(CLIService.java:189)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:423)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:312)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1377)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1362)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: User USERONE(user id USERONE-ID)  has been denied access to create 7ed6cddc-0606-49ca-81a6-06901e7efc9a
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:89)
        at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
        at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
        at com.sun.proxy.$Proxy25.open(Unknown Source)
        at org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:319)
        ... 13 more
Caused by: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: User USERONE(user id USERONE-ID)  has been denied access to create 7ed6cddc-0606-49ca-81a6-06901e7efc9a
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:591)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:526)
        at org.apache.hive.service.cli.session.HiveSessionImpl.open(HiveSessionImpl.java:168)
        at sun.reflect.GeneratedMethodAccessor113.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
        ... 21 more
Caused by: org.apache.hadoop.security.AccessControlException: User USERONE(user id USERONE-ID)  has been denied access to create 7ed6cddc-0606-49ca-81a6-06901e7efc9a
        at com.mapr.fs.MapRFileSystem.makeDir(MapRFileSystem.java:1256)
        at com.mapr.fs.MapRFileSystem.mkdirs(MapRFileSystem.java:1276)
        at org.apache.hadoop.hive.ql.session.SessionState.createPath(SessionState.java:735)
        at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:658)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:563)
        ... 27 more


Root Cause:


From "at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:658)", it looked like there is some issue with the Hive scratch directory.
 
hdfsSessionPath = new Path(hdfsScratchDirURIString, sessionId);
createPath(conf, hdfsSessionPath, scratchDirPermission, false, true);

From 'org/apache/hadoop/hive/conf/HiveConf.java', the default scratch directory is defined as:

SCRATCHDIR("hive.exec.scratchdir", "/user/" + System.getProperty("user.name") + "/tmp/hive/",
        "HDFS root scratch dir for Hive jobs which gets created with write all (777) permission. " +
        "For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, " +
        "with ${hive.scratch.dir.permission}."),

On checking the scratch directory for this user:

drwxr-xr-x   - <WRONG_USER>      <WRONG_USER_GROUP>               0 2018-01-06 15:49 /user/mapr/tmp/hive/USERONE

The user and group ownership of the directory were wrong.

Solution:

Provide the correct user and group ownership to the directory.
(A simple way is to delete the scratch directory; Hive will automatically recreate it when the next operation comes in from the user.)
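
A sketch of both options using standard 'hadoop fs' commands (the path and user come from this example; adjust them, and the group, to your environment):

# Option 1: fix the ownership in place
hadoop fs -chown USERONE:USERONE /user/mapr/tmp/hive/USERONE

# Option 2: remove the directory and let Hive recreate it on the next operation
hadoop fs -rm -r /user/mapr/tmp/hive/USERONE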