Wednesday 29 March 2017


Monitoring HBase using MapR SpyGlass

MapR 5.2 ships with SpyGlass, which can be used to gather a wide range of useful metrics for monitoring. The following explains how to set up a sample dashboard for HBase monitoring in Grafana.

Steps to setup HBase monitoring dashboard in Grafana

Step 1: Log in to Grafana




Step 2: Go to the node dashboard (I have a single node)



Step 3: Add Panel -> Add Graph



Step 4: Click on 'General' and add a name for the graph; here I used ‘HBaseRegionServerTracker’



Step 5: Click on ‘Metrics’. In the ‘Metric’ column, search for mapr.hbase* metrics and add the metric you need.



Step 6: I selected ‘mapr.hbase_master.region_servers’ and then saved the dashboard.
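
To confirm the metric is actually being collected outside of Grafana, you can query the monitoring time-series store directly. A minimal sketch, assuming the MapR monitoring stack's OpenTSDB instance is reachable on its default port 4242 (the host name and port here are assumptions; adjust them to your install):

# Query the last hour of the HBase master's region-server count from OpenTSDB
# (the data source behind the SpyGlass/Grafana dashboards)
curl -s "http://<opentsdb-node>:4242/api/query?start=1h-ago&m=sum:mapr.hbase_master.region_servers"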








Monday 13 March 2017


Accessing data from a pre-existing HBase table through Hive in MapR cluster


AIM

Expose a pre-existing HBase table as a Hive table and access its data using a 'group by' operation.


Make the following changes to hive-site.xml on all nodes:


<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/mapr/hive/hive-<version>/lib/hive-hbase-handler-<version>-mapr.jar,file:///opt/mapr/hbase/hbase-<version>/lib/hbase-client-<version>-mapr.jar,file:///opt/mapr/hbase/hbase-<version>/lib/hbase-server-<version>-mapr.jar,file:///opt/mapr/hbase/hbase-<version>/lib/hbase-protocol-<version>-mapr.jar,file:///opt/mapr/zookeeper/zookeeper-<version>/zookeeper-<version>.jar</value>
  <description>A comma separated list (with no spaces) of the jar files required for Hive-HBase integration</description>
</property>

 <property>
  <name>hbase.zookeeper.quorum</name>
  <value>xx.xx.x.xxx,xx.xx.x.xxx,xx.xx.x.xxx</value>
  <description>A comma separated list (with no spaces) of the IP addresses of all ZooKeeper servers in the 
cluster.</description>
</property>

 <property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>5181</value>
  <description>The Zookeeper client port. The MapR default clientPort is 5181.</description>
</property>
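
For the new hive.aux.jars.path and ZooKeeper settings to take effect, restart the Hive services after editing hive-site.xml. A hedged sketch using warden-managed services; the warden service names used below ('hivemeta' and 'hs2') are assumptions that can vary between MapR/Hive package versions, so confirm them first:

# Confirm the Hive service names registered with warden on the node
maprcli service list -node <hive-node>

# Restart the Hive Metastore and HiveServer2 so the new aux jars are picked up
maprcli node services -name hivemeta -action restart -nodes <hive-node>
maprcli node services -name hs2 -action restart -nodes <hive-node>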


Once the above changes are in place, let's move on to the Hive-HBase integration.

HBase Side:

[1] Create an HBase table:
hbase(main):006:0> create 'myhbase','cfg1'
0 row(s) in 1.2330 seconds

=> Hbase::Table - myhbase
hbase(main):007:0> put 'myhbase','200','cfg1:val','right'
0 row(s) in 0.1260 seconds

[2] Insert data into the HBase table
hbase(main):008:0> put 'myhbase','230','cfg1:val','left'
0 row(s) in 0.0100 seconds

hbase(main):009:0> scan 'myhbase'
ROW                                      COLUMN+CELL
 200                                     column=cfg1:val, timestamp=1489427368644, value=right
 230                                     column=cfg1:val, timestamp=1489427381590, value=left
2 row(s) in 0.0480 seconds

hbase(main):010:0> put 'myhbase','2300','cfg1:val','left'
0 row(s) in 0.0240 seconds

hbase(main):011:0> scan 'myhbase'
ROW                                      COLUMN+CELL
 200                                     column=cfg1:val, timestamp=1489427368644, value=right
 230                                     column=cfg1:val, timestamp=1489427381590, value=left
 2300                                    column=cfg1:val, timestamp=1489427494185, value=left
3 row(s) in 0.0170 seconds



Hive Side:

[1] Create the Hive-HBase table:
hive> CREATE EXTERNAL TABLE hbase_table_4(key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cfg1:val")
    > TBLPROPERTIES("hbase.table.name" = "myhbase");
OK
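
In this Hive build the mapping worked without an explicit row-key entry (the first Hive column was implicitly mapped to the HBase row key), but the explicit ':key' form is the one documented for the HBase storage handler and is safer across versions. A hedged variant (the table name 'hbase_table_5' is just an example):

# Same table definition, with the row key mapped explicitly via :key
hive -e "CREATE EXTERNAL TABLE hbase_table_5(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cfg1:val')
TBLPROPERTIES ('hbase.table.name' = 'myhbase');"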

[2] Check for data in Hive table
hive> select * from hbase_table_4;
OK
200     right
230     left
2300    left
Time taken: 0.172 seconds, Fetched: 3 row(s)

[3] Example with 'group by':
hive> select value,count(value) from hbase_table_4 group by value;
Query ID = mapr_20170313135543_574ef721-612f-4705-ab6c-1ea56f808f47
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1489190722556_0018, Tracking URL = http://vm51-154:8088/proxy/application_1489190722556_0018/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job  -kill job_1489190722556_0018
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-03-13 13:55:55,743 Stage-1 map = 0%,  reduce = 0%
2017-03-13 13:56:09,321 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.84 sec
2017-03-13 13:56:17,702 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.4 sec
MapReduce Total cumulative CPU time: 5 seconds 400 msec
Ended Job = job_1489190722556_0018
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.4 sec   MAPRFS Read: 0 MAPRFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 400 msec
OK
left    2
right   1
Time taken: 35.143 seconds, Fetched: 2 row(s)

Thursday 9 March 2017


Hue unable to retrieve logs when executing a Hive query from the Hue editor


Issue

Hue is unable to retrieve logs when executing a Hive query from the Hue editor. The Hue version is 3.9 and the Hive version is 1.2. The Hue UI throws an ‘Invalid method name: 'GetLog'’ error.

Error in runcpserver.log


TApplicationException: Invalid method name: 'GetLog'
[09/Mar/2017 14:39:04 -0800] thrift_util  INFO     Thrift saw an application exception: Invalid method name: 'GetLog'
[09/Mar/2017 14:39:04 -0800] hive_server2_lib ERROR    server does not support GetLog
Traceback (most recent call last):
  File "/opt/mapr/hue/hue-3.9.0/apps/beeswax/src/beeswax/server/hive_server2_lib.py", line 750, in get_log
    res = self.call(self._client.GetLog, req)
  File "/opt/mapr/hue/hue-3.9.0/apps/beeswax/src/beeswax/server/hive_server2_lib.py", line 555, in call
    res = fn(req)
  File "/opt/mapr/hue/hue-3.9.0/desktop/core/src/desktop/lib/thrift_util.py", line 377, in wrapper
    raise StructuredException('THRIFTAPPLICATION', str(e), data=None, error_code=502

Solution


Check whether ‘use_get_log_api’ is set to ‘true’ in the hue.ini file. Comment it out or set it to ‘false’, then restart Hue.
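
A minimal sketch of the check and the fix; the hue.ini path and the warden service name 'hue' are assumptions based on a default MapR layout, so adjust them to your install:

# Find the setting; it normally lives in the [beeswax] section of hue.ini
grep -n "use_get_log_api" /opt/mapr/hue/hue-3.9.0/desktop/conf/hue.ini

# Comment it out (or change true to false), then restart Hue via warden
sed -i 's/^\( *\)use_get_log_api=true/\1## use_get_log_api=true/' /opt/mapr/hue/hue-3.9.0/desktop/conf/hue.ini
maprcli node services -name hue -action restart -nodes <hue-node>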

Wednesday 8 March 2017


Configuring HBase Thrift HA in MapR Clusters


AIM

Make HBase Thrift highly available, with multiple Thrift servers ACTIVE.

Default behavior

By default, if HBase Thrift is installed on two nodes, one is shown as active and the other as standby.

Steps


Edit the following file on all HBase Thrift server nodes:

/opt/mapr/conf/conf.d/warden.hbasethrift.conf
#
# sed -i s/\1.1.1/`cat /opt/mapr/hbase/hbaseversion`/g /opt/mapr/conf/conf.d/warden.hbasethrift.conf
#
services=hbasethrift:all
service.displayname=HBaseThriftServer
service.command.start=/opt/mapr/hbase/hbase-1.1.1/bin/hbase-daemon.sh start thrift
service.command.stop=/opt/mapr/hbase/hbase-1.1.1/bin/hbase-daemon.sh stop thrift
service.command.type=BACKGROUND
service.command.monitorcommand=/opt/mapr/hbase/hbase-1.1.1/bin/hbase-daemon.sh status thrift
service.port=9090
service.ui.port=9095
service.logs.location=/opt/mapr/hbase/hbase-1.1.1/logs
service.process.type=JAVA
service.alarm.tersename=hbasethrift
service.alarm.label=HbaseThriftServiceDown


Once the changes are made, restart warden on all nodes:

service mapr-warden restart

Once warden comes back up, all the Thrift servers will show as active.

I have installed HBase Thrift on two nodes. The MCS screenshot below shows HBase Thrift up and running on both nodes.
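
You can also confirm this from the command line; a quick check, assuming the warden service name 'hbasethrift' from the conf file above:

# List the services reported by each node and filter for the Thrift server
maprcli node list -columns hostname,svc | grep hbasethrift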





Wednesday 1 March 2017


Oozie workflow shows as running even after the job has completed successfully


Issue:


The Oozie workflow shows as running even after the job has completed successfully. oozie job -info <workflow_ID> shows that the job has succeeded; however, in the web UI it still remains in the RUNNING state.
If we try to kill the workflow using oozie job -kill <workflow_ID>, it throws the following error:

Error: E0607 : E0607: Other error in operation [kill], java.io.EOFException

In oozie.log you can find the following exception (this one was captured while trying to suspend the job):

2017-03-01 15:10:59,753  WARN V2JobServlet:523 - SERVER[phpvcoredev03.chicago.local] USER[vcoredevuser] GROUP[-] TOKEN[] APP[Lab-EdwardLabResults] JOB[0000237-160624170341227-oozie-mapr-W] ACTION[] URL[PUT http://phpvcoredev03:11000/oozie/v2/job/0000237-160624170341227-oozie-mapr-W?action=suspend] error[E0607], E0607: Other error in operation [suspend], java.io.EOFException
org.apache.oozie.servlet.XServletException: E0607: Other error in operation [suspend], java.io.EOFException
        at org.apache.oozie.servlet.V1JobServlet.suspendWorkflowJob(V1JobServlet.java:430)
        at org.apache.oozie.servlet.V1JobServlet.suspendJob(V1JobServlet.java:127)
        at org.apache.oozie.servlet.BaseJobServlet.doPut(BaseJobServlet.java:92)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
        at org.apache.oozie.servlet.JsonRestServlet.service(JsonRestServlet.java:304)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.oozie.servlet.AuthFilter$2.doFilter(AuthFilter.java:171)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:604)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:567)
        at org.apache.oozie.servlet.AuthFilter.doFilter(AuthFilter.java:176)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:86)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)

Resolution:


Delete the job from the Oozie database.
The entry must be deleted from both the WF_JOBS and WF_ACTIONS tables, as sketched below.
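
A hedged sketch of the cleanup, assuming a MySQL-backed Oozie database named 'oozie' and the usual column names (ID in WF_JOBS, WF_ID in WF_ACTIONS); verify the schema of your Oozie version and take a backup before deleting anything:

# Stop Oozie first, then remove the stuck workflow's rows (workflow id shown as a placeholder)
mysql -u oozie -p oozie -e "
  DELETE FROM WF_ACTIONS WHERE wf_id = '<workflow_ID>';
  DELETE FROM WF_JOBS    WHERE id    = '<workflow_ID>';"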

HBase replication stops abruptly


Issue: 

HBase replication between the source and DR clusters stops abruptly. Replication does not happen for existing tables; however, it works as expected for newly created tables. All the necessary replication configuration is specified correctly and the HBase services are up and running.


Root cause:

When a regionserver crashes, another regionserver tries to take over the crashed server's HLog queue to finish the replication work. While claiming the queue it creates a persistent ZK node named "lock" so that no other regionserver tries to take over the same queue at the same time.

public boolean lockOtherRS(String znode) {
    try {
      String parent = ZKUtil.joinZNode(this.rsZNode, znode);
      if (parent.equals(rsServerNameZnode)) {
        LOG.warn("Won't lock because this is us, we're dead!");
        return false;
      }
      String p = ZKUtil.joinZNode(parent, RS_LOCK_ZNODE);
      ZKUtil.createAndWatch(this.zookeeper, p, Bytes.toBytes(rsServerNameZnode));
    } catch (KeeperException e) {
      ...
      return false;
    }
    return true;
  }


If 'hbase.zookeeper.useMulti' in hbase-site.xml is set to 'false' and the regionserver that claimed the queue itself crashes after creating the lock but before copying the previously crashed server's replication queue into its own, the "lock" znode is never deleted and no other regionserver can take over that replication queue.


Symptoms in the HBase regionserver logs (DR cluster):

2017-02-14 14:37:36,109 INFO  [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Won't transfer the queue, another RS took care of it because of: KeeperErrorCode = NodeExists for /hbase/replication/rs/xxxx.com,60020,1469048347755/lock


Resolution:

Set hbase.zookeeper.useMulti=true in hbase-site.xml.
Then remove the /hbase/replication/rs znode on the DR cluster, as sketched below.
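
A hedged sketch of the cleanup on the DR cluster, using the ZooKeeper CLI bundled with HBase (the znode path assumes the default zookeeper.znode.parent of /hbase, which matches the log line above; MapR ZooKeeper listens on port 5181 as configured earlier):

# Inspect the replication queues and any leftover per-regionserver "lock" znodes
hbase zkcli ls /hbase/replication/rs

# Remove the stale replication-queue state, then restart the regionservers
hbase zkcli rmr /hbase/replication/rs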


Installing Ganglia on MapR to monitor HBase


Node details:

Node1 – 10.10.YY.X1
Node2 - 10.10.YY.X2
Node3 - 10.10.YY.X3

The basic packages required for Ganglia to run are:

[1] ganglia-gmond – required on all nodes from which metrics need to be collected
[2] ganglia-gmetad – required on the node that performs aggregation (typically one node)
[3] ganglia-web – required on the node running the Ganglia web UI

The following is my setup (concentrating on Ganglia and HBase):

Node1 – 10.10.YY.X1 – HBase regionserver, gmond
Node2 - 10.10.YY.X2 – HBase regionserver, gmond
Node3 - 10.10.YY.X3 – HBase regionserver, HBase master, gmond, gmetad, ganglia-web

To install the packages on CentOS, use the following commands:

yum install ganglia-gmond -y
yum install ganglia-gmetad -y
yum install ganglia-web -y

Edit ‘/etc/ganglia/gmond.conf’ on all nodes running the gmond service (a sample is shown below).

Part 1:
cluster {
  name = "ThreeNodeClusterAJames"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

Part 2:
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname.  Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  #mcast_join = 239.2.11.71
  host = 10.10.YY.XX
  port = 8649
  ttl = 1
}

NOTE:
- We are using unicast instead of multicast. Comment out ‘mcast_join = 239.2.11.71’ and add ‘host = <IP of the local node running the gmond service>’.
- If you change the port, update the ‘port’ property accordingly. Here I am using the default port, 8649.

Part 3:
udp_recv_channel {
  #mcast_join = 239.2.11.71
  port = 8649
  #bind = 239.2.11.71
  #bind = 10.10.72.154
  #retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  # buffer = 10485760
}

Edit ‘/etc/ganglia/gmetad.conf’ on the gmetad server and add the following:

data_source "ThreeNodeClusterAJames" 10.10.YY.X1:8649 10.10.YY.X2:8649 10.10.YY.X3:8649

The cluster name ‘ThreeNodeClusterAJames’ must be the same on all gmond nodes and on the gmetad server.

Edit the ‘/opt/mapr/conf/hadoop-metrics.properties’ file and add:

Part 1:
# Configuration of the "cldb" context for ganglia
cldb.class=com.mapr.fs.cldb.counters.MapRGangliaContext31
cldb.period=10
cldb.servers=10.10.YY.X1:8649,10.10.YY.X2:8649,10.10.YY.X3:8649
cldb.spoof=1

Part 2:
# Configuration of the "fileserver" context for ganglia
fileserver.class=com.mapr.fs.cldb.counters.MapRGangliaContext31
fileserver.period=37
fileserver.servers=10.10.YY.X1:8649,10.10.YY.X2:8649,10.10.YY.X3:8649
fileserver.spoof=1

Execute the following on any one node, then restart the CLDB service on all CLDB nodes:

maprcli config save -values {"cldb.ganglia.cldb.metrics":"1"}
maprcli config save -values {"cldb.ganglia.fileserver.metrics":"1"}
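
To confirm the flags were stored, you can read the CLDB configuration back; a quick check (output format may vary by MapR version):

# Verify that the Ganglia metric flags are now set to 1
maprcli config load -json | grep -i ganglia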

Execute the following on all nodes (this sets SELinux to permissive mode):

sudo setenforce 0

Execute the following on the gmetad server:

chown -R apache:apache /usr/share/ganglia
chown -R ganglia:ganglia /var/lib/ganglia/rrd*
chcon -R -t httpd_sys_content_t *

Edit the ‘/etc/httpd/conf.d/ganglia.conf’ file:

<Location /ganglia>
  Order deny,allow
#  Deny from all
  Allow from 127.0.0.1
  Allow from ::1
  # Allow from .example.com
</Location>

Restart httpd, gmond and gmetad:

/etc/init.d/httpd restart
/etc/init.d/gmond restart
/etc/init.d/gmetad restart
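
To verify the daemons came back up, check that they are listening on their ports; a rough check, assuming the default ports (gmond on 8649, gmetad on 8651/8652):

# gmond should be listening on 8649; gmetad on 8651 (xml) and 8652 (interactive)
netstat -tlnp | grep -E ':8649|:865[12]'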

Edit the ‘/opt/mapr/hbase/hbase-<version>/conf/hadoop-metrics2-hbase.properties’ file:

hbase.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
hbase.sink.ganglia.servers=<ganglia-server>:8649
hbase.sink.ganglia.period=10

hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
hbase.period=10
hbase.servers=<server-running-hbase-regionserver-1>:8649, <server-running-hbase-regionserver-2>:8649

# Configuration of the "jvm" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=<server-running-hbase-regionserver-1>:8649, <server-running-hbase-regionserver-2>:8649
...
# Configuration of the "rpc" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rpc.period=10
rpc.servers=<server-running-hbase-regionserver-1>:8649, <server-running-hbase-regionserver-2>:8649
...
# Configuration of the "rest" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# rest.class=org.apache.hadoop.metrics.ganglia.GangliaContext
rest.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rest.period=10
rest.servers=<server-running-hbase-regionserver-1>:8649, <server-running-hbase-regionserver-2>:8649

Restart the HBase Master and the RegionServers on all nodes.
You should now be able to see the HBase metrics in the Ganglia web UI.
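
If the metrics do not show up, check whether gmond is receiving them before looking at the UI; a rough check, assuming gmond's default tcp_accept_channel on port 8649:

# Dump the XML that gmond serves to gmetad and look for HBase metric names
nc <gmond-host> 8649 | grep -i -m 5 'hbase'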


Ganglia UI listing the metrics collected:



Ganglia UI showing HBase metrics: