Bigdata In Our Palm

Accessing data from a pre-existing HBase table through Hive in MapR cluster

AIM

Convert a pre-existing table to Hive-Hbase table and access data making use of 'group by' operation.

Make the following changes in all the nodes in hive-site.xml:

<value>file:///opt/mapr/hive/hive-<version>/lib/hive-hbase-handler-<version>-mapr.jar,

file:///opt/mapr/hbase/hbase-<version>/lib/hbase-client-<version>-mapr.jar, file:///opt/mapr/hbase/hbase-

<version>/lib/hbase-server-<version>-mapr.jar,file:///opt/mapr/hbase/hbase-<version>/lib/hbase-protocol-<version>-

mapr.jar,file:///opt/mapr/zookeeper/zookeeper-<version>/zookeeper-<version>.jar</value>

<description>A comma separated list (with no spaces) of the jar files required for Hive-HBase integration</description>

</property>

<name>hbase.zookeeper.quorum</name>

<description>A comma separated list (with no spaces) of the IP addresses of all ZooKeeper servers in the

cluster.</description>

</property>

<name>hbase.zookeeper.property.clientPort</name>

<description>The Zookeeper client port. The MapR default clientPort is 5181.</description>

</property>

Once above steps are done, let's concentrate on the Hive-Hbase integration

HBase Side:

[1] Create Hbase table:

hbase(main):006:0> create 'myhbase','cfg1'

0 row(s) in 1.2330 seconds

=> Hbase::Table - myhbase

hbase(main):007:0> put 'myhbase','200','cfg1:val','right'

0 row(s) in 0.1260 seconds

[2] Insert data to HBase table

hbase(main):008:0> put 'myhbase','230','cfg1:val','left'

0 row(s) in 0.0100 seconds

hbase(main):009:0> scan 'myhbase'

ROW COLUMN+CELL

200 column=cfg1:val, timestamp=1489427368644, value=right

230 column=cfg1:val, timestamp=1489427381590, value=left

2 row(s) in 0.0480 seconds

hbase(main):010:0> put 'myhbase','2300','cfg1:val','left'

0 row(s) in 0.0240 seconds

hbase(main):011:0> scan 'myhbase'

ROW COLUMN+CELL

200 column=cfg1:val, timestamp=1489427368644, value=right

230 column=cfg1:val, timestamp=1489427381590, value=left

2300 column=cfg1:val, timestamp=1489427494185, value=left

3 row(s) in 0.0170 seconds

Hive Side:

[1] Create Hive-Hbase table:

hive> CREATE EXTERNAL TABLE hbase_table_4(key int, value string)

> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

> WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cfg1:val")

> TBLPROPERTIES("hbase.table.name" = "myhbase");

[2] Check for data in Hive table

hive> select * from hbase_table_4;

200 right

230 left

2300 left

Time taken: 0.172 seconds, Fetched: 3 row(s)

[3] Example with 'group by':

hive> select value,count(value) from hbase_table_4 group by value;

Query ID = mapr_20170313135543_574ef721-612f-4705-ab6c-1ea56f808f47

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks not specified. Estimated from input data size: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1489190722556_0018, Tracking URL = http://vm51-154:8088/proxy/application_1489190722556_0018/

Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1489190722556_0018

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2017-03-13 13:55:55,743 Stage-1 map = 0%, reduce = 0%

2017-03-13 13:56:09,321 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.84 sec

2017-03-13 13:56:17,702 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.4 sec

MapReduce Total cumulative CPU time: 5 seconds 400 msec

Ended Job = job_1489190722556_0018

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.4 sec MAPRFS Read: 0 MAPRFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 5 seconds 400 msec

left 2

right 1

Time taken: 35.143 seconds, Fetched: 2 row(s)

Bigdata In Our Palm

Monday, 13 March 2017

Accessing data from a pre-existing HBase table through Hive in MapR cluster

AIM

Make the following changes in all the nodes in hive-site.xml:

Once above steps are done, let's concentrate on the Hive-Hbase integration

No comments:

Post a Comment

Blog Archive