Friday 22 September 2017

How to control the time during which HBase major compaction is executed?


The following two properties help control the time at which major compaction kicks in:
(1) hbase.offpeak.start.hour
(2) hbase.offpeak.end.hour

Both properties take an integer value from 0 to 23 (hour of the day). The default, -1, leaves off-peak hours disabled.
For example, if we find that the HBase cluster is lightly loaded at night, say from 10 PM to 6 AM every day, we can set the following in hbase-site.xml and restart the regionserver.

<property>
<name>hbase.offpeak.start.hour</name>
<value>22</value>
</property>

<property>
<name>hbase.offpeak.end.hour</name>
<value>6</value>
</property>
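
Because the start hour (22) is greater than the end hour (6), the window wraps past midnight. The following is a minimal Python sketch of how such a window check can work. The function name is hypothetical, and treating the start hour as inclusive and the end hour as exclusive is an assumption of this sketch, not a statement about the HBase source.

```python
def is_off_peak(hour, start_hour, end_hour):
    """Return True if `hour` (0-23) falls inside the off-peak window.

    Assumption of this sketch: start hour inclusive, end hour exclusive.
    """
    if start_hour <= end_hour:
        # Window stays within a single day, e.g. 1 -> 5.
        return start_hour <= hour < end_hour
    # Window wraps past midnight, e.g. 22 -> 6.
    return hour >= start_hour or hour < end_hour


if __name__ == "__main__":
    for h in (21, 22, 23, 0, 5, 6):
        print(h, is_off_peak(h, 22, 6))
```

With start 22 and end 6, this sketch treats hours 22 through 5 as off-peak and everything else as peak.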

If either value is set incorrectly (in the example below, the end hour was left at -1), a WARN message similar to the following appears in the regionserver logs:
2017-09-22 19:21:16,533 WARN  [StoreOpener-ee1faec4bdc3df3a4f4fa959c641e782-1] compactions.OffPeakHours: Ignoring invalid start/end hour for peak hour : start = 22 end = -1. Valid numbers are [0-23]
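
To catch this before restarting the regionserver, the configuration can be checked with a small script. The sketch below is only an illustration: the function name is hypothetical, it assumes the file content uses the usual <configuration>/<property> layout, and it reports an unset property as a problem because the log above shows what happens when only one of the two hours is set.

```python
import xml.etree.ElementTree as ET

OFF_PEAK_KEYS = ("hbase.offpeak.start.hour", "hbase.offpeak.end.hour")


def check_off_peak_hours(site_xml):
    """Return a list of problems found in the off-peak hour settings."""
    root = ET.fromstring(site_xml)
    # Collect name -> value for every <property> element.
    props = {
        p.findtext("name", "").strip(): p.findtext("value", "").strip()
        for p in root.iter("property")
    }
    problems = []
    for key in OFF_PEAK_KEYS:
        raw = props.get(key)
        if raw is None:
            problems.append(key + " is not set")
            continue
        try:
            hour = int(raw)
        except ValueError:
            hour = None
        if hour is None or not 0 <= hour <= 23:
            problems.append(key + " = " + repr(raw) + " is outside 0-23")
    return problems
```

An empty list means both hours are present and within the valid range.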

We can also change the compaction ratio to get finer control over compaction. The following two properties help achieve this:
(1) hbase.hstore.compaction.ratio (Default value is 1.2)
(2) hbase.hstore.compaction.ratio.offpeak (Default value is 5.0)


If we want to change the values to 1.4 and 6.5, add the following to hbase-site.xml on the regionserver and restart the service.
<property>
<name>hbase.hstore.compaction.ratio</name>
<value>1.4</value>
</property>

<property>
<name>hbase.hstore.compaction.ratio.offpeak</name>
<value>6.5</value>
</property>

Once the values are applied, you will see a message similar to the following in the regionserver logs:
2017-09-22 19:36:16,555 INFO  [StoreOpener-9edcc4b5cb0376b7366544d00b42ba44-1] compactions.CompactionConfiguration: size [134217728, 9223372036854775807); files [3, 10); ratio 1.400000; off-peak ratio 6.500000; throttle point 2684354560; major period 604800000, major jitter 0.500000, min locality to compact 0.000000
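
To confirm the applied values without reading the line by eye, the two ratios can be pulled out with a regular expression. This is a small sketch that assumes the log line has the same shape as the one above; the function name is hypothetical, and the pattern may need adjusting for other HBase versions.

```python
import re

# Matches "...; ratio 1.400000; off-peak ratio 6.500000; ..." as in the line above.
RATIO_PATTERN = re.compile(r"; ratio ([\d.]+); off-peak ratio ([\d.]+)")


def parse_compaction_ratios(log_line):
    """Return (ratio, off_peak_ratio) as floats, or None if the line does not match."""
    match = RATIO_PATTERN.search(log_line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))
```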

From the official HBase documentation:


hbase.hstore.compaction.ratio: For minor compaction, this ratio is used to determine whether a given StoreFile which is larger than hbase.hstore.compaction.min.size is eligible for compaction. Its effect is to limit compaction of large StoreFiles. The value of hbase.hstore.compaction.ratio is expressed as a floating-point decimal. A large ratio, such as 10, will produce a single giant StoreFile. Conversely, a low value, such as .25, will produce behavior similar to the BigTable compaction algorithm, producing four StoreFiles. A moderate value of between 1.0 and 1.4 is recommended. When tuning this value, you are balancing write costs with read costs. Raising the value (to something like 1.4) will have more write costs, because you will compact larger StoreFiles. However, during reads, HBase will need to seek through fewer StoreFiles to accomplish the read. Consider this approach if you cannot take advantage of Bloom filters. Otherwise, you can lower this value to something like 1.0 to reduce the background cost of writes, and use Bloom filters to control the number of StoreFiles touched during reads. For most cases, the default value is appropriate.

hbase.hstore.compaction.ratio.offpeak: Allows you to set a different (by default, more aggressive) ratio for determining whether larger StoreFiles are included in compactions during off-peak hours. Works in the same way as hbase.hstore.compaction.ratio. Only applies if hbase.offpeak.start.hour and hbase.offpeak.end.hour are also enabled.
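
To make the effect of the ratio concrete, here is a simplified Python sketch of a ratio-based selection rule. It is an illustration only and leaves out parts of the real policy (such as the minimum and maximum number of files per compaction). The function name, the file sizes, and the 128 MB minimum size are assumptions made up for the example.

```python
def select_for_compaction(sizes, ratio, min_size=128):
    """Pick the files to compact from `sizes` (ordered oldest to newest).

    Simplified rule: an older file is skipped while it is larger than both
    `min_size` and `ratio` times the combined size of all newer files.
    """
    start = 0
    while start < len(sizes):
        newer_total = sum(sizes[start + 1:])
        if sizes[start] <= max(min_size, newer_total * ratio):
            break
        start += 1
    return sizes[start:]


if __name__ == "__main__":
    store_files_mb = [1000, 300, 200, 100]  # hypothetical StoreFile sizes in MB
    print(select_for_compaction(store_files_mb, 1.2))  # peak ratio
    print(select_for_compaction(store_files_mb, 5.0))  # off-peak ratio
```

In this example the 1000 MB file is left out at the peak ratio of 1.2 (it exceeds 1.2 x 600 MB) but is included at the off-peak ratio of 5.0, which is why the off-peak setting is described as more aggressive.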
