How to control the time during which HBase major compaction is executed?
The following two properties control the time window in which major compaction will be kicked in:
hbase.offpeak.start.hour (Default value is -1)
hbase.offpeak.end.hour (Default value is -1)
Both properties take an hour of the day as an integer between 0 and 23.
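The off-peak window can wrap past midnight, so it helps to spell out which hours count as off-peak. The condition below is our reading of the OffPeakHours class that appears in the WARN log later in this article. Treat it as a sketch and check it against your HBase version. Here S is the start hour, E is the end hour, h is the current hour, and the end hour appears to be exclusive. If S equals E, or either value is outside 0-23, off-peak is treated as disabled.

```latex
\text{off-peak}(h) =
\begin{cases}
S \le h < E & \text{if } S < E \\
h \ge S \ \lor\ h < E & \text{if } S > E
\end{cases}
\qquad h \in \{0, 1, \dots, 23\}
```

For a 10 PM to 6 AM window (S = 22, E = 6), the off-peak hours are 22, 23 and 0 through 5.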
For example, suppose the HBase cluster is lightly loaded at night, say from 10 PM to 6 AM every day. We can then set the following in hbase-site.xml and restart HBase:
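The original snippet is not preserved here. Below is a minimal sketch of the entries, assuming they go inside the existing <configuration> element of hbase-site.xml:

```xml
<!-- Off-peak window starts at 22:00 (10 PM) -->
<property>
  <name>hbase.offpeak.start.hour</name>
  <value>22</value>
</property>
<!-- Off-peak window ends at 06:00 (6 AM) -->
<property>
  <name>hbase.offpeak.end.hour</name>
  <value>6</value>
</property>
```

Set both properties. If one is left at its default of -1, HBase rejects the pair, which is the situation shown in the WARN log below (start = 22, end = -1).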
If the values are not set correctly, we will see a WARN message similar to the following in the RegionServer logs:
19:21:16,533 WARN compactions.OffPeakHours: Ignoring invalid start/end hour for peak hour : start = 22 end = -1. Valid numbers are [0-23]
We can also change the compaction ratio for finer control over compaction. The following two properties achieve this:
hbase.hstore.compaction.ratio (Default value is 1.2)
hbase.hstore.compaction.ratio.offpeak (Default value is 5.0)
If we want to change the values to 1.4 and 6.5 respectively, add the following to hbase-site.xml on the RegionServers and restart them:
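The original snippet is not preserved here. Below is a sketch of the corresponding hbase-site.xml entries, again assuming they sit inside the <configuration> element:

```xml
<!-- Ratio used outside the off-peak window (default 1.2) -->
<property>
  <name>hbase.hstore.compaction.ratio</name>
  <value>1.4</value>
</property>
<!-- Ratio used during the off-peak window (default 5.0) -->
<property>
  <name>hbase.hstore.compaction.ratio.offpeak</name>
  <value>6.5</value>
</property>
```

The off-peak ratio only takes effect when hbase.offpeak.start.hour and hbase.offpeak.end.hour are also set.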
Once the new values take effect, we will see a message similar to the following in the RegionServer logs:
19:36:16,555 INFO compactions.CompactionConfiguration: size [134217728, 9223372036854775807); files [3, 10); ratio 1.400000; off-peak ratio 6.500000; throttle point 2684354560; major period 604800000, major jitter 0.500000, min locality to compact 0.000000
From the official HBase documentation:
hbase.hstore.compaction.ratio
For minor compaction, this ratio is used to determine whether a given StoreFile
which is larger than hbase.hstore.compaction.min.size is eligible for
compaction. Its effect is to limit compaction of large StoreFiles. The value of
hbase.hstore.compaction.ratio is expressed as a floating-point decimal. A large
ratio, such as 10, will produce a single giant StoreFile. Conversely, a low
value, such as .25, will produce behavior similar to the BigTable compaction
algorithm, producing four StoreFiles. A moderate value of between 1.0 and 1.4
is recommended. When tuning this value, you are balancing write costs with read
costs. Raising the value (to something like 1.4) will have more write costs,
because you will compact larger StoreFiles. However, during reads, HBase will
need to seek through fewer StoreFiles to accomplish the read. Consider this
approach if you cannot take advantage of Bloom filters. Otherwise, you can
lower this value to something like 1.0 to reduce the background cost of writes,
and use Bloom filters to control the number of StoreFiles touched during reads.
For most cases, the default value is appropriate.
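To make the ratio concrete, here is a simplified form of the eligibility test described above. It is a sketch, not the exact selection code, and it ignores details such as the cap on how many files are compacted together. Order the StoreFiles from oldest F_1 to newest F_n. Let R be the ratio in effect and M be hbase.hstore.compaction.min.size. The value 134217728 bytes (128 MB) shown as the lower size bound in the INFO log above appears to be this setting.

```latex
F_i \text{ is eligible} \iff
\operatorname{size}(F_i) \le \max\Big(M,\; R \cdot \sum_{j=i+1}^{n} \operatorname{size}(F_j)\Big)
```

For example, suppose the newer files total 400 MB. The eligible file size then depends on the ratio:
- 1.2 × 400 = 480 MB with the default ratio.
- 1.4 × 400 = 560 MB with the ratio set to 1.4.
- 6.5 × 400 = 2600 MB during off-peak hours with the off-peak ratio set to 6.5.

A higher ratio therefore pulls larger StoreFiles into the compaction.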
hbase.hstore.compaction.ratio.offpeak
Allows you to set a different (by default, more aggressive) ratio for
determining whether larger StoreFiles are included in compactions during
off-peak hours. Works in the same way as hbase.hstore.compaction.ratio. Only
applies if hbase.offpeak.start.hour and hbase.offpeak.end.hour are also