Friday, 22 September 2017

How to control time during which HBase major compaction will be executed?


The following two properties help control when HBase compacts more aggressively, by defining an off-peak window:
(1) hbase.offpeak.start.hour
(2) hbase.offpeak.end.hour

Both properties take an hour of the day from 0 to 23.
For example, if we find that the HBase cluster is lightly loaded at night, say from 10 PM to 6 AM every day, we can set the following in hbase-site.xml and restart the regionserver.

<property>
<name>hbase.offpeak.start.hour</name>
<value>22</value>
</property>

<property>
<name>hbase.offpeak.end.hour</name>
<value>6</value>
</property>

If a value is not set correctly (for example, when only one of the two hours is configured), we will see a WARN message similar to this in the regionserver logs:
2017-09-22 19:21:16,533 WARN  [StoreOpener-ee1faec4bdc3df3a4f4fa959c641e782-1] compactions.OffPeakHours: Ignoring invalid start/end hour for peak hour : start = 22 end = -1. Valid numbers are [0-23]
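
A start hour of 22 with an end hour of 6 is valid even though the window crosses midnight. The following minimal Java sketch illustrates one way such a wraparound check can work. It is not HBase's actual code: the class and method names are hypothetical, treating the start hour as inclusive and the end hour as exclusive is an assumption, and the sketch throws on invalid hours where HBase logs the WARN shown above and ignores the setting.

```java
// Illustrative sketch only: class and method names are hypothetical, not HBase APIs.
final class OffPeakWindowSketch {
    private final int startHour;
    private final int endHour;

    OffPeakWindowSketch(int startHour, int endHour) {
        // Same rule as in the WARN message: valid hours are 0 to 23.
        if (!isValidHour(startHour) || !isValidHour(endHour)) {
            throw new IllegalArgumentException(
                "Invalid start/end hour: start = " + startHour + " end = " + endHour);
        }
        this.startHour = startHour;
        this.endHour = endHour;
    }

    static boolean isValidHour(int hour) {
        return hour >= 0 && hour <= 23;
    }

    // Assumption: the start hour is inclusive and the end hour is exclusive.
    boolean isOffPeak(int hourOfDay) {
        if (startHour <= endHour) {
            // Window that stays within a single day, e.g. 1 to 5.
            return hourOfDay >= startHour && hourOfDay < endHour;
        }
        // Window that wraps past midnight, e.g. 22 to 6.
        return hourOfDay >= startHour || hourOfDay < endHour;
    }

    public static void main(String[] args) {
        OffPeakWindowSketch window = new OffPeakWindowSketch(22, 6);
        for (int hour : new int[] {21, 22, 2, 6}) {
            System.out.println(hour + ":00 off-peak? " + window.isOffPeak(hour));
        }
    }
}
```

With the 22 to 6 window, hours 22 and 2 fall inside the window, while 21 and 6 fall outside it under the inclusive/exclusive assumption.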

We can also change the compaction ratio for finer control over compaction. The following two properties achieve this:
(1) hbase.hstore.compaction.ratio (Default value is 1.2)
(2) hbase.hstore.compaction.ratio.offpeak (Default value is 5.0)


To change the values to 1.4 and 6.5, for example, add the following to hbase-site.xml on the regionserver and restart the service.
<property>
<name>hbase.hstore.compaction.ratio</name>
<value>1.4</value>
</property>

<property>
<name>hbase.hstore.compaction.ratio.offpeak</name>
<value>6.5</value>
</property>

Once the values take effect, you will see a message similar to this in the regionserver logs:
2017-09-22 19:36:16,555 INFO  [StoreOpener-9edcc4b5cb0376b7366544d00b42ba44-1] compactions.CompactionConfiguration: size [134217728, 9223372036854775807); files [3, 10); ratio 1.400000; off-peak ratio 6.500000; throttle point 2684354560; major period 604800000, major jitter 0.500000, min locality to compact 0.000000

From the official HBase documentation:


hbase.hstore.compaction.ratio: For minor compaction, this ratio is used to determine whether a given StoreFile which is larger than hbase.hstore.compaction.min.size is eligible for compaction. Its effect is to limit compaction of large StoreFiles. The value of hbase.hstore.compaction.ratio is expressed as a floating-point decimal. A large ratio, such as 10, will produce a single giant StoreFile. Conversely, a low value, such as .25, will produce behavior similar to the BigTable compaction algorithm, producing four StoreFiles. A moderate value of between 1.0 and 1.4 is recommended. When tuning this value, you are balancing write costs with read costs. Raising the value (to something like 1.4) will have more write costs, because you will compact larger StoreFiles. However, during reads, HBase will need to seek through fewer StoreFiles to accomplish the read. Consider this approach if you cannot take advantage of Bloom filters. Otherwise, you can lower this value to something like 1.0 to reduce the background cost of writes, and use Bloom filters to control the number of StoreFiles touched during reads. For most cases, the default value is appropriate.

hbase.hstore.compaction.ratio.offpeak: Allows you to set a different (by default, more aggressive) ratio for determining whether larger StoreFiles are included in compactions during off-peak hours. Works in the same way as hbase.hstore.compaction.ratio. Only applies if hbase.offpeak.start.hour and hbase.offpeak.end.hour are also enabled.
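
To make the effect of the ratio concrete, here is a simplified Java sketch of a ratio-based eligibility check: a StoreFile is included when its size is at most the ratio times the combined size of the newer files. This is only an illustration under assumptions, not HBase's actual compaction policy, which also takes hbase.hstore.compaction.min.size and the minimum and maximum file counts into account. The class name, method name and sample sizes are hypothetical.

```java
// Illustrative sketch only: a simplified ratio check, not HBase's real selection policy.
final class CompactionRatioSketch {

    /**
     * Walks the files from oldest to newest and returns the index of the first file
     * whose size is at most ratio * (total size of all newer files).
     * Returns sizesOldestFirst.length when no file qualifies.
     */
    static int firstEligibleIndex(long[] sizesOldestFirst, double ratio) {
        long sumOfNewer = 0;
        for (long size : sizesOldestFirst) {
            sumOfNewer += size;
        }
        for (int i = 0; i < sizesOldestFirst.length; i++) {
            // After this subtraction, sumOfNewer holds only the files newer than file i.
            sumOfNewer -= sizesOldestFirst[i];
            if (sizesOldestFirst[i] <= ratio * sumOfNewer) {
                return i;
            }
        }
        return sizesOldestFirst.length;
    }

    public static void main(String[] args) {
        // Hypothetical StoreFile sizes in MB, oldest first.
        long[] sizes = {1000, 300, 200, 100};
        System.out.println("peak ratio 1.2, first eligible index: "
            + firstEligibleIndex(sizes, 1.2));
        System.out.println("off-peak ratio 5.0, first eligible index: "
            + firstEligibleIndex(sizes, 5.0));
    }
}
```

With these sample sizes, the 1000 MB file is skipped at ratio 1.2 (1000 is larger than 1.2 x 600) but included at ratio 5.0 (1000 is smaller than 5.0 x 600), which is why the off-peak ratio leads to larger, more aggressive compactions.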

Wednesday, 20 September 2017

HBase Memstore Flush - Part 2


Aim:


The aim of this post is to discuss the scenarios that lead to memstore flushes in HBase.

Any put operation to HBase goes to the memstore (in memory). It is also written to the WAL by default. There is one memstore per column family per region. When a certain threshold is reached, the memstore is flushed.

The thresholds fall into two main categories:
[A] Size based
[B] Time based

This post focuses on time-based memstore flushes. My previous post (HBase Memstore Flush - Part 1) discusses size-based memstore flushes.

Time based memstore flushes:


The memstore is also flushed periodically. The interval for time-based flushes is controlled by 'hbase.regionserver.optionalcacheflushinterval' in hbase-site.xml. If it is not set, the default of 3600000 ms (1 hour) is used. Periodic memstore flushes help free regionserver memory. However, more frequent small flushes produce more small HFiles and therefore more minor compactions, so the parameter needs to be tuned for the application running on HBase. Setting 'hbase.regionserver.optionalcacheflushinterval' to zero or a negative value disables periodic memstore flushes.
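
As an illustration, the following hbase-site.xml fragment raises the periodic flush interval to 2 hours. The value 7200000 is only an assumed example, not a recommendation. As with the other hbase-site.xml changes, a regionserver restart is assumed to be needed for it to take effect.

```xml
<property>
<name>hbase.regionserver.optionalcacheflushinterval</name>
<value>7200000</value>
</property>
```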

Periodic memstore flushes were introduced as part of https://issues.apache.org/jira/browse/HBASE-5930

The following part of the HBase source code implements this:

static class PeriodicMemstoreFlusher extends ScheduledChore {
    final HRegionServer server;
    final static int RANGE_OF_DELAY = 20000; //millisec
    final static int MIN_DELAY_TIME = 3000; //millisec
    public PeriodicMemstoreFlusher(int cacheFlushInterval, final HRegionServer server) {
      super(server.getServerName() + "-MemstoreFlusherChore", server, cacheFlushInterval);
      this.server = server;
    }

    @Override
    protected void chore() {
      for (Region r : this.server.onlineRegions.values()) {
        if (r == null)
          continue;
        if (((HRegion)r).shouldFlush()) {
          FlushRequester requester = server.getFlushRequester();
          if (requester != null) {
            long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
            LOG.info(getName() + " requesting flush for region " +
              r.getRegionInfo().getRegionNameAsString() + " after a delay of " + randomDelay);
            //Throttle the flushes by putting a delay. If we don't throttle, and there
            //is a balanced write-load on the regions in a table, we might end up
            //overwhelming the filesystem with too many flushes at once.
            requester.requestDelayedFlush(r, randomDelay, false);
          }
        }
      }
    }
  }

The following is the definition of shouldFlush():

boolean shouldFlush() {
    // This is a rough measure.
    if (this.maxFlushedSeqId > 0
          && (this.maxFlushedSeqId + this.flushPerChanges < this.sequenceId.get())) {
      return true;
    }
    long modifiedFlushCheckInterval = flushCheckInterval;
    if (getRegionInfo().isMetaRegion() &&
        getRegionInfo().getReplicaId() == HRegionInfo.DEFAULT_REPLICA_ID) {
      modifiedFlushCheckInterval = META_CACHE_FLUSH_INTERVAL;
    }
    if (modifiedFlushCheckInterval <= 0) { //disabled
      return false;
    }
    long now = EnvironmentEdgeManager.currentTime();
    //if we flushed in the recent past, we don't need to do again now
    if ((now - getEarliestFlushTimeForAllStores() < modifiedFlushCheckInterval)) {
      return false;
    }
    //since we didn't flush in the recent past, flush now if certain conditions
    //are met. Return true on first such memstore hit.
    for (Store s : getStores()) {
      if (s.timeOfOldestEdit() < now - modifiedFlushCheckInterval) {
        // we have an old enough edit in the memstore, flush
        return true;
      }
    }
    return false;
  }

'flushCheckInterval' is read from the following property:

this.flushCheckInterval = conf.getInt(MEMSTORE_PERIODIC_FLUSH_INTERVAL,DEFAULT_CACHE_FLUSH_INTERVAL);

where 

public static final String MEMSTORE_PERIODIC_FLUSH_INTERVAL = "hbase.regionserver.optionalcacheflushinterval";
public static final int DEFAULT_CACHE_FLUSH_INTERVAL = 3600000;

The periodic flush chore is invoked at the interval given by 'hbase.server.thread.wakefrequency'. The default value is 10000 ms.

HBase Memstore Flush - Part 1


Aim:


The aim of this post is to discuss the scenarios that lead to memstore flushes in HBase.

Any put operation to HBase goes to the memstore (in memory). It is also written to the WAL by default. There is one memstore per column family per region. When a certain threshold is reached, the memstore is flushed.

The thresholds fall into two main categories:
[A] Size based
[B] Time based

This post focuses on size-based memstore flushes. My next post (HBase Memstore Flush - Part 2) discusses time-based memstore flushes.

Size based memstore flushes:


Since the memstore lives in the regionserver's heap memory, it is flushed when it reaches a certain size.

The threshold is controlled by the following parameters:


[a] hbase.hregion.memstore.flush.size (specified in bytes)

Each memstore is checked against this threshold periodically (at the interval set by 'hbase.server.thread.wakefrequency'). If the memstore reaches this limit, it is flushed. Note that every memstore flush creates one HFile per column family per region.

[b] A regionserver may host many regions. Since memstores use the regionserver's heap memory, we also need to control the total heap used by all memstores. This is controlled by the following parameters:

(1) hbase.regionserver.global.memstore.size.lower.limit 
Maximum size of all memstores in a region server before flushes are forced. Defaults to 95% of hbase.regionserver.global.memstore.size (0.95). hbase.regionserver.global.memstore.lowerLimit is the old property for the same setting. It is still honored if specified.

(2) hbase.regionserver.global.memstore.size
Maximum size of all memstores in a region server before new updates are blocked and flushes are forced. Defaults to 40% of heap (0.4). Updates are blocked and flushes are forced until the total size of all memstores in the region server drops to hbase.regionserver.global.memstore.size.lower.limit. hbase.regionserver.global.memstore.upperLimit is the old property for the same setting. It is still honored if specified.
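
The two global limits are fractions applied one after the other: the upper limit is a fraction of the heap, and the lower limit is a fraction of the upper limit. The following Java sketch works through the arithmetic with the default fractions. The 10 GiB heap, the class name and the method names are assumptions made for illustration only.

```java
// Illustrative arithmetic only: names and the heap size are hypothetical.
final class GlobalMemstoreLimitsSketch {

    // hbase.regionserver.global.memstore.size is a fraction of the regionserver heap.
    static long upperLimitBytes(long heapBytes, double globalMemstoreSize) {
        return Math.round(heapBytes * globalMemstoreSize);
    }

    // hbase.regionserver.global.memstore.size.lower.limit is a fraction of the upper limit.
    static long lowerLimitBytes(long upperLimitBytes, double lowerLimitFraction) {
        return Math.round(upperLimitBytes * lowerLimitFraction);
    }

    public static void main(String[] args) {
        long heapBytes = 10L * 1024 * 1024 * 1024; // assumed 10 GiB regionserver heap
        long upper = upperLimitBytes(heapBytes, 0.4);  // default 0.4
        long lower = lowerLimitBytes(upper, 0.95);     // default 0.95
        System.out.println("updates blocked at (bytes): " + upper);
        System.out.println("forced flushes start at (bytes): " + lower);
    }
}
```

For a 10 GiB heap this gives an upper limit of 4 GiB and a lower limit of roughly 3.8 GiB, so forced flushes begin slightly before updates are blocked.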


A few other parameters to be aware of:
* hbase.hregion.memstore.block.multiplier - Updates are blocked if memstore has hbase.hregion.memstore.block.multiplier times hbase.hregion.memstore.flush.size bytes. The default value is 4.

* hbase.hregion.percolumnfamilyflush.size.lower.bound - If FlushLargeStoresPolicy is used, then every time that we hit the total memstore limit, we find out all the column families whose memstores exceed this value, and only flush them, while retaining the others whose memstores are lower than this limit. If none of the families have their memstore size more than this, all the memstores will be flushed (just as usual). This value should be less than half of the total memstore threshold (hbase.hregion.memstore.flush.size). (https://issues.apache.org/jira/browse/HBASE-10201). To restore the old behavior of flushes writing out all column families, set hbase.regionserver.flush.policy to org.apache.hadoop.hbase.regionserver.FlushAllStoresPolicy either in hbase-default.xml or on a per-table basis by setting the policy to use with HTableDescriptor.getFlushPolicyClassName().
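
The selection rule described for hbase.hregion.percolumnfamilyflush.size.lower.bound can be sketched in Java as follows. This is a hedged illustration of the rule as worded above, not the FlushLargeStoresPolicy implementation. The class name, method name, family names and sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: mirrors the rule described in the text, not HBase's code.
final class FlushSelectionSketch {

    /**
     * Returns the column families whose memstore size exceeds the lower bound.
     * If no family exceeds it, all families are returned (flush everything).
     */
    static List<String> familiesToFlush(Map<String, Long> memstoreBytesByFamily,
                                        long lowerBoundBytes) {
        List<String> large = new ArrayList<>();
        for (Map.Entry<String, Long> entry : memstoreBytesByFamily.entrySet()) {
            if (entry.getValue() > lowerBoundBytes) {
                large.add(entry.getKey());
            }
        }
        if (large.isEmpty()) {
            // No family is large enough on its own, so fall back to flushing all of them.
            return new ArrayList<>(memstoreBytesByFamily.keySet());
        }
        return large;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("cf1", 100L * 1024 * 1024); // hypothetical 100 MB memstore
        sizes.put("cf2", 4L * 1024 * 1024);   // hypothetical 4 MB memstore
        long lowerBound = 16L * 1024 * 1024;  // assumed 16 MB lower bound
        System.out.println("families to flush: " + familiesToFlush(sizes, lowerBound));
    }
}
```

In this example only cf1 is flushed, while the small cf2 memstore is retained.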

Friday, 15 September 2017

How to enable/disable Hue autocomplete feature in editors

Aim: 

Hue 3.12 comes with rich features. One of them is the option to enable or disable autocomplete in editors and notebooks. The autocomplete feature is turned ON by default.

How to?

There are a few ways to achieve this.

[1] To use the new editor, which provides the autocomplete feature, make sure the following property is set to true in the hue.ini file (requires a Hue restart).

use_new_editor=true (if this property is not explicitly set, it defaults to 'true')

If the above property is set to 'false', the autocomplete feature will not be available.

[2] Set editor_autocomplete_timeout=0 in the hue.ini file to disable the autocomplete feature (requires a Hue restart).

[3] After loading the Hue editor, press "Ctrl+,". You will see the options 'Enable Autocompleter' and 'Enable Live Autocompletion'. Unchecking them disables the autocomplete feature.
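
For options [1] and [2], the hue.ini entries could look like the fragment below. Placing both properties under the [desktop] section is an assumption based on the stock hue.ini layout, so verify the section against your own file before applying it.

```ini
[desktop]
# Keep the new editor, but turn off autocompletion
use_new_editor=true
editor_autocomplete_timeout=0
```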

Tuesday, 5 September 2017

Hive log files not getting deleted even after retention number is reached from Hive 2.1 onward


Issue:

Hive log files are not deleted even after the retention number is reached. Log rotation works fine.
The issue is observed from Hive 2.1 onward, which uses log4j2 for logging.

Cause:

The issue is observed with 'TimeBasedTriggeringPolicy' in log4j2.
This is a known limitation of 'TimeBasedTriggeringPolicy' in log4j2, as mentioned in https://issues.apache.org/jira/browse/LOG4J2-435.

Workaround:

One workaround for the issue is to use 'SizeBasedTriggeringPolicy'.
To do so, make the following changes in 'hive-log4j2.properties' inside the respective HIVE_CONF_DIR.

Comment out the following:
appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd}
appender.DRFA.policies.time.type = TimeBasedTriggeringPolicy
appender.DRFA.policies.time.interval = 1
appender.DRFA.policies.time.modulate = true

Add the following:
appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%i
appender.DRFA.policies.size.type=SizeBasedTriggeringPolicy
# Customize the size of each log file as needed
appender.DRFA.policies.size.size=100MB
# Customize the number of log files to be retained
appender.DRFA.strategy.max = 3

After these changes, the appender section will look similar to this:
# daily rolling file appender
appender.DRFA.type = RollingRandomAccessFile
appender.DRFA.name = DRFA
appender.DRFA.fileName = ${sys:hive.log.dir}/${sys:hive.log.file}
# Use %pid in the filePattern to append <process-id>@<host-name> to the filename if you want separate log files for different CLI session
#appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd}
appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%i
appender.DRFA.layout.type = PatternLayout
appender.DRFA.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
appender.DRFA.policies.type = Policies
#appender.DRFA.policies.time.type = TimeBasedTriggeringPolicy
appender.DRFA.policies.size.type=SizeBasedTriggeringPolicy
appender.DRFA.policies.size.size=100MB
#appender.DRFA.policies.time.interval = 1
#appender.DRFA.policies.time.modulate = true
appender.DRFA.strategy.type = DefaultRolloverStrategy
appender.DRFA.strategy.max = 3