The following is a study of the parameters that control minor and major compaction in HBase.
The basic rule for selecting a file for minor compaction is:
select a file for compaction when the file size <= sum(smaller_files_size) * hbase.hstore.compaction.ratio,
where smaller_files_size are the sizes of the files newer than the candidate file.
Example from the official HBase documentation:
The following StoreFiles exist: 100, 50, 23, 12, and 12 bytes apiece (oldest to newest). With hbase.hstore.compaction.ratio = 1.0 and at most 5 files per compaction (hbase.hstore.compaction.max = 5), the files selected for minor compaction are 23, 12, and 12.
Why?
Recall the rule:
selects a file for compaction when the file size <= sum(smaller_files_size) * hbase.hstore.compaction.ratio.
100 --> No, because sum(50, 23, 12, 12) * 1.0 = 97.
50 --> No, because sum(23, 12, 12) * 1.0 = 47.
23 --> Yes, because sum(12, 12) * 1.0 = 24.
12 --> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
12 --> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
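The walkthrough above can be reproduced with a small Python sketch of the rule. It is a simplified model, not HBase's own selection policy: select_for_minor_compaction is a hypothetical helper, and its min_files = 3 and max_files = 5 defaults are assumptions chosen to match the example, standing in for hbase.hstore.compaction.min and hbase.hstore.compaction.max. The real policy also weighs other settings, such as hbase.hstore.compaction.min.size, that this sketch ignores.

```python
from typing import List


def select_for_minor_compaction(
    sizes: List[int],
    ratio: float = 1.0,
    min_files: int = 3,
    max_files: int = 5,
) -> List[int]:
    """Simplified ratio rule; sizes are ordered oldest to newest (hypothetical helper)."""
    for start, size in enumerate(sizes):
        # Total size of all files newer than the candidate.
        newer_total = sum(sizes[start + 1:])
        if size <= newer_total * ratio:
            # The candidate qualifies, so it and every newer file are selected,
            # capped at max_files.
            selected = sizes[start:start + max_files]
            # Too few files are not worth compacting (assumed min_files check).
            return selected if len(selected) >= min_files else []
    return []


if __name__ == "__main__":
    print(select_for_minor_compaction([100, 50, 23, 12, 12]))
```

With the example sizes this returns [23, 12, 12], matching the walkthrough. With the default ratio of 1.2 instead of 1.0, the first check becomes 100 <= 97 * 1.2 = 116.4, so all five files would be selected.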
The following log snippet shows HBase RegionServer logs during a minor compaction (run here on a shortCompactions thread).
It provides the following information:
[1] Which table, column family, and region are being compacted.
[2] Number of files compacted.
[3] Total size of the files selected for compaction (the sum of the individual file sizes).
[4] Total file size after the compaction completes.
[5] Time taken for minor compaction.
Minor compaction logs: (Table name is 'hb' with column family 'c')
2017-07-10 17:08:13,967 INFO [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499720893966] regionserver.HRegion: Starting compaction on c in region hb,,1499720284228.0f0486e029334542705e66f401fa698b.
2017-07-10 17:08:13,968 INFO [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499720893966] regionserver.HStore: Starting compaction of 3 file(s) in c of hb,,1499720284228.0f0486e029334542705e66f401fa698b. into tmpdir=maprfs:/hbase/data/default/hb/0f0486e029334542705e66f401fa698b/.tmp, totalSize=14.7 K
2017-07-10 17:08:13,980 INFO [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499720893966] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=2, currentSize=1291688, freeSize=1249607128, maxSize=1250898816, heapSize=1291688, minSize=1188353920, minFactor=0.95, multiSize=594176960, multiFactor=0.5, singleSize=297088480, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
2017-07-10 17:08:14,109 INFO [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499720893966] regionserver.HStore: Completed compaction of 3 (all) file(s) in c of hb,,1499720284228.0f0486e029334542705e66f401fa698b. into 4cc6a50eb38d4ef2844a3339bcdfe11d(size=5.0 K), total size for store is 5.0 K. This selection was in queue for 0sec, and took 0sec to execute.
2017-07-10 17:08:14,113 INFO [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499720893966] regionserver.CompactSplitThread: Completed compaction: Request = regionName=hb,,1499720284228.0f0486e029334542705e66f401fa698b., storeName=c, fileCount=3, fileSize=14.7 K, priority=1, time=4222857501044128; duration=0sec
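To pull items [2], [4] and [5] out of logs like these, the HStore "Completed compaction" line can be parsed. The following Python sketch is a minimal parser written against the exact message format shown in this article. The regular expression, the CompactionResult class and the parse_completed function are hypothetical helpers, and other HBase versions may word the message differently. The sample line is copied from the log above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Written against the HStore "Completed ... compaction" lines in this article;
# other HBase versions may format the message differently.
COMPLETED_RE = re.compile(
    r"Completed (?P<major>major )?compaction of (?P<files>\d+) "
    r"(?P<all>\(all\) )?file\(s\) in (?P<family>\S+) of (?P<region>\S+) "
    r"into (?P<new_file>[0-9a-f]+)\(size=(?P<new_size>[^)]+)\), "
    r"total size for store is (?P<store_size>.+?)\. "
    r"This selection was in queue for (?P<queued>.+?), "
    r"and took (?P<took>.+?) to execute\."
)


@dataclass
class CompactionResult:
    major: bool        # "major" appears before "compaction"
    file_count: int    # number of input files
    all_files: bool    # "(all)": every file in the store was selected
    family: str
    region: str
    new_file: str      # name of the resulting store file
    new_size: str      # size of the resulting store file, e.g. "5.0 K"
    store_size: str    # total store size after the compaction
    queued: str        # time the request waited in the queue
    took: str          # time the compaction took to execute


def parse_completed(line: str) -> Optional[CompactionResult]:
    """Return the fields of a 'Completed compaction' line, or None if it is not one."""
    m = COMPLETED_RE.search(line)
    if m is None:
        return None
    return CompactionResult(
        major=m.group("major") is not None,
        file_count=int(m.group("files")),
        all_files=m.group("all") is not None,
        family=m.group("family"),
        region=m.group("region"),
        new_file=m.group("new_file"),
        new_size=m.group("new_size"),
        store_size=m.group("store_size"),
        queued=m.group("queued"),
        took=m.group("took"),
    )


MINOR_LINE = (
    "2017-07-10 17:08:14,109 INFO [regionserver/vm52/10.10.XX.XX:16020-"
    "shortCompactions-1499720893966] regionserver.HStore: Completed compaction "
    "of 3 (all) file(s) in c of hb,,1499720284228.0f0486e029334542705e66f401fa698b. "
    "into 4cc6a50eb38d4ef2844a3339bcdfe11d(size=5.0 K), total size for store is "
    "5.0 K. This selection was in queue for 0sec, and took 0sec to execute."
)

if __name__ == "__main__":
    print(parse_completed(MINOR_LINE))
```

Matching on named groups keeps each field self-describing, and lines that do not match simply return None, so the function can be applied to every line of a RegionServer log.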
Major compaction depends on the following parameters:
[1] hbase.hregion.majorcompaction - Default value is 604800000 ms (7 days).
The interval between time-based major compactions. Setting this to 0 disables time-based major compaction.
In some cases, a minor compaction can also be promoted to a major compaction.
[2] Off-peak hour compactions:
Knowing the peak hours of your cluster lets you tell HBase not to run heavy minor compactions during busy hours.
From HBase 1.2 onward, the following parameters are available for this:
[a] hbase.hstore.compaction.max.size.offpeak - sets the largest file size that can be selected for compaction during off-peak hours
[b] hbase.offpeak.start.hour= 0..23 (specify start hour)
[c] hbase.offpeak.end.hour= 0..23 (specify end hour)
By default, the store compaction ratio is 1.2 outside off-peak hours and 5.0 during off-peak hours.
Both values can be adjusted with the following parameters:
[a] hbase.hstore.compaction.ratio
[b] hbase.hstore.compaction.ratio.offpeak
[3] hbase.hregion.majorcompaction.jitter
Compactions are carried out by RegionServers. To keep all RegionServers from running major compactions at the same time, HBase applies this jitter parameter.
The default value is 0.5. HBase multiplies hbase.hregion.majorcompaction by a random fraction of up to the jitter value and adds that amount to, or subtracts it from, the interval to decide when to run the next major compaction. With the defaults, the next major compaction therefore falls anywhere between 3.5 and 10.5 days after the previous one.
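As a rough illustration of how parameters [1], [2] and [3] interact, the following Python sketch computes a jittered major compaction interval and the compaction ratio in effect at a given hour. It is not HBase's implementation. The function names are hypothetical, and three details are assumptions of the sketch: the jittered interval is compared against the age of the oldest store file, the off-peak end hour is exclusive, and a window whose start hour is greater than its end hour wraps past midnight. The defaults (604800000 ms, 0.5, 1.2 and 5.0) are the values quoted above.

```python
import random
from typing import Optional

DAY_MS = 24 * 60 * 60 * 1000  # one day in milliseconds


def jittered_interval_ms(interval_ms: int = 604_800_000,
                         jitter: float = 0.5,
                         rng: Optional[random.Random] = None) -> int:
    """Major compaction interval after jitter; 0 means time-based compaction is off."""
    if interval_ms <= 0:
        return 0
    if rng is None:
        rng = random.Random()
    max_offset = round(interval_ms * jitter)
    # Shift the interval by a pseudo-random amount in [-max_offset, +max_offset].
    return interval_ms + max_offset - round(2 * max_offset * rng.random())


def major_compaction_due(oldest_file_ms: int, now_ms: int, interval_ms: int) -> bool:
    """Assumption: due once the oldest store file is older than the interval."""
    return interval_ms > 0 and now_ms - oldest_file_ms > interval_ms


def is_offpeak_hour(hour: int, start: int = -1, end: int = -1) -> bool:
    """Assumed semantics: end hour exclusive, window may wrap past midnight."""
    if start < 0 or end < 0 or start == end:
        return False  # no off-peak window configured
    if start < end:
        return start <= hour < end
    return hour >= start or hour < end


def effective_ratio(hour: int, start: int = -1, end: int = -1,
                    ratio: float = 1.2, ratio_offpeak: float = 5.0) -> float:
    """Ratio the selection rule would use at the given hour of the day."""
    return ratio_offpeak if is_offpeak_hour(hour, start, end) else ratio


if __name__ == "__main__":
    interval = jittered_interval_ms(rng=random.Random(42))
    print(f"next major compaction in {interval / DAY_MS:.2f} days")
    # 23:00 falls inside a 22:00-06:00 off-peak window.
    print(effective_ratio(23, start=22, end=6))
```

Passing an explicit random.Random makes the sketch reproducible; whatever seed is used, the jittered interval stays between 3.5 and 10.5 days with the default settings.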
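The following log snippet shows HBase RegionServer logs during a major compaction. A major compaction does not necessarily run on a largeCompactions thread: HBase sends a compaction to the large or small pool according to its total size, so this small major compaction ran on a shortCompactions thread.
It provides the following information: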
[1] Displays the table and region undergoing major compaction.
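[2] Whether the compaction was requested by a user. Here the request is logged as "Small Compaction requested ... Because: User-triggered major compaction". It is queued to the small (short) compaction pool only because its total size is small; it still runs as a major compaction, as the "Initiating major compaction (all files)" and "Completed major compaction" lines show.
[3] The new store file is first written to the region's .tmp directory and then committed into the column family directory. The compacted input files are then moved to the archive directory (maprfs:/hbase/archive in this log).
[4] The number of files compacted, the size of the newly generated file, and the time taken for the compaction.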
Major compaction logs: (Table name is 'hb' with column family 'c')
2017-07-10 21:10:55,158 INFO [PriorityRpcServer.handler=1,queue=1,port=16020] regionserver.RSRpcServices: Compacting hb,,1499720284228.0f0486e029334542705e66f401fa698b.
2017-07-10 21:10:55,159 DEBUG [PriorityRpcServer.handler=1,queue=1,port=16020] compactions.RatioBasedCompactionPolicy: Selecting compaction from 2 store files, 0 compacting, 2 eligible, 10 blocking
2017-07-10 21:10:55,159 DEBUG [PriorityRpcServer.handler=1,queue=1,port=16020] regionserver.HStore: 0f0486e029334542705e66f401fa698b - c: Initiating major compaction (all files)
2017-07-10 21:10:55,159 DEBUG [PriorityRpcServer.handler=1,queue=1,port=16020] regionserver.CompactSplitThread: Small Compaction requested: org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext@1ffa895a; Because: User-triggered major compaction; compaction_queue=(0:1), split_queue=0, merge_queue= 0
2017-07-10 21:10:55,159 INFO [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499725825449] regionserver.HRegion: Starting compaction on c in region hb,,1499720284228.0f0486e029334542705e66f401fa698b.
2017-07-10 21:10:55,160 INFO [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499725825449] regionserver.HStore: Starting compaction of 2 file(s) in c of hb,,1499720284228.0f0486e029334542705e66f401fa698b. into tmpdir=maprfs:/hbase/data/default/hb/0f0486e029334542705e66f401fa698b/.tmp, totalSize=10.9 K
2017-07-10 21:10:55,162 DEBUG [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499725825449] compactions.Compactor: Compacting maprfs:/hbase/data/default/hb/0f0486e029334542705e66f401fa698b/c/0daeafd75a4c4ba4a53172a73b9ca4b0, keycount=41, bloomtype=ROW, size=6.0 K, encoding=NONE, seqNum=174, earliestPutTs=1499720300277
2017-07-10 21:10:55,164 DEBUG [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499725825449] compactions.Compactor: Compacting maprfs:/hbase/data/default/hb/0f0486e029334542705e66f401fa698b/c/bfeea9e43bc347ec84863bdd4476e270, keycount=2, bloomtype=ROW, size=4.9 K, encoding=NONE, seqNum=182, earliestPutTs=1499735424849
2017-07-10 21:10:55,165 INFO [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499725825449] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=3, currentSize=1308744, freeSize=1249590072, maxSize=1250898816, heapSize=1308744, minSize=1188353920, minFactor=0.95, multiSize=594176960, multiFactor=0.5, singleSize=297088480, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
2017-07-10 21:10:55,191 DEBUG [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499725825449] regionserver.HRegionFileSystem: Committing store file maprfs:/hbase/data/default/hb/0f0486e029334542705e66f401fa698b/.tmp/da9d2b5454e640769a9b20c82124a010 as maprfs:/hbase/data/default/hb/0f0486e029334542705e66f401fa698b/c/da9d2b5454e640769a9b20c82124a010
2017-07-10 21:10:55,208 DEBUG [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499725825449] regionserver.HStore: Removing store files after compaction...
2017-07-10 21:10:55,219 DEBUG [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499725825449] backup.HFileArchiver: Archiving compacted store files.
2017-07-10 21:10:55,231 DEBUG [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499725825449] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:maprfs:/hbase/data/default/hb/0f0486e029334542705e66f401fa698b/c/0daeafd75a4c4ba4a53172a73b9ca4b0, to maprfs:/hbase/archive/data/default/hb/0f0486e029334542705e66f401fa698b/c/0daeafd75a4c4ba4a53172a73b9ca4b0
2017-07-10 21:10:55,243 DEBUG [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499725825449] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:maprfs:/hbase/data/default/hb/0f0486e029334542705e66f401fa698b/c/bfeea9e43bc347ec84863bdd4476e270, to maprfs:/hbase/archive/data/default/hb/0f0486e029334542705e66f401fa698b/c/bfeea9e43bc347ec84863bdd4476e270
2017-07-10 21:10:55,244 INFO [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499725825449] regionserver.HStore: Completed major compaction of 2 (all) file(s) in c of hb,,1499720284228.0f0486e029334542705e66f401fa698b. into da9d2b5454e640769a9b20c82124a010(size=6.1 K), total size for store is 6.1 K. This selection was in queue for 0sec, and took 0sec to execute.
2017-07-10 21:10:55,246 INFO [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499725825449] regionserver.CompactSplitThread: Completed compaction: Request = regionName=hb,,1499720284228.0f0486e029334542705e66f401fa698b., storeName=c, fileCount=2, fileSize=10.9 K, priority=1, time=4237418695815609; duration=0sec
2017-07-10 21:10:55,246 DEBUG [regionserver/vm52/10.10.XX.XX:16020-shortCompactions-1499725825449] regionserver.CompactSplitThread: CompactSplitThread Status : compaction_queue=(0:0), split_queue=0, merge_queue=0
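The per-file "Compacting ..." DEBUG lines can be used to cross-check item [4]. The Python sketch below counts the input files and keys and adds up their sizes. The summarize_inputs function and its regular expression are hypothetical helpers written against the message format shown here, and treating K, M and G as binary multiples of one another is an assumption. The sample lines are the two compactions.Compactor lines from this log, trimmed to the message part. For them the sketch gives 2 files, 43 keys and 6.0 K + 4.9 K = 10.9 K, which matches totalSize=10.9 K in the "Starting compaction" line.

```python
import re
from typing import Dict, List

# Written against the compactions.Compactor DEBUG lines in this article;
# the exact format may differ between HBase versions.
COMPACTING_RE = re.compile(
    r"Compactor: Compacting (?P<path>\S+), keycount=(?P<keys>\d+), "
    r".*?size=(?P<size>[\d.]+) (?P<unit>[KMG])"
)

# Assumption: K, M and G are binary multiples of one another.
TO_K = {"K": 1.0, "M": 1024.0, "G": 1024.0 * 1024.0}


def summarize_inputs(log_lines: List[str]) -> Dict[str, float]:
    """Count input files and keys and total their size (in K) for one compaction."""
    files = 0
    keys = 0
    size_k = 0.0
    for line in log_lines:
        m = COMPACTING_RE.search(line)
        if m is None:
            continue  # not a per-file "Compacting ..." line
        files += 1
        keys += int(m.group("keys"))
        size_k += float(m.group("size")) * TO_K[m.group("unit")]
    return {"files": files, "keys": keys, "size_k": round(size_k, 1)}


SAMPLE_LINES = [
    "compactions.Compactor: Compacting maprfs:/hbase/data/default/hb/"
    "0f0486e029334542705e66f401fa698b/c/0daeafd75a4c4ba4a53172a73b9ca4b0, "
    "keycount=41, bloomtype=ROW, size=6.0 K, encoding=NONE, seqNum=174, "
    "earliestPutTs=1499720300277",
    "compactions.Compactor: Compacting maprfs:/hbase/data/default/hb/"
    "0f0486e029334542705e66f401fa698b/c/bfeea9e43bc347ec84863bdd4476e270, "
    "keycount=2, bloomtype=ROW, size=4.9 K, encoding=NONE, seqNum=182, "
    "earliestPutTs=1499735424849",
]

if __name__ == "__main__":
    print(summarize_inputs(SAMPLE_LINES))
```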