How to control the number of DRILL minor fragments for a scan on a MapRDB JSON table?
Aim:
This post discusses the DRILL property that controls the number of minor fragments used for a scan of a MapRDB JSON table.
Details:
Starting with version 1.11, DRILL supports secondary indexing on MapRDB JSON tables. This is a very cool feature that can considerably reduce the turnaround time of your queries.
The team also introduced new logic where the number of minor fragments spawned for a DRILL scan of a MapRDB JSON table is based on the size of the data read. This is controlled by the 'format-maprdb.json.scanSizeMB' property.
The default value of 'format-maprdb.json.scanSizeMB' is 128MB. This means that if you have a MapRDB JSON table of size 200MB, while querying the table using DRILL, it will spin up 2 minor fragments to perform the table scan operation.
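The arithmetic behind the 200MB example can be sketched as follows. This is only a rough estimate, assuming the scan is split into the table size divided by 'format-maprdb.json.scanSizeMB', rounded up; the function name estimate_scan_fragments is hypothetical and not part of DRILL, and the actual planner may also take other factors (such as tablet boundaries and parallelization limits) into account.

```python
import math


def estimate_scan_fragments(table_size_mb: float, scan_size_mb: int = 128) -> int:
    """Roughly estimate the number of scan minor fragments.

    Assumption (not taken from DRILL's source code): one minor fragment
    is created per scan_size_mb of table data, rounded up, with at
    least one fragment for any table.
    """
    if table_size_mb < 0:
        raise ValueError("table_size_mb must not be negative")
    if scan_size_mb <= 0:
        raise ValueError("scan_size_mb must be positive")
    return max(1, math.ceil(table_size_mb / scan_size_mb))


# The 200MB table from the text with the default of 128MB.
print(estimate_scan_fragments(200))       # 2
# The same table after raising the property to 256MB.
print(estimate_scan_fragments(200, 256))  # 1
```

Under this assumption, raising the property reduces the number of scan fragments, and lowering it increases them.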
The current value of this property can be checked using the query:
select * from sys.boot where name='format-maprdb.json.scanSizeMB';
0: jdbc:drill:> select * from sys.boot where name='format-maprdb.json.scanSizeMB';
+--------------------------------+-------+-------------------+--------------+---------+----------+-------------+-----------+------------+
| name                           | kind  | accessibleScopes  | optionScope  | status  | num_val  | string_val  | bool_val  | float_val  |
+--------------------------------+-------+-------------------+--------------+---------+----------+-------------+-----------+------------+
| format-maprdb.json.scanSizeMB  | LONG  | BOOT              | BOOT         | BOOT    | 128      | null        | null      | null       |
+--------------------------------+-------+-------------------+--------------+---------+----------+-------------+-----------+------------+
As the output shows, this is a BOOT property, so you cannot change it at the session level. To change the property, edit the '/opt/mapr/drill/drill-<version>/conf/drill-override.conf' file and add the following:
format-maprdb: { json.scanSizeMB : <size in MB> }
Restart the drill-bits processes, then verify that the property has taken effect.
In the following example, the value is changed to 256MB.
1: jdbc:drill:> select * from sys.boot where name='format-maprdb.json.scanSizeMB';
+--------------------------------+-------+-------------------+--------------+---------+----------+-------------+-----------+------------+
| name                           | kind  | accessibleScopes  | optionScope  | status  | num_val  | string_val  | bool_val  | float_val  |
+--------------------------------+-------+-------------------+--------------+---------+----------+-------------+-----------+------------+
| format-maprdb.json.scanSizeMB  | LONG  | BOOT              | BOOT         | BOOT    | 256      | null        | null      | null       |
+--------------------------------+-------+-------------------+--------------+---------+----------+-------------+-----------+------------+