Friday, 30 March 2018

How to control the number of DRILL minor fragments for a scan on a MapRDB JSON table?


Aim:

This post discusses the DRILL property that controls the number of minor fragments used to scan a MapRDB JSON table.

Details:

Starting with version 1.11, DRILL supports secondary indexing on MapRDB JSON tables. This is a very cool feature that can considerably reduce the turnaround time of your queries.

The team also introduced new logic in which the number of minor fragments spawned for a DRILL scan on a MapRDB JSON table is based on the size of the data read. This is controlled by the 'format-maprdb.json.scanSizeMB' property.

The default value of 'format-maprdb.json.scanSizeMB' is 128MB. This means that if you query a 200MB MapRDB JSON table using DRILL, it will spin up 2 minor fragments to perform the table scan operation.
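
The arithmetic behind that example can be sketched as a ceiling division of the table size by the scan size. This is only a rough illustration: the function name estimate_scan_fragments is hypothetical, and the rounding-up rule is an assumption inferred from the 200MB example above rather than taken from the DRILL source. The real planner may also be limited by other parallelization settings.

```python
def estimate_scan_fragments(table_size_mb: int, scan_size_mb: int = 128) -> int:
    """Rough estimate of scan minor fragments for a MapRDB JSON table.

    Assumption: DRILL creates one minor fragment per scan_size_mb of data,
    rounding up. This mirrors the example in the post, not the actual planner.
    """
    if scan_size_mb <= 0:
        raise ValueError("scan_size_mb must be positive")
    if table_size_mb < 0:
        raise ValueError("table_size_mb must not be negative")
    # Ceiling division; assume at least one fragment even for a tiny table.
    return max(1, (table_size_mb + scan_size_mb - 1) // scan_size_mb)


if __name__ == "__main__":
    print(estimate_scan_fragments(200))        # default 128MB -> 2
    print(estimate_scan_fragments(200, 256))   # 256MB scan size -> 1
```

Under this assumption, raising the scan size lowers the number of scan fragments, and lowering it increases parallelism for the same table.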

The current value of this property can be checked using the query:
select * from sys.boot where name='format-maprdb.json.scanSizeMB';

0: jdbc:drill:> select * from sys.boot where name='format-maprdb.json.scanSizeMB';
+--------------------------------+-------+-------------------+--------------+---------+----------+-------------+-----------+------------+
|              name              | kind  | accessibleScopes  | optionScope  | status  | num_val  | string_val  | bool_val  | float_val  |
+--------------------------------+-------+-------------------+--------------+---------+----------+-------------+-----------+------------+
| format-maprdb.json.scanSizeMB  | LONG  | BOOT              | BOOT         | BOOT    | 128      | null        | null      | null       |
+--------------------------------+-------+-------------------+--------------+---------+----------+-------------+-----------+------------+

As you can see, this is a BOOT property, so you cannot change it at the session level. To change the property, edit the '/opt/mapr/drill/drill-<version>/conf/drill-override.conf' file and add the following:

  format-maprdb: { json.scanSizeMB : <size in MB> }

Restart the Drillbit processes, then verify that the property has taken effect.
In the following example, the value has been changed to 256MB.

1: jdbc:drill:> select * from sys.boot where name='format-maprdb.json.scanSizeMB';
+--------------------------------+-------+-------------------+--------------+---------+----------+-------------+-----------+------------+
|              name              | kind  | accessibleScopes  | optionScope  | status  | num_val  | string_val  | bool_val  | float_val  |
+--------------------------------+-------+-------------------+--------------+---------+----------+-------------+-----------+------------+
| format-maprdb.json.scanSizeMB  | LONG  | BOOT              | BOOT         | BOOT    | 256      | null        | null      | null       |
+--------------------------------+-------+-------------------+--------------+---------+----------+-------------+-----------+------------+
