Wednesday, 10 May 2017


SQOOP Import failure to Hive parquet table



Sample query that fails:


sqoop import --connect <connection_string> --query "select * from alwin_test where \$CONDITIONS"  --create-hive-table --hive-import --hive-table <hive_table_name> --username <uname> -P --target-dir '/user/mapr/oracleimport' -m 1 --as-parquetfile

However the following will succeed:

sqoop import --connect <connection_string> --query "select * from alwin_test where \$CONDITIONS"  --create-hive-table --hive-import --hive-table <hive_table_name> --username <uname> -P --target-dir '/user/mapr/oracleimport' -m 1

Error thrown:


ERROR sqoop.Sqoop: Got exception running Sqoop: org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected end-out: was expecting closing quote for a string value 
at [Source: java.io.StringReader@13ae2297; line: 1, column: 6001] 
org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value 
at [Source: java.io.StringReader@13ae2297; line: 1, column: 6001] 
at org.apache.avro.Schema$Parser.parse(Schema.java:955) 
at org.apache.avro.Schema$Parser.parse(Schema.java:943) 
at org.kitesdk.data.DatasetDescriptor$Builder.schemaLiteral(DatasetDescriptor.java:463) 
at org.kitesdk.data.spi.hive.HiveUtils.descriptorForTable(HiveUtils.java:157) 
at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.load(HiveAbstractMetadataProvider.java:104) 
at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197) 
at org.kitesdk.data.spi.hive.HiveManagedDatasetRepository.create(HiveManagedDatasetRepository.java:82) 
at org.kitesdk.data.Datasets.create(Datasets.java:239) 
at org.kitesdk.data.Datasets.create(Datasets.java:307) 
at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107) 
at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89) 
at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108) 
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260) 
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673) 
at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444) 
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497) 
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:606) 
at org.apache.sqoop.Sqoop.run(Sqoop.java:143) 
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179) 
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218) 
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227) 
at org.apache.sqoop.Sqoop.main(Sqoop.java:236) 
Caused by: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value 


Root Cause:


The issue is because of limitation on the number of characters that can be stored in TBLPROPERTIES in hive metastore.

Inside hive metastore db:

mysql> describe TABLE_PARAMS;
+-------------+----------------+------+-----+---------+-------+
| Field       | Type           | Null | Key | Default | Extra |
+-------------+----------------+------+-----+---------+-------+
| TBL_ID      | bigint(20)     | NO   | PRI | NULL    |       |
| PARAM_KEY   | varchar(256)   | NO   | PRI | NULL    |       |
| PARAM_VALUE | varchar(4000)  | YES  |     | NULL    |       |
+-------------+----------------+------+-----+---------+-------+
3 rows in set (0.00 sec)


The default type for PARAM_VALUE is varchar(4000).
In this case, when hive parquet table is created, table property 'avro.schema.literal' is truncated to 4000 characters which results invalid JSON.
This will cause JSON parse exception resulting SQOOP import failure.


Resolution:


The solution is to either limit the character length to 4000 or increase character length for 'PARAM_VALUE'.

Following are the solutions:

[1] Increase character length for 'PARAM_VALUE'.
For example:
mysql> describe TABLE_PARAMS;
+-------------+----------------+------+-----+---------+-------+
| Field       | Type           | Null | Key | Default | Extra |
+-------------+----------------+------+-----+---------+-------+
| TBL_ID      | bigint(20)     | NO   | PRI | NULL    |       |
| PARAM_KEY   | varchar(256)   | NO   | PRI | NULL    |       |
| PARAM_VALUE | varchar(10000) | YES  |     | NULL    |       |
+-------------+----------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

OR [2] Instead of using "select * from alwin_test where \$CONDITIONS" in SQOOP import, we can use alias name to reduce column name length and hence reduce character length for 'avro.schema.literal'

OR [3] Create a view of Oracle (source) tables with much smaller column name and then do SQOOP import on the view created.