Reading JSON Data from an Object Store
A newer version of this documentation is available. Use the version menu above to view the most up-to-date release of the Greenplum 6.x documentation.
The PXF object store connectors support reading JSON-format data. This section describes how to use PXF to access JSON data in an object store, including how to create and query an external table that references a JSON file in the store.
Note: Accessing JSON-format data from an object store is very similar to accessing JSON-format data in HDFS. This topic identifies object store-specific information required to read JSON data, and links to the PXF HDFS JSON documentation where appropriate for common information.
Prerequisites
Ensure that you have met the PXF Object Store Prerequisites before you attempt to read data from an object store.
Working with Avro Data
Refer to Working with JSON Data in the PXF HDFS JSON documentation for a description of the JSON text-based data-interchange format.
Creating the External Table
Use the <objstore>:json
profile to read JSON-format files from an object store. PXF supports the following <objstore>
profile prefixes:
Object Store | Profile Prefix |
---|---|
Azure Blob Storage | wasbs |
Azure Data Lake | adl |
Google Cloud Storage | gs |
Minio | s3 |
S3 | s3 |
The following syntax creates a Greenplum Database readable external table that references a JSON-format file:
CREATE EXTERNAL TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-file>?PROFILE=<objstore>:json&SERVER=<server_name>[&<custom-option>=<value>[...]]')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
The specific keywords and values used in the CREATE EXTERNAL TABLE command are described in the table below.
Keyword | Value |
---|---|
<path‑to‑file> | The absolute path to the directory or file in the object store. |
PROFILE=<objstore>:json | The PROFILE keyword must identify the specific object store. For example, s3:json . |
SERVER=<server_name> | The named server configuration that PXF uses to access the data. |
<custom‑option>=<value> | JSON supports the custom option named IDENTIFIER as described in the PXF HDFS JSON documentation. |
FORMAT ‘CUSTOM’ | Use FORMAT 'CUSTOM' with the <objstore>:json profile. The CUSTOM FORMAT requires that you specify (FORMATTER='pxfwritable_import') . |
If you are accessing an S3 object store, you can provide S3 credentials via custom options in the CREATE EXTERNAL TABLE
command as described in Overriding the S3 Server Configuration with DDL.
Example
Refer to Loading the Sample JSON Data to HDFS and Example: Reading a JSON File with Single Line Records in the PXF HDFS JSON documentation for a JSON example. Modifications that you must make to run the example with an object store include:
Copying the file to the object store instead of HDFS. For example, to copy the file to S3:
$ aws s3 cp /tmp/singleline.json s3://BUCKET/pxf_examples/ $ aws s3 cp /tmp/multiline.json s3://BUCKET/pxf_examples/
Using the
CREATE EXTERNAL TABLE
syntax andLOCATION
keywords and settings described above. For example, if your server name iss3srvcfg
:CREATE EXTERNAL TABLE singleline_json_s3( created_at TEXT, id_str TEXT, "user.id" INTEGER, "user.location" TEXT, "coordinates.values[0]" INTEGER, "coordinates.values[1]" INTEGER ) LOCATION('pxf://BUCKET/pxf_examples/singleline.json?PROFILE=s3:json&SERVER=s3srvcfg') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');