Specify HDFS Data in an External Table Definition

For Hadoop files, except for files in a MapR cluster, the LOCATION clause of the CREATE EXTERNAL TABLE command has the following format:

LOCATION ('gphdfs://hdfs_host[:port]/path/filename.txt')
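A minimal sketch of a complete readable external table definition that uses this LOCATION format. The table name, column list, host name hdfshost-1, port 8020, and file path are hypothetical placeholders and not values from this documentation; substitute the values for your own Hadoop cluster.

```sql
-- Hypothetical example: ext_expenses, its columns, the host "hdfshost-1",
-- the port 8020, and the file path are all illustrative assumptions.
CREATE EXTERNAL TABLE ext_expenses (
    name     text,
    spent_on date,
    amount   float4,
    category text
)
LOCATION ('gphdfs://hdfshost-1:8020/data/expenses/expenses.txt')
FORMAT 'TEXT' (DELIMITER ',');
```

The port is optional in the LOCATION format, so the same definition could omit :8020.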
If you are using MapR clusters, you specify the cluster and the file:
  • To use the default cluster, which is the first entry in the MapR configuration file /opt/mapr/conf/mapr-clusters.conf, specify the location of your table with this syntax:
     LOCATION ('gphdfs:///file_path')
    The file_path is the path to the file.
  • To specify another MapR cluster listed in the configuration file, specify the file with this syntax:
     LOCATION ('gphdfs:///mapr/cluster_name/file_path')
    The cluster_name is the name of the cluster specified in the configuration file and file_path is the path to the file.
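The two MapR forms above might look as follows in full table definitions. This is a sketch only: the table names, columns, the cluster name archive_cluster, and the file paths are hypothetical, and archive_cluster is assumed to be listed in /opt/mapr/conf/mapr-clusters.conf.

```sql
-- Default MapR cluster (first entry in mapr-clusters.conf).
-- Table name, columns, and file path are illustrative assumptions.
CREATE EXTERNAL TABLE ext_sales_default (
    id     int,
    amount numeric
)
LOCATION ('gphdfs:///data/sales/sales.txt')
FORMAT 'TEXT' (DELIMITER ',');

-- Another MapR cluster from the configuration file.
-- "archive_cluster" is a hypothetical cluster name.
CREATE EXTERNAL TABLE ext_sales_archive (
    id     int,
    amount numeric
)
LOCATION ('gphdfs:///mapr/archive_cluster/data/sales/sales.txt')
FORMAT 'TEXT' (DELIMITER ',');
```

Note that neither form names a host or port: the three slashes after gphdfs: leave the host empty, and the cluster is resolved from the MapR configuration file.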

For information about MapR clusters, see the MapR documentation.

The following restrictions apply to HDFS files.
  • You can specify only one path for a readable external table with gphdfs. The path can contain wildcard characters. If you specify a directory, all files in the directory are used by default.

    You can specify only a directory for writable external tables.

  • Format restrictions are as follows.
    • Only the gphdfs_import formatter is allowed for readable external tables with a custom format.
    • Only the gphdfs_export formatter is allowed for writable external tables with a custom format.
    • You can set compression only for writable external tables. Compression settings are automatic for readable external tables.
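The restrictions above can be illustrated with a pair of definitions, given here as a hedged sketch rather than a definitive recipe. The table names, columns, host, port, and paths are hypothetical. The compress and compression_type URI parameters are assumed to follow the gphdfs protocol's query-string options; confirm them against the gphdfs reference for your release before relying on them.

```sql
-- Readable table: a single path, here with a wildcard, and the
-- gphdfs_import formatter for the custom format. No compression
-- options are given because they are automatic for readable tables.
-- All names, the host, the port, and the path are illustrative.
CREATE EXTERNAL TABLE ext_events_in (
    id      int,
    payload text
)
LOCATION ('gphdfs://hdfshost-1:8020/data/events/part-*')
FORMAT 'custom' (formatter='gphdfs_import');

-- Writable table: the path is a directory, the formatter is
-- gphdfs_export, and compression is requested in the URI.
-- The compress/compression_type parameters are assumptions to verify.
CREATE WRITABLE EXTERNAL TABLE ext_events_out (
    id      int,
    payload text
)
LOCATION ('gphdfs://hdfshost-1:8020/data/events_out?compress=true&compression_type=BLOCK')
FORMAT 'custom' (formatter='gphdfs_export');
```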