Initializing and Managing PXF

You must initialize and start PXF before you can use the framework.

PXF provides two management commands:

  • pxf cluster - manage all PXF service instances in the Greenplum Database cluster
  • pxf - manage the PXF service instance on a specific Greenplum Database host

The pxf cluster command supports init, start, and stop subcommands. When you run a pxf cluster subcommand, you perform the operation on all segment hosts in the Greenplum Database cluster.

The pxf command supports init, start, stop, restart, and status operations. These operations run locally: to start or stop the PXF service on a specific Greenplum Database segment host, log in to that host and run the command there.
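
The split between the two commands can be summarized as a small lookup. The following is a minimal sketch and not part of PXF: the function name pxf_supports and the scope labels "cluster" and "local" are hypothetical, and the function only encodes the subcommand lists given above.

```shell
# Hypothetical helper: report whether a management command supports an operation.
# scope is "cluster" (pxf cluster <op>) or "local" (pxf <op> on a single host).
pxf_supports() {
    scope=$1
    op=$2
    case "$scope:$op" in
        cluster:init|cluster:start|cluster:stop) return 0 ;;
        local:init|local:start|local:stop|local:restart|local:status) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: status is available only as a per-host operation.
if pxf_supports cluster status; then
    echo "pxf cluster status is available"
else
    echo "run pxf status on each host instead"
fi
```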

Initializing PXF

You must explicitly initialize the PXF service instance. This one-time initialization creates the PXF service web application and generates PXF configuration files and templates.

PXF supports both internal and user-customizable configuration properties. Initializing PXF generates PXF internal configuration files, setting default properties specific to your configuration. Initializing PXF also generates configuration file templates for user-customizable settings such as custom profiles and PXF runtime and logging settings.

PXF internal configuration files are located in $GPHOME/pxf/conf. You identify the PXF user configuration directory at initialization time via an environment variable named $PXF_CONF. If you do not set $PXF_CONF prior to initializing PXF, PXF may prompt you to accept or decline the default user configuration directory, $HOME/pxf, during the initialization process.

Note: The gpadmin user must have permission to either create, or write to, the specified $PXF_CONF directory.

During initialization, PXF creates the $PXF_CONF directory if necessary, and then populates it with subdirectories and template files. Refer to PXF User Configuration Directories for a list of these directories and their contents.
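
Because initialization fails if gpadmin cannot create or write to $PXF_CONF, it can help to check the directory before running init. The following is a minimal sketch, not part of PXF; the function name pxf_conf_usable is hypothetical. It reports success when the directory exists and is writable, or when it does not exist and its nearest existing parent directory is writable.

```shell
# Hypothetical pre-flight check, not part of PXF.
# Succeeds if the current user can write to the directory, or could create it
# under its nearest existing parent directory.
pxf_conf_usable() {
    dir=$1
    if [ -d "$dir" ]; then
        [ -w "$dir" ]
        return
    fi
    if [ -e "$dir" ]; then
        # The path exists but is not a directory.
        return 1
    fi
    parent=$(dirname "$dir")
    while [ ! -d "$parent" ]; do
        parent=$(dirname "$parent")
    done
    [ -w "$parent" ] && [ -x "$parent" ]
}

# Example, run as gpadmin:
#   pxf_conf_usable /etc/pxf/usercfg || echo "gpadmin cannot use /etc/pxf/usercfg"
```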

Prerequisites

Before initializing PXF in your Greenplum Database cluster, ensure that:

  • Your Greenplum Database cluster is up and running.
  • You have identified the PXF user configuration directory filesystem location, $PXF_CONF.

Procedure

Perform the following procedure to initialize PXF on each segment host in your Greenplum Database cluster.

  1. Log in to the Greenplum Database master node:

    $ ssh gpadmin@<gpmaster>
    
  2. Run the pxf cluster init command to initialize the PXF service on the master and on each segment host. For example, the following command specifies /etc/pxf/usercfg as the PXF user configuration directory for initialization:

    gpadmin@gpmaster$ PXF_CONF=/etc/pxf/usercfg $GPHOME/pxf/bin/pxf cluster init
    

    The init command creates the PXF web application and initializes the internal PXF configuration. The init command also creates the $PXF_CONF user configuration directory if it does not exist, and populates the directory with user-customizable configuration templates.

    Note: The PXF service runs only on the segment hosts. However, pxf cluster init also sets up the PXF user configuration directories on the Greenplum Database master host.
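
If you script the initialization, you may want to require an explicit user configuration directory rather than rely on the default. The following is a minimal sketch under stated assumptions: init_pxf_cluster is a hypothetical wrapper, and PXF_BIN is a hypothetical override for the location of the pxf script. Neither is provided by PXF.

```shell
# Hypothetical wrapper, not part of PXF: refuse to run without an explicit
# user configuration directory, then pass it to init through PXF_CONF.
# PXF_BIN is an assumed override; it defaults to $GPHOME/pxf/bin/pxf.
init_pxf_cluster() {
    conf_dir=$1
    if [ -z "$conf_dir" ]; then
        echo "usage: init_pxf_cluster <PXF_CONF directory>" >&2
        return 2
    fi
    PXF_CONF="$conf_dir" "${PXF_BIN:-$GPHOME/pxf/bin/pxf}" cluster init
}

# Example, run as gpadmin on the master host:
#   init_pxf_cluster /etc/pxf/usercfg
```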

Starting PXF

After initializing PXF, you must start PXF on each segment host in your Greenplum Database cluster. The PXF service, once started, runs as the gpadmin user on default port 5888. Only the gpadmin user can start and stop the PXF service.

If you want to change the default PXF configuration, you must update the configuration before you start PXF.

$PXF_CONF/conf includes these user-customizable configuration files:

  • pxf-env.sh - runtime configuration parameters
  • pxf-log4j.properties - logging configuration parameters
  • pxf-profiles.xml - custom profile definitions

The pxf-env.sh file exposes the following PXF runtime configuration parameters:

| Parameter | Description | Default Value |
|---|---|---|
| JAVA_HOME | The Java JRE home directory. | /usr/java/default |
| PXF_LOG_DIR | The PXF log directory. | $PXF_CONF/logs |
| PXF_JVM_OPTS | Default options for the PXF Java virtual machine. | -Xmx2g -Xms1g |
| PXF_KEYTAB | The absolute path to the PXF service Kerberos principal keytab file. | $PXF_CONF/keytabs/pxf.service.keytab |
| PXF_PRINCIPAL | The PXF service Kerberos principal. | gpadmin/_HOST@EXAMPLE.COM |
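
As an illustration, a customized pxf-env.sh might override two of these parameters. This sketch assumes that the file is a shell script that sets each parameter with export. The JRE path and heap size shown are example values, not recommendations.

```shell
# $PXF_CONF/conf/pxf-env.sh (excerpt; values are illustrative assumptions)

# Point PXF at a specific JRE instead of the default /usr/java/default.
export JAVA_HOME=/usr/lib/jvm/jre

# Raise the maximum JVM heap from the default -Xmx2g; keep the initial heap.
export PXF_JVM_OPTS="-Xmx3g -Xms1g"
```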

You must propagate any changes that you make to pxf-env.sh, pxf-log4j.properties, or pxf-profiles.xml to each Greenplum Database segment host, and (re)start PXF on each host.
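
One way to propagate the files is to copy them from the master to each segment host. The following is a minimal sketch and not a PXF utility: print_pxf_conf_sync is a hypothetical helper, it assumes a host file with one segment hostname per line, and it assumes that $PXF_CONF is the same path on every host. It only prints the scp commands so that you can review them before running anything.

```shell
# Hypothetical helper, not part of PXF: print the scp commands that would copy
# the three user configuration files to every host listed in a host file.
# Arguments: <host file> <PXF_CONF directory>
print_pxf_conf_sync() {
    hostfile=$1
    conf_dir=$2
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip blank lines and comment lines.
        case "$host" in
            ''|'#'*) continue ;;
        esac
        for f in pxf-env.sh pxf-log4j.properties pxf-profiles.xml; do
            echo "scp $conf_dir/conf/$f $host:$conf_dir/conf/$f"
        done
    done < "$hostfile"
}

# Example: review the commands, then run them by piping the output to sh.
#   print_pxf_conf_sync seghostfile /etc/pxf/usercfg
#   print_pxf_conf_sync seghostfile /etc/pxf/usercfg | sh
```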

Prerequisites

Before you start PXF in your Greenplum Database cluster, ensure that:

  • Your Greenplum Database cluster is up and running.
  • You have previously initialized PXF.

Procedure

Perform the following procedure to start PXF on each segment host in your Greenplum Database cluster.

  1. Log in to the Greenplum Database master node:

    $ ssh gpadmin@<gpmaster>
    
  2. Run the pxf cluster start command to start PXF on each segment host. For example:

    gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster start
    

Stopping PXF

If you must stop PXF, for example to upgrade PXF, stop it on each segment host in your Greenplum Database cluster. Only the gpadmin user can stop the PXF service.

Prerequisites

Before you stop PXF in your Greenplum Database cluster, ensure that your Greenplum Database cluster is up and running.

Procedure

Perform the following procedure to stop PXF on each segment host in your Greenplum Database cluster.

  1. Log in to the Greenplum Database master node:

    $ ssh gpadmin@<gpmaster>
    
  2. Run the pxf cluster stop command to stop PXF on each segment host. For example:

    gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster stop
    

Restarting PXF

If you must restart PXF, for example if you updated PXF user configuration files in $PXF_CONF/conf, you can stop, and then start, PXF in your Greenplum Database cluster.

Only the gpadmin user can restart the PXF service.

Prerequisites

Before you restart PXF in your Greenplum Database cluster, ensure that your Greenplum Database cluster is up and running.

Procedure

Perform the following procedure to restart PXF in your Greenplum Database cluster.

  1. Log in to the Greenplum Database master node:

    $ ssh gpadmin@<gpmaster>
    
  2. Restart PXF:

    gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster stop
    gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster start
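
If you restart PXF from a script, you may want to start the service only when the stop succeeds. The following is a minimal sketch: restart_pxf_cluster is a hypothetical wrapper and PXF_BIN is a hypothetical override for the location of the pxf script. Neither is provided by PXF.

```shell
# Hypothetical wrapper, not part of PXF: stop PXF across the cluster and start
# it again only if the stop command exits successfully.
# PXF_BIN is an assumed override; it defaults to $GPHOME/pxf/bin/pxf.
restart_pxf_cluster() {
    pxf_bin=${PXF_BIN:-$GPHOME/pxf/bin/pxf}
    "$pxf_bin" cluster stop && "$pxf_bin" cluster start
}

# Example, run as gpadmin on the master host:
#   restart_pxf_cluster || echo "PXF restart failed" >&2
```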
    

Displaying PXF Status

If you want to display PXF status, you must explicitly request the status of the PXF service instance on each segment host in your Greenplum Database cluster.

Only the gpadmin user can request the status of the PXF service.

Perform the following procedure to request PXF status on each segment host in your Greenplum Database cluster.

  1. Log in to the Greenplum Database master node:

    $ ssh gpadmin@<gpmaster>
    
  2. Use the gpssh command and a seghostfile to run the pxf status command on each segment host:

    gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf status"
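
The seghostfile passed to gpssh with -f is a plain text file that lists one segment hostname per line. The following is a minimal sketch for creating one: write_hostfile is a hypothetical helper, and sdw1 and sdw2 are placeholder hostnames.

```shell
# Hypothetical helper: write the given hostnames, one per line, to a host file.
# Arguments: <output file> <host> [<host> ...]
write_hostfile() {
    if [ $# -lt 2 ]; then
        echo "usage: write_hostfile <output file> <host> [<host> ...]" >&2
        return 2
    fi
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}

# Example (sdw1 and sdw2 are placeholder hostnames):
#   write_hostfile seghostfile sdw1 sdw2
#   gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf status"
```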