Configuring the Greenplum Stream Server for Encryption and Authentication

GPSS supports authenticating with Kerberos to obtain both Kafka and Greenplum Database credentials. GPSS also supports using SSL to encrypt communication between Kafka and GPSS, and on the data channel between GPSS and Greenplum.

Note: If you want to use SSL encryption, you must explicitly start a Greenplum Stream Server instance with the gpss command, providing a gpss.json configuration file that specifies the certificate files. You must also use the gpsscli subcommands, not gpkafka load, to submit and manage the load job.

Configuring gpss for SSL Encryption to Kafka

If your Kafka cluster (version 0.9 or newer) is configured to use SSL encryption, you must configure GPSS to use this encryption method when communicating with Kafka. You perform this configuration at both the GPSS service instance and client levels.

  1. Create client keys for the gpss service instance.
  2. Configure the gpss service instance to use SSL encryption to Kafka by providing a Certificate block in the GPSS configuration file that identifies the file system location of the SSL certificates. Sample gpss.json excerpt:
    "Certificate": {
        "CertFile": "/home/gpadmin/cert/multiCA/server.crt",
        "KeyFile": "/home/gpadmin/cert/multiCA/server.key",
        "CAFile": "/home/gpadmin/cert/rootCA.pem"
    }
  3. Set the KAFKA:INPUT:SOURCE:ENCRYPTION property in the gpkafka.yaml load configuration file to true before you submit the job. This property governs whether GPSS uses encrypted communication to Kafka.
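
As a minimal sketch of step 3, the property might appear in gpkafka.yaml as shown below. The broker address and topic name are hypothetical placeholders, and the other properties that a complete load configuration file requires are omitted:

```yaml
KAFKA:
   INPUT:
      SOURCE:
         # Hypothetical broker and topic; substitute your own values.
         BROKERS: kafkahost:9093
         TOPIC: orders
         # Use SSL encryption when communicating with Kafka.
         ENCRYPTION: true
```

The nesting follows the KAFKA:INPUT:SOURCE:ENCRYPTION path named in step 3. Check the gpkafka.yaml reference for your GPSS version before relying on this layout.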

Configuring gpss for SSL Encryption to Greenplum

There are two communication channels between GPSS and Greenplum Database: a control channel and a data channel. GPSS supports SSL encryption only on the data channel to Greenplum.

If your Greenplum Database cluster is configured to use SSL, you must configure GPSS to use this encryption method for the data channel when it communicates with Greenplum.

  1. Create client keys for the gpss service instance.
  2. Configure the gpss service instance to use SSL encryption to Greenplum by providing a Gpfdist:Encryption block in the GPSS configuration file that identifies the file system location of the SSL certificates. Sample gpss.json excerpt:
    "Gpfdist": {
        "Host": "127.0.0.1",
        "Port": 5001,
        "Encryption": {
            "CertFile": "/home/gpadmin/cert/gpfdists/server.crt",
            "KeyFile": "/home/gpadmin/cert/gpfdists/server.key",
            "CAFile": "/home/gpadmin/cert/gpfdists/root.crt"
        }
    }

Configuring gpss for Kerberos Authentication to Greenplum

If Kerberos authentication is enabled for Greenplum Database, you must configure gpss to authenticate with Kerberos.

GPSS uses a Kerberos ticket, and the USER name specified in the load configuration file, to connect to Greenplum Database.

  1. Create a Kerberos principal for each Greenplum Database user that will use GPSS to load data into Greenplum.
  2. Specify the principal name in the load configuration file USER property value.
  3. Generate a Kerberos ticket for this principal before you submit a load job with the gpsscli submit, gpsscli load, or gpkafka load commands.

Note: If your Greenplum Database Kerberos service name is not the default (postgres), set the PGKRBSRVNAME environment variable to the correct service name before you start the gpss service instance or run gpkafka load.

Configuring gpss for Kerberos Authentication to Kafka

If your Kafka cluster (version 0.9 or newer) is configured for Kerberos authentication, you must configure GPSS to use this authentication method. You perform this configuration at both the gpss service instance level and the GPSS client level.

GPSS is a Kafka client. You must create a Kerberos principal for the gpss service instance that accesses Kafka, and generate a keytab file for this principal. By default, GPSS runs kinit using this principal and keytab to generate the Kerberos ticket.

You must set certain Kafka properties in your load configuration file to use Kerberos user authentication to Kafka. The following table identifies keywords and values that you can add to the PROPERTIES block in your gpkafka.yaml load configuration file:

| Keyword | Value |
|---|---|
| security.protocol | The Kafka security protocol. Obtain the value from the Kafka server server.properties configuration file. GPSS supports the SASL_SSL (Kerberos and SSL) and SASL_PLAINTEXT (Kerberos, no SSL) protocols. |
| sasl.kerberos.keytab | The absolute path to the GPSS or user Kerberos keytab file for Kafka on the local system. |
| sasl.kerberos.kinit.cmd | The Kerberos kinit command string. If this property is not specified, GPSS uses the default value as described in librdkafka Global configuration properties when it runs the kinit command. If you do not want GPSS to run kinit, set the sasl.kerberos.kinit.cmd property to an empty value ("") or no value. |
| sasl.kerberos.principal | The GPSS or user Kerberos service principal name; typically of the format <name>@<realm> or <primary>/<instance>@<realm>. |
| sasl.kerberos.service.name | The Kafka Kerberos principal name. Obtain the value from the Kafka server server.properties configuration file. The default Kafka Kerberos service name is kafka. |

For example:

PROPERTIES:
    security.protocol: SASL_PLAINTEXT
    sasl.kerberos.service.name: kafka
    sasl.kerberos.keytab: /var/kerberos/krb5kdc/gpss.keytab
    sasl.kerberos.principal: gpss/localhost@REALM.COM
    sasl.kerberos.kinit.cmd:
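
The example above sets sasl.kerberos.kinit.cmd to no value, so GPSS does not run kinit. As a contrasting sketch that reuses the same illustrative keytab and principal, omitting the property lets GPSS run kinit with the librdkafka default command. SASL_SSL is used here only to illustrate the second supported protocol; any additional SSL settings that your Kafka cluster requires are not shown:

```yaml
PROPERTIES:
    # Kerberos authentication over an SSL-encrypted connection.
    security.protocol: SASL_SSL
    sasl.kerberos.service.name: kafka
    sasl.kerberos.keytab: /var/kerberos/krb5kdc/gpss.keytab
    sasl.kerberos.principal: gpss/localhost@REALM.COM
    # sasl.kerberos.kinit.cmd is omitted, so GPSS runs kinit
    # using the default command string.
```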