Writing a Foreign Data Wrapper

A newer version of this documentation is available. Use the version menu above to view the most up-to-date release of the Greenplum 6.x documentation.

Writing a Foreign Data Wrapper

This chapter outlines how to write a new foreign-data wrapper.

All operations on a foreign table are handled through its foreign-data wrapper (FDW), a library that consists of a set of functions that the core Greenplum Database server calls. The foreign-data wrapper is responsible for fetching data from the remote data store and returning it to the Greenplum Database executor. If updating foreign-data is supported, the wrapper must handle that, too.

The foreign-data wrappers included in the Greenplum Database open source github repository are good references when trying to write your own. You may want to examine the file_fdw and postgres_fdw modules in the contrib/ directory. The CREATE FOREIGN DATA WRAPPER reference page also provides some useful details.

Note: The SQL standard specifies an interface for writing foreign-data wrappers. Greenplum Database does not implement that API, however, because the effort to accommodate it into Greenplum would be large, and the standard API hasn't yet gained wide adoption.

This topic includes the following sections:

Requirements

When you develop with the Greenplum Database foreign-data wrapper API:

  • You must develop your code on a system with the same hardware and software architecture as that of your Greenplum Database hosts.
  • Your code must be written in a compiled language such as C, using the version-1 interface. For details on C language calling conventions and dynamic loading, refer to C Language Functions in the PostgreSQL documentation.
  • Symbol names in your object files must not conflict with each other nor with symbols defined in the Greenplum Database server. You must rename your functions or variables if you get error messages to this effect.
  • Review the foreign table introduction described in Accessing External Data with Foreign Tables.

Known Issues and Limitations

The Greenplum Database 6 foreign-data wrapper implementation has the following known issues and limitations:

  • The Greenplum Database 6 distribution does not install any foreign data wrappers.
  • Greenplum Database uses the mpp_execute option value for foreign table scans only. Greenplum does not honor the mpp_execute setting when you write to, or update, a foreign table; all write operations are initiated through the master.

Header Files

The Greenplum Database header files that you may use when you develop a foreign-data wrapper are located in the greenplum-db/src/include/ directory (when developing against the Greenplum Database open source github repository), or installed in the $GPHOME/include/postgresql/server/ directory (when developing against a Greenplum installation):

  • foreign/fdwapi.h - FDW API structures and callback function signatures
  • foreign/foreign.h - foreign-data wrapper helper structs and functions
  • catalog/pg_foreign_table.h - foreign table definition
  • catalog/pg_foreign_server.h - foreign server definition

Your FDW code may also be dependent on header files and libraries required to access the remote data store.

Foreign Data Wrapper Functions

The developer of a foreign-data wrapper must implement an SQL-invokable handler function, and optionally an SQL-invokable validator function. Both functions must be written in a compiled language such as C, using the version-1 interface.

The handler function simply returns a struct of function pointers to callback functions that will be called by the Greenplum Database planner, executor, and various maintenance commands. The handler function must be registered with Greenplum Database as taking no arguments and returning the special pseudo-type fdw_handler. For example:

CREATE FUNCTION NEW_fdw_handler()
  RETURNS fdw_handler
  AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

Most of the effort in writing a foreign-data wrapper is in implementing the callback functions. The FDW API callback functions, plain C functions that are not visible or callable at the SQL level, are described in Foreign Data Wrapper Callback Functions.

The validator function is responsible for validating options provided in CREATE and ALTER commands for its foreign-data wrapper, as well as foreign servers, user mappings, and foreign tables using the wrapper. The validator function must be registered as taking two arguments, a text array containing the options to be validated, and an OID representing the type of object with which the options are associated. For example:

CREATE FUNCTION NEW_fdw_validator( text[], oid )
  RETURNS void
  AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

The OID argument reflects the type of the system catalog that the object would be stored in, one of ForeignDataWrapperRelationId, ForeignServerRelationId, UserMappingRelationId, or ForeignTableRelationId. If no validator function is supplied by a foreign data wrapper, Greenplum Database does not check option validity at object creation time or object alteration time.

Foreign Data Wrapper Callback Functions

The foreign-data wrapper API defines callback functions that Greenplum Database invokes when scanning and updating foreign tables. The API also includes callbacks for performing explain and analyze operations on a foreign table.

The handler function of a foreign-data wrapper returns a palloc'd FdwRoutine struct containing pointers to callback functions described below. The FdwRoutine struct is located in the foreign/fdwapi.h header file, and is defined as follows:

/*
 * FdwRoutine is the struct returned by a foreign-data wrapper's handler
 * function.  It provides pointers to the callback functions needed by the
 * planner and executor.
 *
 * More function pointers are likely to be added in the future.  Therefore
 * it's recommended that the handler initialize the struct with
 * makeNode(FdwRoutine) so that all fields are set to NULL.  This will
 * ensure that no fields are accidentally left undefined.
 */
typedef struct FdwRoutine
{
	NodeTag		type;

	/* Functions for scanning foreign tables */
	GetForeignRelSize_function GetForeignRelSize;
	GetForeignPaths_function GetForeignPaths;
	GetForeignPlan_function GetForeignPlan;
	BeginForeignScan_function BeginForeignScan;
	IterateForeignScan_function IterateForeignScan;
	ReScanForeignScan_function ReScanForeignScan;
	EndForeignScan_function EndForeignScan;

	/*
	 * Remaining functions are optional.  Set the pointer to NULL for any that
	 * are not provided.
	 */

	/* Functions for updating foreign tables */
	AddForeignUpdateTargets_function AddForeignUpdateTargets;
	PlanForeignModify_function PlanForeignModify;
	BeginForeignModify_function BeginForeignModify;
	ExecForeignInsert_function ExecForeignInsert;
	ExecForeignUpdate_function ExecForeignUpdate;
	ExecForeignDelete_function ExecForeignDelete;
	EndForeignModify_function EndForeignModify;
	IsForeignRelUpdatable_function IsForeignRelUpdatable;

	/* Support functions for EXPLAIN */
	ExplainForeignScan_function ExplainForeignScan;
	ExplainForeignModify_function ExplainForeignModify;

	/* Support functions for ANALYZE */
	AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;

You must implement the scan-related functions in your foreign-data wrapper; implementing the other callback functions is optional.

Scan-related callback functions include:

Callback Signature Description
void
GetForeignRelSize (PlannerInfo *root,
                   RelOptInfo *baserel,
                   Oid foreigntableid)
Obtain relation size estimates for a foreign table. Called at the beginning of planning for a query on a foreign table.
void
GetForeignPaths (PlannerInfo *root,
                 RelOptInfo *baserel,
                 Oid foreigntableid)
Create possible access paths for a scan on a foreign table. Called during query planning.
Note: A Greenplum Database-compatible FDW must call create_foreignscan_path() in its GetForeignPaths() callback function.
ForeignScan *
GetForeignPlan (PlannerInfo *root,
                RelOptInfo *baserel,
                Oid foreigntableid,
                ForeignPath *best_path,
                List *tlist,
                List *scan_clauses)
Create a ForeignScan plan node from the selected foreign access path. Called at the end of query planning.
void
BeginForeignScan (ForeignScanState *node,
                  int eflags)
Begin executing a foreign scan. Called during executor startup.
TupleTableSlot *
IterateForeignScan (ForeignScanState *node)
Fetch one row from the foreign source, returning it in a tuple table slot; return NULL if no more rows are available.
void
ReScanForeignScan (ForeignScanState *node)
Restart the scan from the beginning.
void
EndForeignScan (ForeignScanState *node)
End the scan and release resources.

Refer to Foreign Data Wrapper Callback Routines in the PostgreSQL documentation for detailed information about the inputs and outputs of the FDW callback functions.

Foreign Data Wrapper Helper Functions

The FDW API exports several helper functions from the Greenplum Database core server so that authors of foreign-data wrappers have easy access to attributes of FDW-related objects, such as options provided when the user creates or alters the foreign-data wrapper, server, or foreign table. To use these helper functions, you must include foreign.h header file in your source file:

#include "foreign/foreign.h"

The FDW API includes the helper functions listed in the table below. Refer to Foreign Data Wrapper Helper Functions in the PostgreSQL documentation for more information about these functions.

Helper Signature Description
ForeignDataWrapper *
GetForeignDataWrapper(Oid fdwid);
Returns the ForeignDataWrapper object for the foreign-data wrapper with the given OID.
ForeignDataWrapper *
GetForeignDataWrapperByName(const char *name, bool missing_ok);
Returns the ForeignDataWrapper object for the foreign-data wrapper with the given name.
ForeignServer *
GetForeignServer(Oid serverid);
Returns the ForeignServer object for the foreign server with the given OID.
ForeignServer *
GetForeignServerByName(const char *name, bool missing_ok);
Returns the ForeignServer object for the foreign server with the given name.
UserMapping *
GetUserMapping(Oid userid, Oid serverid);
Returns the UserMapping object for the user mapping of the given role on the given server.
ForeignTable *
GetForeignTable(Oid relid);
Returns the ForeignTable object for the foreign table with the given OID.
List *
GetForeignColumnOptions(Oid relid, AttrNumber attnum);
Returns the per-column FDW options for the column with the given foreign table OID and attribute number.

Greenplum Database Considerations

A Greenplum Database user can specify the mpp_execute option when they create or alter a foreign table, foreign server, or foreign data wrapper. A Greenplum Database-compatible foreign-data wrapper examines the mpp_execute option value on a scan and uses it to determine where to request data - from the master (the default), any (master or any one segment), or all segments.

Note: Write/update operations using a foreign data wrapper are always executed on the Greenplum Database master, regardless of the mpp_execute setting.

The following scan code snippet probes the mpp_execute value associated with a foreign table:

ForeignTable *table = GetForeignTable(foreigntableid);
if (table->exec_location == FTEXECLOCATION_ALL_SEGMENTS)
{
    ...
}
else if (table->exec_location == FTEXECLOCATION_ANY)
{
    ...
}
else if (table->exec_location == FTEXECLOCATION_MASTER)
{
    ...
} 

If the foreign table was not created with an mpp_execute option setting, the mpp_execute setting of the foreign server, and then the foreign data wrapper, is probed and used. If none of the foreign-data-related objects has an mpp_execute setting, the default setting is master.

If a foreign-data wrapper supports mpp_execute 'all', it will implement a policy that matches Greenplum segments to data. So as not to duplicate data retrieved from the remote, the FDW on each segment must be able to establish which portion of the data is their responsibility. An FDW may use the segment identifier and the number of segments to help make this determination. The following code snippet demonstrates how a foreign-data wrapper may retrieve the segment number and total number of segments:

int segmentNumber = GpIdentity.segindex;
int totalNumberOfSegments = getgpsegmentCount();

Building a Foreign Data Wrapper Extension with PGXS

You compile the foreign-data wrapper functions that you write with the FDW API into one or more shared libraries that the Greenplum Database server loads on demand.

You can use the PostgreSQL build extension infrastructure (PGXS) to build the source code for your foreign-data wrapper against a Greenplum Database installation. This framework automates common build rules for simple modules. If you have a more complicated use case, you will need to write your own build system.

To use the PGXS infrastructure to generate a shared library for your FDW, create a simple Makefile that sets PGXS-specific variables.

Note: Refer to Extension Building Infrastructure in the PostgreSQL documentation for information about the Makefile variables supported by PGXS.

For example, the following Makefile generates a shared library in the current working directory named base_fdw.so from two C source files, base_fdw_1.c and base_fdw_2.c:

MODULE_big = base_fdw
OBJS = base_fdw_1.o base_fdw_2.o

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)

PG_CPPFLAGS = -I$(shell $(PG_CONFIG) --includedir)
SHLIB_LINK = -L$(shell $(PG_CONFIG) --libdir)
include $(PGXS)

A description of the directives used in this Makefile follows:

  • MODULE_big - identifes the base name of the shared library generated by the Makefile
  • PG_CPPFLAGS - adds the Greenplum Database installation include/ directory to the compiler header file search path
  • SHLIB_LINK adds the Greenplum Database installation library directory ($GPHOME/lib/) to the linker search path
  • The PG_CONFIG and PGXS variable settings and the include statement are required and typically reside in the last three lines of the Makefile.

To package the foreign-data wrapper as a Greenplum Database extension, you create script (newfdw--version.sql) and control (newfdw.control) files that register the FDW handler and validator functions, create the foreign data wrapper, and identify the characteristics of the FDW shared library file.

Note: Packaging Related Objects into an Extension in the PostgreSQL documentation describes how to package an extension.

Example foreign-data wrapper extension script file named base_fdw--1.0.sql:

CREATE FUNCTION base_fdw_handler()
  RETURNS fdw_handler
  AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

CREATE FUNCTION base_fdw_validator(text[], oid)
  RETURNS void
  AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

CREATE FOREIGN DATA WRAPPER base_fdw
  HANDLER base_fdw_handler
  VALIDATOR base_fdw_validator;

Example FDW control file named base_fdw.control:

# base_fdw FDW extension
comment = 'base foreign-data wrapper implementation; does not do much'
default_version = '1.0'
module_pathname = '$libdir/base_fdw'
relocatable = true

When you add the following directives to the Makefile, you identify the FDW extension control file base name (EXTENSION) and SQL script (DATA):

EXTENSION = base_fdw
DATA = base_fdw--1.0.sql

Running make install with these directives in the Makefile copies the shared library and FDW SQL and control files into the specified or default locations in your Greenplum Database installation ($GPHOME).

Deployment Considerations

You must package the FDW shared library and extension files in a form suitable for deployment in a Greenplum Database cluster. When you construct and deploy the package, take into consideration the following:

  • The FDW shared library must be installed to the same file system location on the master host and on every segment host in the Greenplum Database cluster. You specify this location in the .control file. This location is typically the $GPHOME/lib/postgresql/ directory.
  • The FDW .sql and .control files must be installed to the $GPHOME/share/postgresql/extension/ directory on the master host and on every segment host in the Greenplum Database cluster.
  • The gpadmin user must have permission to traverse the complete file system path to the FDW shared library file and extension files.