PostgreSQL Foreign Data Wrappers

9 May 2019

Günce Kaya

guncekaya.blogspot.com

@gguncesi

Agenda

  • What is SQL/MED?
  • What is PostgreSQL Foreign Data Wrapper?
  • History of FDW
  • Why FDW?
  • How to Use FDW?
  • FDW Restrictions
  • Importance of Push Down
  • Types of FDWs
  • Generate Your FDW with Multicorn

What is SQL/MED?

  • SQL/MED was defined in 2003
  • MED (Management of External Data)
  • SQL/MED is SO/IEC 9075-9
  • Access data which is an external source(SQL or non-SQL data) with standard SQL.

What is SQL/MED?

  • Possible to connect to other RDBMS or Non-RDBMS
  • Read and manage data in the file system
  • Read non-SQL data (HTML, CSV, JSON etc.)

What is SQL/MED?

  • Foreign Data Wrapper
  • Foreign Server
  • Foreign Table
  • Foreign Data

What is SQL/MED?

There are two components:

  • Wrapper Interface
  • Datalink
    • https://wiki.postgresql.org/wiki/DATALINK
    • https://github.com/lacanoid/datalink
create table datalink_example (
  id int,
  datalink_col datalink
);

insert into datalink_example (datalink_col) select dlvalue('file:///tmp/test.txt');

What is SQL/MED?

There are two components:

  • Wrapper Interface
  • Datalink
    • https://wiki.postgresql.org/wiki/DATALINK
    • https://github.com/lacanoid/datalink
create table datalink_example (
  id int,
  datalink_col datalink
);

insert into datalink_example (datalink_col) select dlvalue('file:///tmp/test.txt');

Components of the Wrapper Interface

ref: http://cs.unibo.it/~montesi/CBD/Articoli/SQL-MED.pdf

so, how SQL/MED works?

How SQL/MED works?

Local SQL-server has some few responsible;

  1. Decompose query into fragments.
  2. Connect Foreign Servers.
  3. Prepare the execution plan for each fragment.
  4. Start the execution for all plans.
  5. Receive result data
  6. Complete query execution
  7. Return the result to the application.

What is PostgreSQL Foreign Data Wrapper?

What is PostgreSQL Foreign Data Wrapper?

  • FDW was introduced in PostgreSQL 9.1
  • Access to foreign data from an external source with SQL
  • Source can be;
    • A relational or non-relational database
    • CSV, XML, JSON etc.
    • A file system's files
    • An RSS's URL
    • ...
  • Use Foreign Server
  • Retrieve external data via  Foreign Table
    • A foreign table is generated as a remote table's meta-table structure.
    • A foreign table doesn't contain any data. Data is stored in the external table.
  • Foreign Data is data stored in an external source.

What is PostgreSQL Foreign Data Wrapper?

History of FDW

8.4

9.1

9.2

  • Implementation started
  • file_fdw is added (in-core)
  • Foreign table is introduced
  • Informational commands for psql \des+ \det+ \dew+

9.3

  • postgres_fdw is introduced (in-core)
  • Push down for WHERE

9.4

  • TRIGGER on Foreign Tables

9.5

  • IMPORT SCHEMA was introduced for postgres_fdw with LIMIT clause

9.6

  • Push down for JOIN, UPDATE, DELETE and ORDER BY statements

10

  • push down aggregate functions
  • file_fdw can read data from program output

11

  • INSERT, UPDATE and COPY on foreign partitions (postgres_fdw)
  • Push down aggregates to foreign tables that are partitions.

Why FDW?

  • Postgres can be a bridge between different sources.
  • Instant migration from other external sources to Postgres.
  • Access external source without duplicated or replaced data

  • FDW makes Postgres more flexible.
    • It’s possible to use JOIN statement to a remote table with a table in internal server.
    • Financial or DWH (Data Warehouse) queries require to join tables which are in different databases/clusters/servers.

Why FDW?

  • Clearer syntax for accessing remote tables

  • INSERT, UPDATE, DELETE statements on foreign tables (not all of fdw!)

  • Privileges can be set on foreign tables as INSERT, UPDATE, DELETE.

Why FDW?

  • A foreign table has meta-table definition.

    • It helps to avoid running "SELECT * FROM table;"

  • A connection string only needs to be defined one time. FDW uses SERVER to do it.

  • Some FDWs support connection pooling.

Why FDW?

  • It's possible to use with EXPLAIN command like: “Remote SQL” (not all of fdw provides)

artemis=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM students LIMIT 1;
                                                         QUERY PLAN             
                                             
--------------------------------------------------------------------------------
---------------------------------------------
 Limit  (cost=100.00..100.04 rows=1 width=36) (actual time=19.554..19.567 rows=1
 loops=1)
   Output: id, name
   ->  Foreign Scan on public.students  (cost=100.00..150.95 rows=1365 width=36)
 (actual time=19.544..19.549 rows=1 loops=1)
         Output: id, name
         Remote SQL: SELECT id, name FROM public.students
 Planning time: 4.123 ms
 Execution time: 275.614 ms
(7 rows)

Why FDW?

  • Foreign tables can be a partition table of a partitioned table which is in the local server database. It's possible to use different foreign tables that are in different clusters.
-- Create partition table
artemis=# CREATE TABLE all_class (i INTEGER, random DOUBLE PRECISION) 
PARTITION BY RANGE (i);
CREATE TABLE

-- Attach foreign tables to the partition table
artemis=# ALTER TABLE all_class ATTACH PARTITION pg_class 
FOR VALUES from (1) TO (200);
ALTER TABLE

artemis=# ALTER TABLE all_class ATTACH PARTITION pg_class2 
FOR VALUES from (200) TO (300);
ALTER TABLE

artemis=# SELECT * FROM all_class ;
  i  |       random       
-----+--------------------
 200 |  0.794649593532085
   1 |  0.979228006210178
  • Sharding capabilities with FDW
    • Generate a partition table which has its partition tables stored in a different server

Why FDW?

PostgreSQL

PostgreSQL

PostgreSQL

PostgreSQL

PostgreSQL

PostgreSQL

  • Sharding capabilities with FDW
    • Generate a partition table which has its partition tables stored in a different server

Why FDW?

PostgreSQL (postgres_fdw)

PostgreSQL

File System (file_fdw)

MySQL (mysql_fdw)

Why FDW?


artemis=# SELECT ft.foreign_table_name, 
            fs.foreign_server_name, 
            fs.foreign_data_wrapper_name
FROM information_schema.foreign_tables ft INNER JOIN information_schema.foreign_servers fs 
ON fs.foreign_server_name = ft.foreign_server_name;
 foreign_table_name | foreign_server_name | foreign_data_wrapper_name 
--------------------+---------------------+---------------------------
 file_class         | csv                 | file_fdw
 mysql_class        | mysql_server        | mysql_fdw
 pg_class           | pg_luna_server      | postgres_fdw

(3 rows)
artemis=# \d+ all_class 
                                     Table "public.all_class"
 Column |       Type       | Collation | Nullable | Default | Storage | Stats target | Description 
--------+------------------+-----------+----------+---------+---------+--------------+-------------
 i      | integer          |           |          |         | plain   |              | 
 random | double precision |           |          |         | plain   |              | 
Partition key: RANGE (i)
Partitions: file_class FOR VALUES FROM (600) TO (700),
            mysql_class FOR VALUES FROM (500) TO (600),
            pg_class FOR VALUES FROM (1) TO (500)
artemis=# SELECT a.i, a.random, 
    (CASE 
        WHEN EXISTS (SELECT 1 FROM file_class WHERE i=a.i) THEN 'file_class' 
        WHEN EXISTS (SELECT 1 FROM pg_class WHERE i=a.i) THEN 'pg_class' 
        ELSE 'mysql_class' 
    end) AS foreign_table_name 
FROM all_class a ;
  i  |       random       |   foreign_table_name   
-----+--------------------+------------------------
  13 |  0.979228006210178 | pg_class
 500 |                2.8 | mysql_class
   1 |             4.6643 | file_class
(3 rows)
  • Push down is for some clauses but not all of FDW!
    • WHERE
    • JOIN
    • AGGREGATE functions
    • UPDATE/INSERT/DELETE
    • Sorting
    • Column
    • ...

Importance of Push Down

artemis=# EXPLAIN (ANALYZE, VERBOSE) SELECT AVG(id) FROM students WHERE id<10;
                                                       QUERY PLAN               
                                         
--------------------------------------------------------------------------------
-----------------------------------------
 Aggregate  (cost=168.50..168.51 rows=1 width=32) (actual time=2.635..2.650 rows
=1 loops=1)
   Output: avg(id)
   ->  Foreign Scan on public.students  (cost=100.00..166.06 rows=975 width=4) (
actual time=2.477..2.547 rows=9 loops=1)
         Output: id, name
         Remote SQL: SELECT id FROM public.students WHERE ((id < 10))
 Planning time: 0.240 ms
 Execution time: 5.149 ms
(7 rows)

Aggregate Functions (PG 9.6)

Importance of Push Down

Aggregate Functions (PG 11)

artemis=#  EXPLAIN (ANALYZE, VERBOSE) SELECT AVG(id) FROM students WHERE id<10;
                                          QUERY PLAN                            
               
--------------------------------------------------------------------------------
---------------
 Foreign Scan  (cost=102.44..149.03 rows=1 width=32) (actual time=3.001..3.003 r
ows=1 loops=1)
   Output: (avg(id))
   Relations: Aggregate on (public.students)
   Remote SQL: SELECT avg(id) FROM public.students WHERE ((id < 10))
 Planning time: 29.511 ms
 Execution time: 18.558 ms
(6 rows)

Importance of Push Down

How to FDW?

  • CREATE EXTENSION
  • CREATE SERVER
  • CREATE USER MAPPING
  • CREATE FOREIGN TABLE
CREATE FOREIGN TABLE <table name> 
    (col1 datatype, col2 datatype ...) 
    SERVER <server name>;


IMPORT FOREIGN SCHEMA <target schema> 
from server <server name> 
into <name of postgresql schema>;
CREATE EXTENSION postgres_fdw;
CREATE SERVER <server name> 
FOREIGN DATA WRAPPER <fdw name> 
OPTIONS(....);


CREATE SERVER test_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS(host '192.168.111.1', port '5432');
CREATE USER MAPPING FOR <user> 
server <server name> 
options(....);


-- An example;
CREATE USER MAPPING FOR postgres
SERVER test_server
OPTIONS(user 'reporting', password 'secret');

1

2

4

3

FDW restrictions

  • Meta-table structure changes are not transferred to local databases.
  • The data types of the columns of the remote tables and foreign tables should have a similar structure.

Types of FDW

RDBMS FDW

  • jdbc_fdw

  • firebird_fdw

  • informix_fdw

  • ..

  • postgres_fdw IN-CORE

  • oracle_fdw

  • tds_fdw

  • mysql_fdw

  • odbc_fdw

  • sqlite_fdw

Types of FDW

RDBMS FDW

  • jdbc_fdw

  • firebird_fdw

  • informix_fdw

  • ..

  • postgres_fdw IN-CORE

  • oracle_fdw

  • tds_fdw

  • mysql_fdw

  • odbc_fdw

  • sqlite_fdw

Types of FDW

RDBMS FDW

  • jdbc_fdw

  • firebird_fdw

  • informix_fdw

  • ..

  • postgres_fdw IN-CORE

  • oracle_fdw

  • tds_fdw

  • mysql_fdw

  • odbc_fdw

  • sqlite_fdw

Types of FDW

RDBMS FDW

  • jdbc_fdw

  • firebird_fdw

  • informix_fdw

  • ..

  • postgres_fdw IN-CORE

  • oracle_fdw

  • tds_fdw

  • mysql_fdw

  • odbc_fdw

  • sqlite_fdw

Types of FDW

RDBMS FDW

  • jdbc_fdw

  • firebird_fdw

  • informix_fdw

  • ..

  • postgres_fdw IN-CORE

  • oracle_fdw

  • tds_fdw

  • mysql_fdw

  • odbc_fdw

  • sqlite_fdw

Types of FDW

  • mongo_fdw
  • hdfs_fdw
  • redis_fdw
  • wdb_fdw
  • quasar_fdw
  • ..

NoSQL FDW

Types of FDW

  • file_fdw IN-CORE
  • json_fdw
  • multicdr_fdw
  • dump_fdw
  • ..

File Wrappers

Types of FDW

  • ldap_fdw

  • blackhole_fdw

  • twitter_fdw

  • ..

Others..

Generate Your FDW with Multicorn

Multicorn

  • Generating FDW is easy with Multicorn.

  • Any source can be your dataset in Postgres.

  • Multicorn makes it possible to use your source as Foreign Table to query in Postgres.

Multicorn

  • FileSystem Foreign Data Wrapper (fsfdw)

    • RST files can be a source.

    • File pattern is important.

  • Imap Foreign Data Wrapper (imapfdw)

  • LDAP Foreign Data Wrapper (ldapfdw)

  • CSV Foreign Data Wrapper (csvfdw)

    • Use CSV files as a source for Foreign Tables.

  • RSS foreign data wrapper (rssfdw)

    • items: title, link, description, author, pubDate, source

Multicorn

  • Each FDW has its own class.

  • Use class names in CREATE SERVER.

    • For fsfdw:

      • Class: multicorn.fsfdw.FilesystemFdw

        Source code: multicorn/fsfdw/__init__.py

main_directory/

  2018/

        01/

              report1 - 2018-01-01.csv

              report2 - 2018-02-01.csv

              report3 - 2018-03-01.csv

              report4 - 2018-03-05.csv

        02/

              report1 - 2018-01-01.csv

              report2 - 2018-02-01.csv

  2017/

        01/

              report1 - 2017-01-01.csv

              report2 - 2017-02-01.csv

File Structure

An example for fsfdw

An example for fsfdw

main_directory/

  2018/

        01/

              report1 - 2018-01-01.csv

              report2 - 2018-02-01.csv

              report3 - 2018-03-01.csv

              report4 - 2018-03-05.csv

        02/

              report1 - 2018-01-01.csv

              report2 - 2018-02-01.csv

  2017/

        01/

              report1 - 2017-01-01.csv

              report2 - 2017-02-01.csv

File Structure

Year Month Date Report_name file_name
2018 01 2018-02-01 report report1 - 2018-01-01.csv

Foreign Table

-- First step: Create Server!

artemis=# CREATE SERVER filesystem_srv foreign data wrapper multicorn options (
         wrapper 'multicorn.fsfdw.FilesystemFdw');
CREATE SERVER

An example for fsfdw

-- Second step: Create Foreign Table!

artemis=# CREATE FOREIGN TABLE fs_reports (
f_year INTEGER,
f_month INTEGER,
f_date TEXT,
f_report_name TEXT,
f_file_name TEXT,
f_content TEXT) 

SERVER filesystem_srv OPTIONS(
    root_dir '/var/lib/pgsql/fsfdw_files/',
    pattern '{f_year}/{f_month}/{f_report_name} - {f_date}.csv',
    filename_column 'f_file_name',
    content_column 'f_content');

Options

  • root_dir
  • pattern
  • filename_column
  • content_column

An example for fsfdw

/var/lib/pgsql/fsfdw_files/2018/01/report1 - 2018-01-01.csv

root_dir

f_year

f_date

f_report_name

f_month

f_file_name

The Result:

artemis=# select * from fs_reports limit 1;
 f_year | f_month |   f_date   | f_report_name |           f_file_name            |            f_content             
--------+---------+------------+---------------+----------------------------------+--------------------------------
   2018 |       1 | 2018-02-01 | report1       | 2018/01/report1 - 2018-02-01.csv | This file doesn't have content!+
        |         |            |               |                                  | 
(1 row)

An example for fsfdw

Items: title, link, description, author, pubDate, source

Requirements

  1. Use proper class
  2. Find an RSS URL
  3. Decide RSS items
artemis=# CREATE SERVER rss_srv 
    foreign data wrapper multicorn 
    options (
    wrapper 'multicorn.rssfdw.RssFdw');

An example for rssfdw

artemis=# CREATE FOREIGN TABLE cnnrss (
        title character varying, 
        image text, 
        link text, 
        guid text, 
        pubDate text, 
        "pubDate" text
        ) server rss_srv options (
        url 'https://www.cnnturk.com/feed/rss/all/news');
CREATE FOREIGN TABLE
artemis=# CREATE SERVER rss_srv foreign data wrapper multicorn options (
        wrapper 'multicorn.rssfdw.RssFdw');
CREATE SERVER

An example for rssfdw

artemis=# select * from cnnrss offset 12 limit 1;
-[ RECORD 1 ]-----------------------------------------------------------------
title   | Banvit - MHP Riesen: 89 -76
image   | https://i.cnnturk.com/ps/cnnturk/75/720x490/5bbe45b3ae784917346b6c8d
link    | https://www.cnnturk.com/spor/basketbol/banvit-mhp-riesen-89-76
guid    | 5bbc4eaf61361f271c1ae628
pubdate | 
pubDate | Wed, 10 Oct 2018 18:57:00 GMT

An example for rssfdw

References

Thanks!

@gguncesi

guncekaya.blogspot.com

Made with Slides.com