Extend Your PostgreSQL: FDW Magic!

PGConf Europe Lisbon

25 October 2018

Günce Kaya

guncekaya.blogspot.com

@gguncesi

Agenda

  • What is SQL/MED?
  • What is Foreign Data Wrapper?
  • History of FDW
  • Why FDW?
  • How to Use FDW?
  • FDW Restrictions
  • Importance of Push Down
  • Types of FDWs
  • Generate Your FDW with Multicorn

What is SQL/MED?

  • SQL/MED was defined in 2003
  • MED (Management of External Data)
  • Access data which is an external source with an SQL statement.
  • SQL/MED supports using external data with two components.
    • Foreign Table
    • Datalink

What Is Foreign Data Wrapper?

  • FDW was introduced in PostgreSQL 9.1
  • Access to foreign data from an external source with SQL
  • Source can be;
    • A relational database
    • A NoSQL database
    • CSV, XML, JSON etc.
    • A file system's files
    • An RSS's URL
    • ...

What Is Foreign Data Wrapper?

  • Retrieve external data via  Foreign Table
    • A foreign table is generated as a remote table's meta-table structure.
    • A foreign table doesn't contain any data. Data is stored in the external table.
  • Foreign Data is data stored in an external source.

History of FDW

8.4

9.1

9.2

  • Implementation started
  • file_fdw is added (in-core)
  • Foreign table is introduced
  • Informational commands for psql \des+ \det+ \dew+

9.3

  • postgres_fdw is introduced (in-core)
  • Push down for WHERE

9.4

  • TRIGGER on Foreign Tables

9.5

  • IMPORT SCHEMA was introduced for postgres_fdw with LIMIT clause

9.6

  • Push down for JOIN, UPDATE, DELETE and ORDER BY statements

10

  • push down aggregate functions
  • file_fdw can read data from program output

11

  • INSERT, UPDATE and COPY on foreign partitions (postgres_fdw)
  • Push down aggregates to foreign tables that are partitions.

Why FDW?

  • Postgres can be a bridge between different sources.
  • Instant migration from other external sources to Postgres.
  • Access external source without duplicated or replaced data

  • FDW makes Postgres more flexible.
    • It’s possible to use JOIN statement to a remote table with a table in internal server.
    • Financial or DWH (Data Warehouse) queries require to join tables which are in different databases/clusters/servers.

Why FDW?

  • Clearer syntax for accessing remote tables

  • INSERT, UPDATE, DELETE statements on foreign tables (not all of them!)

  • Privileges can be set on foreign tables as INSERT, UPDATE, DELETE.
  • use_remote_estimate option for postgres_fdw

Why FDW?

  • Any data can be used, not only relational data.

  • Some FDWs support push down for particular clauses.

    • FDW executes queries in remote servers for particular clauses to reduce the amount of transferred data from external source. (WHERE pushdown, JOIN pushdown, etc.)

Why FDW?

  • A foreign table has meta-table definition.

    • It helps to avoid running "SELECT * FROM table;"

  • A connection string only needs to be defined one time. FDW uses SERVER to do it.

  • Some FDWs support connection pooling.

Why FDW?

  • It's possible to use with EXPLAIN command like: “Remote SQL” (not all of them)

artemis=# explain (analyze, verbose) select * from students limit 1;
                                                         QUERY PLAN             
                                             
--------------------------------------------------------------------------------
---------------------------------------------
 Limit  (cost=100.00..100.04 rows=1 width=36) (actual time=19.554..19.567 rows=1
 loops=1)
   Output: id, name
   ->  Foreign Scan on public.students  (cost=100.00..150.95 rows=1365 width=36)
 (actual time=19.544..19.549 rows=1 loops=1)
         Output: id, name
         Remote SQL: SELECT id, name FROM public.students
 Planning time: 4.123 ms
 Execution time: 275.614 ms
(7 rows)

Why FDW?

  • Foreign tables can be a partition table of a partitioned table which is in the local server database. It's possible to use different foreign tables that are in different clusters.
-- Create partition table
artemis=# CREATE TABLE all_class (i integer, random double precision) 
PARTITION BY RANGE (i);
CREATE TABLE

-- Attach foreign tables to the partition table
artemis=# ALTER TABLE all_class ATTACH PARTITION pg10_class 
FOR VALUES from (1) TO (200);
ALTER TABLE

artemis=# ALTER TABLE all_class ATTACH PARTITION pg10_class2 
FOR VALUES from (200) TO (300);
ALTER TABLE

artemis=# select * from all_class ;
  i  |       random       
-----+--------------------
 200 |  0.794649593532085
   1 |  0.979228006210178
  • Sharding capabilities with FDW
    • Generate a partition table which has its partition tables stored in a different server

Why FDW?

PostgreSQL

PostgreSQL

PostgreSQL

PostgreSQL

PostgreSQL

PostgreSQL

  • Sharding capabilities with FDW
    • Generate a partition table which has its partition tables stored in a different server

Why FDW?

PostgreSQL

PostgreSQL

file_fdw

MySQL

Why FDW?


artemis=# select ft.foreign_table_name, 
            fs.foreign_server_name, 
            fs.foreign_data_wrapper_name
from information_schema.foreign_tables ft inner join information_schema.foreign_servers fs 
on fs.foreign_server_name = ft.foreign_server_name;
 foreign_table_name | foreign_server_name | foreign_data_wrapper_name 
--------------------+---------------------+---------------------------
 file_class         | csv                 | file_fdw
 mysql_class        | mysql_server        | mysql_fdw
 pg10_class         | pg10_luna_server    | postgres_fdw

(3 rows)
artemis=# \d+ all_class 
                                     Table "public.all_class"
 Column |       Type       | Collation | Nullable | Default | Storage | Stats target | Description 
--------+------------------+-----------+----------+---------+---------+--------------+-------------
 i      | integer          |           |          |         | plain   |              | 
 random | double precision |           |          |         | plain   |              | 
Partition key: RANGE (i)
Partitions: file_class FOR VALUES FROM (600) TO (700),
            mysql_class FOR VALUES FROM (500) TO (600),
            pg10_class FOR VALUES FROM (1) TO (200)
artemis=# select a.i, a.random, 
    (case 
        when exists (select 1 from file_class where i=a.i) then 'file_class' 
        when exists (select 1 from pg10_class where i=a.i) then 'pg10_class' 
        else 'mysql_class' 
    end) as foreign_table_name 
from all_class a ;
  i  |       random       |   foreign_table_name   
-----+--------------------+------------------------
  13 |  0.979228006210178 | pg10_class
 500 |                2.8 | mysql_class
   1 |             4.6643 | file_class
(3 rows)
  • Push down is for some clauses but not all of FDW!
    • WHERE
    • JOIN
    • AGGREGATE functions
    • UPDATE/INSERT/DELETE
    • Sorting
    • Column
    • ...

Importance of Push Down

artemis=# explain (analyze, verbose) select avg(id) from students where id<10;
                                                       QUERY PLAN               
                                         
--------------------------------------------------------------------------------
-----------------------------------------
 Aggregate  (cost=168.50..168.51 rows=1 width=32) (actual time=2.635..2.650 rows
=1 loops=1)
   Output: avg(id)
   ->  Foreign Scan on public.students  (cost=100.00..166.06 rows=975 width=4) (
actual time=2.477..2.547 rows=9 loops=1)
         Output: id, name
         Remote SQL: SELECT id FROM public.students WHERE ((id < 10))
 Planning time: 0.240 ms
 Execution time: 5.149 ms
(7 rows)

Aggregate Functions (PG 9.6)

Importance of Push Down

Aggregate Functions (PG 10)

artemis=#  explain (analyze, verbose) select avg(id) from students where id<10;
                                          QUERY PLAN                            
               
--------------------------------------------------------------------------------
---------------
 Foreign Scan  (cost=102.44..149.03 rows=1 width=32) (actual time=3.001..3.003 r
ows=1 loops=1)
   Output: (avg(id))
   Relations: Aggregate on (public.students)
   Remote SQL: SELECT avg(id) FROM public.students WHERE ((id < 10))
 Planning time: 29.511 ms
 Execution time: 18.558 ms
(6 rows)

Importance of Push Down

Pushdown for JOIN statements and Sorting

artemis=# explain (analyze, verbose) 
select * from class c1 inner join class c2 on c1.class_name=c2.class_name 
order by c1.class_group limit 10;
                                                                               
                        QUERY PLAN                                             
                                                           
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
-----------------------------------------------------------
 Limit  (cost=100.00..106.23 rows=10 width=32) (actual time=2.730..2.739 rows=1
0 loops=1)
   Output: c1.class_name, c1.class_group, c2.class_name, c2.class_group, c1.cla
ss_group
   ->  Foreign Scan  (cost=100.00..16223.03 rows=25878 width=32) (actual time=2
.728..2.733 rows=10 loops=1)
         Output: c1.class_name, c1.class_group, c2.class_name, c2.class_group, 
c1.class_group
         Relations: (public.class c1) INNER JOIN (public.class c2)
         Remote SQL: SELECT r1.class_name, r1.class_group, r2.class_name, r2.cl
ass_group FROM (public.class r1 INNER JOIN public.class r2 ON (((r1.class_name 
= r2.class_name)))) ORDER BY r1.class_group ASC NULLS LAST
 Planning time: 0.309 ms
 Execution time: 4.753 ms
(8 rows)

Importance of Push Down

How to Use FDW?

  • CREATE EXTENSION
  • CREATE SERVER
  • CREATE USER MAPPING
  • CREATE FOREIGN TABLE
CREATE FOREIGN TABLE <table name> 
    (col1 datatype, col2 datatype ...) 
    SERVER <server name>;


IMPORT FOREIGN SCHEMA <target schema> 
from server <server name> 
into <name of postgresql schema>;
CREATE EXTENSION postgres_fdw;
CREATE SERVER <server name> 
FOREIGN DATA WRAPPER <fdw name> 
OPTIONS(....);


CREATE SERVER test_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS(host '192.168.111.1', port '5432');
CREATE USER MAPPING FOR <user> 
server <server name> 
options(....);


-- An example;
CREATE USER MAPPING FOR postgres
SERVER test_server
OPTIONS(user 'reporting', password 'secret');

FDW Restrictions

  • Meta-table structure changes are not transferred to local databases.
  • The data types of the columns of the remote tables and foreign tables should have a similar structure.

Types of FDW

RDBMS FDW

  • jdbc_fdw

  • firebird_fdw

  • informix_fdw

  • ..

  • postgres_fdw IN-CORE

  • oracle_fdw

  • tds_fdw

  • mysql_fdw

  • odbc_fdw

  • sqlite_fdw

Types of FDW

RDBMS FDW

  • jdbc_fdw

  • firebird_fdw

  • informix_fdw

  • ..

  • postgres_fdw IN-CORE

  • oracle_fdw

  • tds_fdw

  • mysql_fdw

  • odbc_fdw

  • sqlite_fdw

Types of FDW

RDBMS FDW

  • jdbc_fdw

  • firebird_fdw

  • informix_fdw

  • ..

  • postgres_fdw IN-CORE

  • oracle_fdw

  • tds_fdw

  • mysql_fdw

  • odbc_fdw

  • sqlite_fdw

Types of FDW

RDBMS FDW

  • jdbc_fdw

  • firebird_fdw

  • informix_fdw

  • ..

  • postgres_fdw IN-CORE

  • oracle_fdw

  • tds_fdw

  • mysql_fdw

  • odbc_fdw

  • sqlite_fdw

Types of FDW

RDBMS FDW

  • jdbc_fdw

  • firebird_fdw

  • informix_fdw

  • ..

  • postgres_fdw IN-CORE

  • oracle_fdw

  • tds_fdw

  • mysql_fdw

  • odbc_fdw

  • sqlite_fdw

Types of FDW

  • mongo_fdw
  • hdfs_fdw
  • redis_fdw
  • wdb_fdw
  • quasar_fdw
  • ..

NoSQL FDW

Types of FDW

  • file_fdw IN-CORE
  • json_fdw
  • file_text_array_fdw
  • multicdr_fdw
  • dump_fdw
  • ..

File Wrappers

Types of FDW

  • ldap_fdw

  • blackhole_fdw

  • twitter_fdw

  • ..

Others..

Generate Your FDW with Multicorn

Multicorn

  • Generating FDW is easy with Multicorn.

  • Any source can be your dataset in Postgres.

  • Multicorn makes it possible to use your source as Foreign Table to query in Postgres.

Multicorn

  • FileSystem Foreign Data Wrapper (fsfdw)

    • RST files can be a source.

    • File pattern is important.

  • Imap Foreign Data Wrapper (imapfdw)

  • LDAP Foreign Data Wrapper (ldapfdw)

  • CSV Foreign Data Wrapper (csvfdw)

    • Use CSV files as a source for Foreign Tables.

  • RSS foreign data wrapper (rssfdw)

    • items: title, link, description, author, pubDate, source

Multicorn

  • Each FDW has its own class.

  • Use class names in CREATE SERVER.

    • For fsfdw:

      • Class: multicorn.fsfdw.FilesystemFdw

        Source code: multicorn/fsfdw/__init__.py

main_directory/

  2018/

        01/

              report1 - 2018-01-01.csv

              report2 - 2018-02-01.csv

              report3 - 2018-03-01.csv

              report4 - 2018-03-05.csv

        02/

              report1 - 2018-01-01.csv

              report2 - 2018-02-01.csv

  2017/

        01/

              report1 - 2017-01-01.csv

              report2 - 2017-02-01.csv

File Structure

An Example for fsfdw

An Example for fsfdw

main_directory/

  2018/

        01/

              report1 - 2018-01-01.csv

              report2 - 2018-02-01.csv

              report3 - 2018-03-01.csv

              report4 - 2018-03-05.csv

        02/

              report1 - 2018-01-01.csv

              report2 - 2018-02-01.csv

  2017/

        01/

              report1 - 2017-01-01.csv

              report2 - 2017-02-01.csv

File Structure

Year Month Date Report_name file_name
2018 01 2018-02-01 report report1 - 2018-01-01.csv

Foreign Table

-- First step: Create Server!

artemis=# CREATE SERVER filesystem_srv foreign data wrapper multicorn options (
         wrapper 'multicorn.fsfdw.FilesystemFdw');
CREATE SERVER

An Example for fsfdw

-- Second step: Create Foreign Table!

artemis=# CREATE FOREIGN TABLE fs_reports (
f_year INTEGER,
f_month INTEGER,
f_date TEXT,
f_report_name TEXT,
f_file_name TEXT,
f_content TEXT) 

SERVER filesystem_srv OPTIONS(
    root_dir '/var/lib/pgsql/fsfdw_files/',
    pattern '{f_year}/{f_month}/{f_report_name} - {f_date}.csv',
    filename_column 'f_file_name',
    content_column 'f_content');

Options

  • root_dir
  • pattern
  • filename_column
  • content_column

An Example for fsfdw

/var/lib/pgsql/fsfdw_files/2018/01/report1 - 2018-01-01.csv

root_dir

f_year

f_date

f_report_name

f_month

f_file_name

The Result:

artemis=# select * from fs_reports limit 1;
 f_year | f_month |   f_date   | f_report_name |           f_file_name            |            f_content             
--------+---------+------------+---------------+----------------------------------+--------------------------------
   2018 |       1 | 2018-02-01 | report1       | 2018/01/report1 - 2018-02-01.csv | This file doesn't have content!+
        |         |            |               |                                  | 
(1 row)

An Example for fsfdw

Items: title, link, description, author, pubDate, source

Requirements

  1. yum install python-lxml
  2. An RSS URL
  3. Decide RSS items
artemis=# CREATE SERVER rss_srv 
    foreign data wrapper multicorn 
    options (
    wrapper 'multicorn.rssfdw.RssFdw');
ERROR:  Error in python: ImportError
DETAIL:  No module named lxml

An Example for rssfdw

artemis=# CREATE FOREIGN TABLE cnnrss (
        title character varying, 
        image text, 
        link text, 
        guid text, 
        pubDate text, 
        "pubDate" text
        ) server rss_srv options (
        url 'https://www.cnnturk.com/feed/rss/all/news');
CREATE FOREIGN TABLE
artemis=# CREATE SERVER rss_srv foreign data wrapper multicorn options (
        wrapper 'multicorn.rssfdw.RssFdw');
CREATE SERVER

An Example for rssfdw

artemis=# select * from cnnrss offset 12 limit 1;
-[ RECORD 1 ]-----------------------------------------------------------------
title   | Banvit - MHP Riesen: 89 -76
image   | https://i.cnnturk.com/ps/cnnturk/75/720x490/5bbe45b3ae784917346b6c8d
link    | https://www.cnnturk.com/spor/basketbol/banvit-mhp-riesen-89-76
guid    | 5bbc4eaf61361f271c1ae628
pubdate | 
pubDate | Wed, 10 Oct 2018 18:57:00 GMT

An Example for rssfdw

References

Thanks!

@gguncesi

guncekaya.blogspot.com

Made with Slides.com