Extend Your PostgreSQL: FDW Magic!
PGConf Europe Lisbon
25 October 2018
Günce Kaya
guncekaya.blogspot.com
@gguncesi
Agenda
- What is SQL/MED?
- What is Foreign Data Wrapper?
- History of FDW
- Why FDW?
- How to Use FDW?
- FDW Restrictions
- Importance of Push Down
- Types of FDWs
- Generate Your FDW with Multicorn
What is SQL/MED?
- SQL/MED was defined in 2003
- MED (Management of External Data)
- Access data which is an external source with an SQL statement.
- SQL/MED supports using external data with two components.
- Foreign Table
- Datalink
What Is Foreign Data Wrapper?
- FDW was introduced in PostgreSQL 9.1
- Access to foreign data from an external source with SQL
- Source can be;
- A relational database
- A NoSQL database
- CSV, XML, JSON etc.
- A file system's files
- An RSS's URL
- ...
What Is Foreign Data Wrapper?
- Retrieve external data via Foreign Table
- A foreign table is generated as a remote table's meta-table structure.
- A foreign table doesn't contain any data. Data is stored in the external table.
- Foreign Data is data stored in an external source.
History of FDW
8.4
9.1
9.2
- Implementation started
- file_fdw is added (in-core)
- Foreign table is introduced
- Informational commands for psql \des+ \det+ \dew+
9.3
- postgres_fdw is introduced (in-core)
- Push down for WHERE
9.4
- TRIGGER on Foreign Tables
9.5
- IMPORT SCHEMA was introduced for postgres_fdw with LIMIT clause
9.6
- Push down for JOIN, UPDATE, DELETE and ORDER BY statements
10
- push down aggregate functions
- file_fdw can read data from program output
11
- INSERT, UPDATE and COPY on foreign partitions (postgres_fdw)
- Push down aggregates to foreign tables that are partitions.
Why FDW?
- Postgres can be a bridge between different sources.
- Instant migration from other external sources to Postgres.
-
Access external source without duplicated or replaced data
-
FDW makes Postgres more flexible.
- It’s possible to use JOIN statement to a remote table with a table in internal server.
- Financial or DWH (Data Warehouse) queries require to join tables which are in different databases/clusters/servers.
Why FDW?
-
Clearer syntax for accessing remote tables
-
INSERT, UPDATE, DELETE statements on foreign tables (not all of them!)
- Privileges can be set on foreign tables as INSERT, UPDATE, DELETE.
- use_remote_estimate option for postgres_fdw
Why FDW?
-
Any data can be used, not only relational data.
-
Some FDWs support push down for particular clauses.
-
FDW executes queries in remote servers for particular clauses to reduce the amount of transferred data from external source. (WHERE pushdown, JOIN pushdown, etc.)
-
Why FDW?
-
A foreign table has meta-table definition.
-
It helps to avoid running "SELECT * FROM table;"
-
-
A connection string only needs to be defined one time. FDW uses SERVER to do it.
-
Some FDWs support connection pooling.
Why FDW?
- It's possible to use with EXPLAIN command like: “Remote SQL” (not all of them)
artemis=# explain (analyze, verbose) select * from students limit 1;
QUERY PLAN
--------------------------------------------------------------------------------
---------------------------------------------
Limit (cost=100.00..100.04 rows=1 width=36) (actual time=19.554..19.567 rows=1
loops=1)
Output: id, name
-> Foreign Scan on public.students (cost=100.00..150.95 rows=1365 width=36)
(actual time=19.544..19.549 rows=1 loops=1)
Output: id, name
Remote SQL: SELECT id, name FROM public.students
Planning time: 4.123 ms
Execution time: 275.614 ms
(7 rows)
Why FDW?
- Foreign tables can be a partition table of a partitioned table which is in the local server database. It's possible to use different foreign tables that are in different clusters.
-- Create partition table
artemis=# CREATE TABLE all_class (i integer, random double precision)
PARTITION BY RANGE (i);
CREATE TABLE
-- Attach foreign tables to the partition table
artemis=# ALTER TABLE all_class ATTACH PARTITION pg10_class
FOR VALUES from (1) TO (200);
ALTER TABLE
artemis=# ALTER TABLE all_class ATTACH PARTITION pg10_class2
FOR VALUES from (200) TO (300);
ALTER TABLE
artemis=# select * from all_class ;
i | random
-----+--------------------
200 | 0.794649593532085
1 | 0.979228006210178
-
Sharding capabilities with FDW
- Generate a partition table which has its partition tables stored in a different server
Why FDW?
PostgreSQL
PostgreSQL
PostgreSQL
PostgreSQL
PostgreSQL
PostgreSQL
-
Sharding capabilities with FDW
- Generate a partition table which has its partition tables stored in a different server
Why FDW?
PostgreSQL
PostgreSQL
file_fdw
MySQL
Why FDW?
artemis=# select ft.foreign_table_name,
fs.foreign_server_name,
fs.foreign_data_wrapper_name
from information_schema.foreign_tables ft inner join information_schema.foreign_servers fs
on fs.foreign_server_name = ft.foreign_server_name;
foreign_table_name | foreign_server_name | foreign_data_wrapper_name
--------------------+---------------------+---------------------------
file_class | csv | file_fdw
mysql_class | mysql_server | mysql_fdw
pg10_class | pg10_luna_server | postgres_fdw
(3 rows)
artemis=# \d+ all_class
Table "public.all_class"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------------------+-----------+----------+---------+---------+--------------+-------------
i | integer | | | | plain | |
random | double precision | | | | plain | |
Partition key: RANGE (i)
Partitions: file_class FOR VALUES FROM (600) TO (700),
mysql_class FOR VALUES FROM (500) TO (600),
pg10_class FOR VALUES FROM (1) TO (200)
artemis=# select a.i, a.random,
(case
when exists (select 1 from file_class where i=a.i) then 'file_class'
when exists (select 1 from pg10_class where i=a.i) then 'pg10_class'
else 'mysql_class'
end) as foreign_table_name
from all_class a ;
i | random | foreign_table_name
-----+--------------------+------------------------
13 | 0.979228006210178 | pg10_class
500 | 2.8 | mysql_class
1 | 4.6643 | file_class
(3 rows)
-
Push down is for some clauses but not all of FDW!
- WHERE
- JOIN
- AGGREGATE functions
- UPDATE/INSERT/DELETE
- Sorting
- Column
- ...
Importance of Push Down
artemis=# explain (analyze, verbose) select avg(id) from students where id<10;
QUERY PLAN
--------------------------------------------------------------------------------
-----------------------------------------
Aggregate (cost=168.50..168.51 rows=1 width=32) (actual time=2.635..2.650 rows
=1 loops=1)
Output: avg(id)
-> Foreign Scan on public.students (cost=100.00..166.06 rows=975 width=4) (
actual time=2.477..2.547 rows=9 loops=1)
Output: id, name
Remote SQL: SELECT id FROM public.students WHERE ((id < 10))
Planning time: 0.240 ms
Execution time: 5.149 ms
(7 rows)
Aggregate Functions (PG 9.6)
Importance of Push Down
Aggregate Functions (PG 10)
artemis=# explain (analyze, verbose) select avg(id) from students where id<10;
QUERY PLAN
--------------------------------------------------------------------------------
---------------
Foreign Scan (cost=102.44..149.03 rows=1 width=32) (actual time=3.001..3.003 r
ows=1 loops=1)
Output: (avg(id))
Relations: Aggregate on (public.students)
Remote SQL: SELECT avg(id) FROM public.students WHERE ((id < 10))
Planning time: 29.511 ms
Execution time: 18.558 ms
(6 rows)
Importance of Push Down
Pushdown for JOIN statements and Sorting
artemis=# explain (analyze, verbose)
select * from class c1 inner join class c2 on c1.class_name=c2.class_name
order by c1.class_group limit 10;
QUERY PLAN
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
-----------------------------------------------------------
Limit (cost=100.00..106.23 rows=10 width=32) (actual time=2.730..2.739 rows=1
0 loops=1)
Output: c1.class_name, c1.class_group, c2.class_name, c2.class_group, c1.cla
ss_group
-> Foreign Scan (cost=100.00..16223.03 rows=25878 width=32) (actual time=2
.728..2.733 rows=10 loops=1)
Output: c1.class_name, c1.class_group, c2.class_name, c2.class_group,
c1.class_group
Relations: (public.class c1) INNER JOIN (public.class c2)
Remote SQL: SELECT r1.class_name, r1.class_group, r2.class_name, r2.cl
ass_group FROM (public.class r1 INNER JOIN public.class r2 ON (((r1.class_name
= r2.class_name)))) ORDER BY r1.class_group ASC NULLS LAST
Planning time: 0.309 ms
Execution time: 4.753 ms
(8 rows)
Importance of Push Down
How to Use FDW?
- CREATE EXTENSION
- CREATE SERVER
- CREATE USER MAPPING
- CREATE FOREIGN TABLE
CREATE FOREIGN TABLE <table name>
(col1 datatype, col2 datatype ...)
SERVER <server name>;
IMPORT FOREIGN SCHEMA <target schema>
from server <server name>
into <name of postgresql schema>;
CREATE EXTENSION postgres_fdw;
CREATE SERVER <server name>
FOREIGN DATA WRAPPER <fdw name>
OPTIONS(....);
CREATE SERVER test_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS(host '192.168.111.1', port '5432');
CREATE USER MAPPING FOR <user>
server <server name>
options(....);
-- An example;
CREATE USER MAPPING FOR postgres
SERVER test_server
OPTIONS(user 'reporting', password 'secret');
FDW Restrictions
- Meta-table structure changes are not transferred to local databases.
-
The data types of the columns of the remote tables and foreign tables should have a similar structure.
Types of FDW
RDBMS FDW
-
jdbc_fdw
-
firebird_fdw
-
informix_fdw
-
..
-
postgres_fdw IN-CORE
-
oracle_fdw
-
tds_fdw
-
mysql_fdw
-
odbc_fdw
-
sqlite_fdw
Types of FDW
RDBMS FDW
-
jdbc_fdw
-
firebird_fdw
-
informix_fdw
-
..
-
postgres_fdw IN-CORE
-
oracle_fdw
-
tds_fdw
-
mysql_fdw
-
odbc_fdw
-
sqlite_fdw
Types of FDW
RDBMS FDW
-
jdbc_fdw
-
firebird_fdw
-
informix_fdw
-
..
-
postgres_fdw IN-CORE
-
oracle_fdw
-
tds_fdw
-
mysql_fdw
-
odbc_fdw
-
sqlite_fdw
Types of FDW
RDBMS FDW
-
jdbc_fdw
-
firebird_fdw
-
informix_fdw
-
..
-
postgres_fdw IN-CORE
-
oracle_fdw
-
tds_fdw
-
mysql_fdw
-
odbc_fdw
-
sqlite_fdw
Types of FDW
RDBMS FDW
-
jdbc_fdw
-
firebird_fdw
-
informix_fdw
-
..
-
postgres_fdw IN-CORE
-
oracle_fdw
-
tds_fdw
-
mysql_fdw
-
odbc_fdw
-
sqlite_fdw
Types of FDW
- mongo_fdw
- hdfs_fdw
- redis_fdw
- wdb_fdw
- quasar_fdw
- ..
NoSQL FDW
Types of FDW
- file_fdw IN-CORE
- json_fdw
- file_text_array_fdw
- multicdr_fdw
- dump_fdw
- ..
File Wrappers
Types of FDW
-
ldap_fdw
-
blackhole_fdw
-
twitter_fdw
-
..
Others..
Generate Your FDW with Multicorn
Multicorn
-
Generating FDW is easy with Multicorn.
-
Any source can be your dataset in Postgres.
-
Multicorn makes it possible to use your source as Foreign Table to query in Postgres.
Multicorn
-
FileSystem Foreign Data Wrapper (
fsfdw )-
RST files can be a source.
-
File pattern is important.
-
-
Imap Foreign Data Wrapper (
imapfdw ) -
LDAP Foreign Data Wrapper (
ldapfdw ) -
CSV Foreign Data Wrapper (
csvfdw )-
Use CSV files as a source for Foreign Tables.
-
-
RSS foreign data wrapper (
rssfdw )-
items: title, link, description, author, pubDate, source
-
Multicorn
-
Each FDW has its own class.
-
Use class names in CREATE SERVER.
-
For fsfdw:
-
Class: multicorn.fsfdw.FilesystemFdw
Source code: multicorn/fsfdw/__init__.py
-
-
main_directory/
2018/
01/
report1 - 2018-01-01.csv
report2 - 2018-02-01.csv
report3 - 2018-03-01.csv
report4 - 2018-03-05.csv
02/
report1 - 2018-01-01.csv
report2 - 2018-02-01.csv
2017/
01/
report1 - 2017-01-01.csv
report2 - 2017-02-01.csv
File Structure
An Example for fsfdw
An Example for fsfdw
main_directory/
2018/
01/
report1 - 2018-01-01.csv
report2 - 2018-02-01.csv
report3 - 2018-03-01.csv
report4 - 2018-03-05.csv
02/
report1 - 2018-01-01.csv
report2 - 2018-02-01.csv
2017/
01/
report1 - 2017-01-01.csv
report2 - 2017-02-01.csv
File Structure
Year | Month | Date | Report_name | file_name |
2018 | 01 | 2018-02-01 | report | report1 - 2018-01-01.csv |
Foreign Table
-- First step: Create Server!
artemis=# CREATE SERVER filesystem_srv foreign data wrapper multicorn options (
wrapper 'multicorn.fsfdw.FilesystemFdw');
CREATE SERVER
An Example for fsfdw
-- Second step: Create Foreign Table!
artemis=# CREATE FOREIGN TABLE fs_reports (
f_year INTEGER,
f_month INTEGER,
f_date TEXT,
f_report_name TEXT,
f_file_name TEXT,
f_content TEXT)
SERVER filesystem_srv OPTIONS(
root_dir '/var/lib/pgsql/fsfdw_files/',
pattern '{f_year}/{f_month}/{f_report_name} - {f_date}.csv',
filename_column 'f_file_name',
content_column 'f_content');
Options
- root_dir
- pattern
- filename_column
- content_column
An Example for fsfdw
/var/lib/pgsql/fsfdw_files/2018/01/report1 - 2018-01-01.csv
root_dir
f_year
f_date
f_report_name
f_month
f_file_name
The Result:
artemis=# select * from fs_reports limit 1;
f_year | f_month | f_date | f_report_name | f_file_name | f_content
--------+---------+------------+---------------+----------------------------------+--------------------------------
2018 | 1 | 2018-02-01 | report1 | 2018/01/report1 - 2018-02-01.csv | This file doesn't have content!+
| | | | |
(1 row)
An Example for fsfdw
Items: title, link, description, author, pubDate, source
Requirements
- yum install python-lxml
- An RSS URL
- Decide RSS items
artemis=# CREATE SERVER rss_srv
foreign data wrapper multicorn
options (
wrapper 'multicorn.rssfdw.RssFdw');
ERROR: Error in python: ImportError
DETAIL: No module named lxml
An Example for rssfdw
artemis=# CREATE FOREIGN TABLE cnnrss (
title character varying,
image text,
link text,
guid text,
pubDate text,
"pubDate" text
) server rss_srv options (
url 'https://www.cnnturk.com/feed/rss/all/news');
CREATE FOREIGN TABLE
artemis=# CREATE SERVER rss_srv foreign data wrapper multicorn options (
wrapper 'multicorn.rssfdw.RssFdw');
CREATE SERVER
An Example for rssfdw
artemis=# select * from cnnrss offset 12 limit 1;
-[ RECORD 1 ]-----------------------------------------------------------------
title | Banvit - MHP Riesen: 89 -76
image | https://i.cnnturk.com/ps/cnnturk/75/720x490/5bbe45b3ae784917346b6c8d
link | https://www.cnnturk.com/spor/basketbol/banvit-mhp-riesen-89-76
guid | 5bbc4eaf61361f271c1ae628
pubdate |
pubDate | Wed, 10 Oct 2018 18:57:00 GMT
An Example for rssfdw
- https://www.percona.com/blog/2018/08/21/foreign-data-wrappers-postgresql-postgres_fdw/
- http://www.vertabelo.com/blog/technical-articles/foreign-data-wrappers-for-postgresql
- http://www.pgcon.org/2009/schedule/attachments/133_pgcon2009-sqlmed.pdf
- http://www.pgcon.org/2009/schedule/events/142.en.html
- https://wiki.postgresql.org/wiki/SqlMedConnectionManager
- https://docs.oracle.com/en/database/oracle/oracle-database/12.2/admin/sharding-overview.html#GUID-D90D0A33-1BCB-48D8-8193-8FD1C2959371
- https://www.wikizero.pro/index.php?q=aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvU2hhcmRfKGRhdGFiYXNlX2FyY2hpdGVjdHVyZSk
- http://rhaas.blogspot.com/2018/05/built-in-sharding-for-postgresql.html
- https://www.cybertec-postgresql.com/en/postgresql-distributed-aggregates-aggregate-pushdown/
- https://postgresrocks.enterprisedb.com/t5/Postgres-Gems/Planning-queries-involving-foreign-PostgreSQL-tables/ba-p/1512
- https://multicorn.readthedocs.io/en/latest/foreign-data-wrappers.html
References
Thanks!
@gguncesi
guncekaya.blogspot.com
PGConf Europe Lisbon 2018 - FDW Magic
By Günce Kaya
PGConf Europe Lisbon 2018 - FDW Magic
PGConf Europe Lisbon 2018 - FDW Magic
- 2,186