PostgreSQL Foreign Data Wrappers
9 May 2019
Günce Kaya
guncekaya.blogspot.com
@gguncesi
Agenda
- What is SQL/MED?
- What is PostgreSQL Foreign Data Wrapper?
- History of FDW
- Why FDW?
- How to Use FDW?
- FDW Restrictions
- Importance of Push Down
- Types of FDWs
- Generate Your FDW with Multicorn
What is SQL/MED?
- SQL/MED was defined in 2003
- MED (Management of External Data)
- SQL/MED is SO/IEC 9075-9
- Access data which is an external source(SQL or non-SQL data) with standard SQL.
What is SQL/MED?
- Possible to connect to other RDBMS or Non-RDBMS
- Read and manage data in the file system
- Read non-SQL data (HTML, CSV, JSON etc.)
What is SQL/MED?
- Foreign Data Wrapper
- Foreign Server
- Foreign Table
- Foreign Data
What is SQL/MED?
There are two components:
- Wrapper Interface
- Datalink
- https://wiki.postgresql.org/wiki/DATALINK
- https://github.com/lacanoid/datalink
create table datalink_example (
id int,
datalink_col datalink
);
insert into datalink_example (datalink_col) select dlvalue('file:///tmp/test.txt');
What is SQL/MED?
There are two components:
- Wrapper Interface
-
Datalink
- https://wiki.postgresql.org/wiki/DATALINK
- https://github.com/lacanoid/datalink
create table datalink_example (
id int,
datalink_col datalink
);
insert into datalink_example (datalink_col) select dlvalue('file:///tmp/test.txt');
Components of the Wrapper Interface
ref: http://cs.unibo.it/~montesi/CBD/Articoli/SQL-MED.pdf
so, how SQL/MED works?
How SQL/MED works?
Local SQL-server has some few responsible;
- Decompose query into fragments.
- Connect Foreign Servers.
- Prepare the execution plan for each fragment.
- Start the execution for all plans.
- Receive result data
- Complete query execution
- Return the result to the application.
What is PostgreSQL Foreign Data Wrapper?
What is PostgreSQL Foreign Data Wrapper?
- FDW was introduced in PostgreSQL 9.1
- Access to foreign data from an external source with SQL
- Source can be;
- A relational or non-relational database
- CSV, XML, JSON etc.
- A file system's files
- An RSS's URL
- ...
- Use Foreign Server
- Retrieve external data via Foreign Table
- A foreign table is generated as a remote table's meta-table structure.
- A foreign table doesn't contain any data. Data is stored in the external table.
- Foreign Data is data stored in an external source.
What is PostgreSQL Foreign Data Wrapper?
History of FDW
8.4
9.1
9.2
- Implementation started
- file_fdw is added (in-core)
- Foreign table is introduced
- Informational commands for psql \des+ \det+ \dew+
9.3
- postgres_fdw is introduced (in-core)
- Push down for WHERE
9.4
- TRIGGER on Foreign Tables
9.5
- IMPORT SCHEMA was introduced for postgres_fdw with LIMIT clause
9.6
- Push down for JOIN, UPDATE, DELETE and ORDER BY statements
10
- push down aggregate functions
- file_fdw can read data from program output
11
- INSERT, UPDATE and COPY on foreign partitions (postgres_fdw)
- Push down aggregates to foreign tables that are partitions.
Why FDW?
- Postgres can be a bridge between different sources.
- Instant migration from other external sources to Postgres.
-
Access external source without duplicated or replaced data
-
FDW makes Postgres more flexible.
- It’s possible to use JOIN statement to a remote table with a table in internal server.
- Financial or DWH (Data Warehouse) queries require to join tables which are in different databases/clusters/servers.
Why FDW?
-
Clearer syntax for accessing remote tables
-
INSERT, UPDATE, DELETE statements on foreign tables (not all of fdw!)
- Privileges can be set on foreign tables as INSERT, UPDATE, DELETE.
Why FDW?
-
A foreign table has meta-table definition.
-
It helps to avoid running "SELECT * FROM table;"
-
-
A connection string only needs to be defined one time. FDW uses SERVER to do it.
-
Some FDWs support connection pooling.
Why FDW?
- It's possible to use with EXPLAIN command like: “Remote SQL” (not all of fdw provides)
artemis=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM students LIMIT 1;
QUERY PLAN
--------------------------------------------------------------------------------
---------------------------------------------
Limit (cost=100.00..100.04 rows=1 width=36) (actual time=19.554..19.567 rows=1
loops=1)
Output: id, name
-> Foreign Scan on public.students (cost=100.00..150.95 rows=1365 width=36)
(actual time=19.544..19.549 rows=1 loops=1)
Output: id, name
Remote SQL: SELECT id, name FROM public.students
Planning time: 4.123 ms
Execution time: 275.614 ms
(7 rows)
Why FDW?
- Foreign tables can be a partition table of a partitioned table which is in the local server database. It's possible to use different foreign tables that are in different clusters.
-- Create partition table
artemis=# CREATE TABLE all_class (i INTEGER, random DOUBLE PRECISION)
PARTITION BY RANGE (i);
CREATE TABLE
-- Attach foreign tables to the partition table
artemis=# ALTER TABLE all_class ATTACH PARTITION pg_class
FOR VALUES from (1) TO (200);
ALTER TABLE
artemis=# ALTER TABLE all_class ATTACH PARTITION pg_class2
FOR VALUES from (200) TO (300);
ALTER TABLE
artemis=# SELECT * FROM all_class ;
i | random
-----+--------------------
200 | 0.794649593532085
1 | 0.979228006210178
-
Sharding capabilities with FDW
- Generate a partition table which has its partition tables stored in a different server
Why FDW?
PostgreSQL
PostgreSQL
PostgreSQL
PostgreSQL
PostgreSQL
PostgreSQL
-
Sharding capabilities with FDW
- Generate a partition table which has its partition tables stored in a different server
Why FDW?
PostgreSQL (postgres_fdw)
PostgreSQL
File System (file_fdw)
MySQL (mysql_fdw)
Why FDW?
artemis=# SELECT ft.foreign_table_name,
fs.foreign_server_name,
fs.foreign_data_wrapper_name
FROM information_schema.foreign_tables ft INNER JOIN information_schema.foreign_servers fs
ON fs.foreign_server_name = ft.foreign_server_name;
foreign_table_name | foreign_server_name | foreign_data_wrapper_name
--------------------+---------------------+---------------------------
file_class | csv | file_fdw
mysql_class | mysql_server | mysql_fdw
pg_class | pg_luna_server | postgres_fdw
(3 rows)
artemis=# \d+ all_class
Table "public.all_class"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------------------+-----------+----------+---------+---------+--------------+-------------
i | integer | | | | plain | |
random | double precision | | | | plain | |
Partition key: RANGE (i)
Partitions: file_class FOR VALUES FROM (600) TO (700),
mysql_class FOR VALUES FROM (500) TO (600),
pg_class FOR VALUES FROM (1) TO (500)
artemis=# SELECT a.i, a.random,
(CASE
WHEN EXISTS (SELECT 1 FROM file_class WHERE i=a.i) THEN 'file_class'
WHEN EXISTS (SELECT 1 FROM pg_class WHERE i=a.i) THEN 'pg_class'
ELSE 'mysql_class'
end) AS foreign_table_name
FROM all_class a ;
i | random | foreign_table_name
-----+--------------------+------------------------
13 | 0.979228006210178 | pg_class
500 | 2.8 | mysql_class
1 | 4.6643 | file_class
(3 rows)
-
Push down is for some clauses but not all of FDW!
- WHERE
- JOIN
- AGGREGATE functions
- UPDATE/INSERT/DELETE
- Sorting
- Column
- ...
Importance of Push Down
artemis=# EXPLAIN (ANALYZE, VERBOSE) SELECT AVG(id) FROM students WHERE id<10;
QUERY PLAN
--------------------------------------------------------------------------------
-----------------------------------------
Aggregate (cost=168.50..168.51 rows=1 width=32) (actual time=2.635..2.650 rows
=1 loops=1)
Output: avg(id)
-> Foreign Scan on public.students (cost=100.00..166.06 rows=975 width=4) (
actual time=2.477..2.547 rows=9 loops=1)
Output: id, name
Remote SQL: SELECT id FROM public.students WHERE ((id < 10))
Planning time: 0.240 ms
Execution time: 5.149 ms
(7 rows)
Aggregate Functions (PG 9.6)
Importance of Push Down
Aggregate Functions (PG 11)
artemis=# EXPLAIN (ANALYZE, VERBOSE) SELECT AVG(id) FROM students WHERE id<10;
QUERY PLAN
--------------------------------------------------------------------------------
---------------
Foreign Scan (cost=102.44..149.03 rows=1 width=32) (actual time=3.001..3.003 r
ows=1 loops=1)
Output: (avg(id))
Relations: Aggregate on (public.students)
Remote SQL: SELECT avg(id) FROM public.students WHERE ((id < 10))
Planning time: 29.511 ms
Execution time: 18.558 ms
(6 rows)
Importance of Push Down
How to FDW?
- CREATE EXTENSION
- CREATE SERVER
- CREATE USER MAPPING
- CREATE FOREIGN TABLE
CREATE FOREIGN TABLE <table name>
(col1 datatype, col2 datatype ...)
SERVER <server name>;
IMPORT FOREIGN SCHEMA <target schema>
from server <server name>
into <name of postgresql schema>;
CREATE EXTENSION postgres_fdw;
CREATE SERVER <server name>
FOREIGN DATA WRAPPER <fdw name>
OPTIONS(....);
CREATE SERVER test_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS(host '192.168.111.1', port '5432');
CREATE USER MAPPING FOR <user>
server <server name>
options(....);
-- An example;
CREATE USER MAPPING FOR postgres
SERVER test_server
OPTIONS(user 'reporting', password 'secret');
1
2
4
3
FDW restrictions
- Meta-table structure changes are not transferred to local databases.
-
The data types of the columns of the remote tables and foreign tables should have a similar structure.
Types of FDW
RDBMS FDW
-
jdbc_fdw
-
firebird_fdw
-
informix_fdw
-
..
-
postgres_fdw IN-CORE
-
oracle_fdw
-
tds_fdw
-
mysql_fdw
-
odbc_fdw
-
sqlite_fdw
Types of FDW
RDBMS FDW
-
jdbc_fdw
-
firebird_fdw
-
informix_fdw
-
..
-
postgres_fdw IN-CORE
-
oracle_fdw
-
tds_fdw
-
mysql_fdw
-
odbc_fdw
-
sqlite_fdw
Types of FDW
RDBMS FDW
-
jdbc_fdw
-
firebird_fdw
-
informix_fdw
-
..
-
postgres_fdw IN-CORE
-
oracle_fdw
-
tds_fdw
-
mysql_fdw
-
odbc_fdw
-
sqlite_fdw
Types of FDW
RDBMS FDW
-
jdbc_fdw
-
firebird_fdw
-
informix_fdw
-
..
-
postgres_fdw IN-CORE
-
oracle_fdw
-
tds_fdw
-
mysql_fdw
-
odbc_fdw
-
sqlite_fdw
Types of FDW
RDBMS FDW
-
jdbc_fdw
-
firebird_fdw
-
informix_fdw
-
..
-
postgres_fdw IN-CORE
-
oracle_fdw
-
tds_fdw
-
mysql_fdw
-
odbc_fdw
-
sqlite_fdw
Types of FDW
- mongo_fdw
- hdfs_fdw
- redis_fdw
- wdb_fdw
- quasar_fdw
- ..
NoSQL FDW
Types of FDW
- file_fdw IN-CORE
- json_fdw
- multicdr_fdw
- dump_fdw
- ..
File Wrappers
Types of FDW
-
ldap_fdw
-
blackhole_fdw
-
twitter_fdw
-
..
Others..
Generate Your FDW with Multicorn
Multicorn
-
Generating FDW is easy with Multicorn.
-
Any source can be your dataset in Postgres.
-
Multicorn makes it possible to use your source as Foreign Table to query in Postgres.
Multicorn
-
FileSystem Foreign Data Wrapper (
fsfdw )-
RST files can be a source.
-
File pattern is important.
-
-
Imap Foreign Data Wrapper (
imapfdw ) -
LDAP Foreign Data Wrapper (
ldapfdw ) -
CSV Foreign Data Wrapper (
csvfdw )-
Use CSV files as a source for Foreign Tables.
-
-
RSS foreign data wrapper (
rssfdw )-
items: title, link, description, author, pubDate, source
-
Multicorn
-
Each FDW has its own class.
-
Use class names in CREATE SERVER.
-
For fsfdw:
-
Class: multicorn.fsfdw.FilesystemFdw
Source code: multicorn/fsfdw/__init__.py
-
-
main_directory/
2018/
01/
report1 - 2018-01-01.csv
report2 - 2018-02-01.csv
report3 - 2018-03-01.csv
report4 - 2018-03-05.csv
02/
report1 - 2018-01-01.csv
report2 - 2018-02-01.csv
2017/
01/
report1 - 2017-01-01.csv
report2 - 2017-02-01.csv
File Structure
An example for fsfdw
An example for fsfdw
main_directory/
2018/
01/
report1 - 2018-01-01.csv
report2 - 2018-02-01.csv
report3 - 2018-03-01.csv
report4 - 2018-03-05.csv
02/
report1 - 2018-01-01.csv
report2 - 2018-02-01.csv
2017/
01/
report1 - 2017-01-01.csv
report2 - 2017-02-01.csv
File Structure
Year | Month | Date | Report_name | file_name |
2018 | 01 | 2018-02-01 | report | report1 - 2018-01-01.csv |
Foreign Table
-- First step: Create Server!
artemis=# CREATE SERVER filesystem_srv foreign data wrapper multicorn options (
wrapper 'multicorn.fsfdw.FilesystemFdw');
CREATE SERVER
An example for fsfdw
-- Second step: Create Foreign Table!
artemis=# CREATE FOREIGN TABLE fs_reports (
f_year INTEGER,
f_month INTEGER,
f_date TEXT,
f_report_name TEXT,
f_file_name TEXT,
f_content TEXT)
SERVER filesystem_srv OPTIONS(
root_dir '/var/lib/pgsql/fsfdw_files/',
pattern '{f_year}/{f_month}/{f_report_name} - {f_date}.csv',
filename_column 'f_file_name',
content_column 'f_content');
Options
- root_dir
- pattern
- filename_column
- content_column
An example for fsfdw
/var/lib/pgsql/fsfdw_files/2018/01/report1 - 2018-01-01.csv
root_dir
f_year
f_date
f_report_name
f_month
f_file_name
The Result:
artemis=# select * from fs_reports limit 1;
f_year | f_month | f_date | f_report_name | f_file_name | f_content
--------+---------+------------+---------------+----------------------------------+--------------------------------
2018 | 1 | 2018-02-01 | report1 | 2018/01/report1 - 2018-02-01.csv | This file doesn't have content!+
| | | | |
(1 row)
An example for fsfdw
Items: title, link, description, author, pubDate, source
Requirements
- Use proper class
- Find an RSS URL
- Decide RSS items
artemis=# CREATE SERVER rss_srv
foreign data wrapper multicorn
options (
wrapper 'multicorn.rssfdw.RssFdw');
An example for rssfdw
artemis=# CREATE FOREIGN TABLE cnnrss (
title character varying,
image text,
link text,
guid text,
pubDate text,
"pubDate" text
) server rss_srv options (
url 'https://www.cnnturk.com/feed/rss/all/news');
CREATE FOREIGN TABLE
artemis=# CREATE SERVER rss_srv foreign data wrapper multicorn options (
wrapper 'multicorn.rssfdw.RssFdw');
CREATE SERVER
An example for rssfdw
artemis=# select * from cnnrss offset 12 limit 1;
-[ RECORD 1 ]-----------------------------------------------------------------
title | Banvit - MHP Riesen: 89 -76
image | https://i.cnnturk.com/ps/cnnturk/75/720x490/5bbe45b3ae784917346b6c8d
link | https://www.cnnturk.com/spor/basketbol/banvit-mhp-riesen-89-76
guid | 5bbc4eaf61361f271c1ae628
pubdate |
pubDate | Wed, 10 Oct 2018 18:57:00 GMT
An example for rssfdw
- https://www.percona.com/blog/2018/08/21/foreign-data-wrappers-postgresql-postgres_fdw/
- http://www.vertabelo.com/blog/technical-articles/foreign-data-wrappers-for-postgresql
- http://www.pgcon.org/2009/schedule/attachments/133_pgcon2009-sqlmed.pdf
- http://www.pgcon.org/2009/schedule/events/142.en.html
- https://wiki.postgresql.org/wiki/SqlMedConnectionManager
- https://docs.oracle.com/en/database/oracle/oracle-database/12.2/admin/sharding-overview.html#GUID-D90D0A33-1BCB-48D8-8193-8FD1C2959371
- https://www.wikizero.pro/index.php?q=aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvU2hhcmRfKGRhdGFiYXNlX2FyY2hpdGVjdHVyZSk
- http://rhaas.blogspot.com/2018/05/built-in-sharding-for-postgresql.html
- https://www.cybertec-postgresql.com/en/postgresql-distributed-aggregates-aggregate-pushdown/
- https://postgresrocks.enterprisedb.com/t5/Postgres-Gems/Planning-queries-involving-foreign-PostgreSQL-tables/ba-p/1512
- https://multicorn.readthedocs.io/en/latest/foreign-data-wrappers.html
References
Thanks!
@gguncesi
guncekaya.blogspot.com
Postgres - Foreign Data Wrappers
By Günce Kaya
Postgres - Foreign Data Wrappers
Postgres - Foreign Data Wrappers
- 1,633