Intro to Apache MADlib
Jowanza Joseph
@jowanza
Extensions
What can you extend?
- Type System and Operators
- User-defined functions and aggregates
- Storage system and indexes
- Write-ahead logging and replication
- Transaction Engine
- Background worker process
- Query planner and the query executor
- Configuration and database metadata
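To see which extensions a given server already offers, the system catalogs can be queried directly. A minimal sketch, using only the standard `pg_available_extensions` view and the `pg_extension` catalog (the output depends entirely on the installation):

```sql
-- Extensions whose control files are present on the server,
-- with the version that is installed in this database (NULL if none)
SELECT name, default_version, installed_version
FROM pg_available_extensions
ORDER BY name;

-- Extensions actually installed in the current database
SELECT extname, extversion
FROM pg_extension
ORDER BY extname;
```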
UDF (user-defined function)
UDAF (user-defined aggregate function)
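As a rough illustration of the difference: a UDF is called once per row, while a UDAF folds many rows into one value through a state transition function. The sketch below is not from the talk; the names `celsius_to_fahrenheit`, `product_step` and `product` are hypothetical and chosen only for illustration.

```sql
-- UDF: a scalar function, evaluated once per input row
-- ($1 is the first argument, c)
CREATE FUNCTION celsius_to_fahrenheit(c double precision)
RETURNS double precision
LANGUAGE sql IMMUTABLE STRICT
AS $$ SELECT $1 * 9.0 / 5.0 + 32.0 $$;

-- UDAF, step 1: the state transition function
-- ($1 is the running state, $2 is the next input value)
CREATE FUNCTION product_step(state numeric, next_value numeric)
RETURNS numeric
LANGUAGE sql IMMUTABLE STRICT
AS $$ SELECT $1 * $2 $$;

-- UDAF, step 2: the aggregate ties the state type, the initial
-- state and the transition function together
CREATE AGGREGATE product(numeric) (
    SFUNC = product_step,
    STYPE = numeric,
    INITCOND = '1'
);

SELECT celsius_to_fahrenheit(100);                                -- 212
SELECT product(x) FROM (VALUES (2::numeric), (3), (4)) AS t(x);  -- 24
```

Because the transition function is declared STRICT, NULL inputs are skipped rather than turning the whole product into NULL.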
Example
# base36.control: base36 extension
comment = 'base36 datatype'
default_version = '0.0.1'
relocatable = true
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION base36" to load this file. \quit
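CREATE FUNCTION base36_encode(digits int)
RETURNS text
LANGUAGE plpgsql IMMUTABLE STRICT
AS $$
DECLARE
  chars char[];
  ret varchar;
  val int;
BEGIN
  -- a negative input would index outside chars and yield NULL
  IF digits < 0 THEN
    RAISE EXCEPTION 'negative numbers are not supported';
  END IF;
  chars := ARRAY[
    '0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h',
    'i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'
  ];
  val := digits;
  ret := '';
  -- prepend one base36 digit per iteration, least significant first
  WHILE val != 0 LOOP
    ret := chars[(val % 36)+1] || ret;
    val := val / 36;
  END LOOP;
  -- the loop never runs for 0, so return '0' instead of an empty string
  IF ret = '' THEN
    ret := '0';
  END IF;
  RETURN ret;
END;
$$;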
EXTENSION = base36 # the extension's name
DATA = base36--0.0.1.sql # script files to install
# postgres build stuff
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
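Once the three files above are in place and `make install` has copied them into the server's extension directory, the extension can be loaded and tried from psql. A minimal sketch, assuming the install succeeded and the session has permission to create extensions:

```sql
-- Load the extension into the current database
CREATE EXTENSION base36;

-- Try the function that the script file defined
SELECT base36_encode(35);         -- z
SELECT base36_encode(36);         -- 10
SELECT base36_encode(123456789);  -- 21i3v9

-- Remove it again; this drops base36_encode along with the extension
DROP EXTENSION base36;
```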
SELECT madlib.logregr_train(
  'patients',                            -- source table
  'patients_logregr',                    -- output table
  'second_attack',                       -- labels
  'ARRAY[1, treatment, trait_anxiety]',  -- features
  NULL,                                  -- grouping columns
  20,                                    -- max number of iterations
  'irls'                                 -- optimizer
);
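After training, the model lives in the output table as ordinary rows, so it can be inspected and applied with plain SQL. A sketch of both steps, assuming the call above has completed; the `id` column on `patients` is an assumption for readable output and does not appear in the slides:

```sql
-- One row per feature: the fitted coefficient with its standard error and p-value
SELECT unnest(ARRAY['intercept', 'treatment', 'trait_anxiety']) AS attribute,
       unnest(coef)     AS coefficient,
       unnest(std_err)  AS standard_error,
       unnest(p_values) AS p_value
FROM patients_logregr;

-- Apply the model to each row; the feature array must match the one
-- used for training, including the leading 1 for the intercept
SELECT p.id,  -- assumed identifier column
       madlib.logregr_predict(m.coef,
                              ARRAY[1, p.treatment, p.trait_anxiety]) AS predicted,
       p.second_attack AS observed
FROM patients p, patients_logregr m
ORDER BY p.id;
```

Without grouping columns the output table holds a single row, so the cross join simply attaches the one coefficient vector to every patient.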
Demo
Resources
- http://madlib.apache.org
- Blog post coming soon
- Jupyter Notebooks
Thanks