Intro to Apache MADlib

Jowanza Joseph

@jowanza

Extensions

What can you extend?

  • Type System and Operators
  • User-defined functions and aggregates
  • Storage system and indexes
  • Write-ahead logging and replication
  • Transaction Engine
  • Background worker process
  • Query planner and the query executor
  • Configuration and database metadata

UDF (user-defined function)
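
As a rough illustration of a user-defined function in PostgreSQL, here is a minimal sketch. The function name `celsius_to_fahrenheit` and its logic are hypothetical and do not come from the talk; they only show the shape of a `CREATE FUNCTION` statement.

```sql
-- Hypothetical example: a scalar UDF written in plain SQL.
-- IMMUTABLE: the result depends only on the argument.
-- STRICT: a NULL argument yields NULL without running the body.
CREATE FUNCTION celsius_to_fahrenheit(celsius double precision)
RETURNS double precision
LANGUAGE sql IMMUTABLE STRICT
AS $$
    SELECT celsius * 9.0 / 5.0 + 32.0;
$$;

-- Call it like any built-in function.
SELECT celsius_to_fahrenheit(100);  -- 212
```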

UDAF (user-defined aggregate function)
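
A user-defined aggregate combines a state transition function with a state type. The sketch below is a hypothetical example, not taken from the talk: `product_step` and `product` are illustrative names for an aggregate that multiplies its inputs.

```sql
-- Hypothetical transition function: multiply the running state by the next value.
CREATE FUNCTION product_step(state numeric, next_value numeric)
RETURNS numeric
LANGUAGE sql IMMUTABLE STRICT
AS $$
    SELECT state * next_value;
$$;

-- The aggregate starts from 1 and applies product_step to every non-NULL row.
CREATE AGGREGATE product(numeric) (
    SFUNC = product_step,
    STYPE = numeric,
    INITCOND = '1'
);

SELECT product(x::numeric) FROM (VALUES (2), (3), (4)) AS t(x);  -- 24
```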

Example

base36.control

# base36 extension
comment = 'base36 datatype'
default_version = '0.0.1'
relocatable = true

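base36--0.0.1.sql

-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION base36" to load this file. \quit

-- Encode a non-negative integer as a lowercase base-36 string.
CREATE FUNCTION base36_encode(digits int)
RETURNS text
LANGUAGE plpgsql IMMUTABLE STRICT
  AS $$
    DECLARE
      chars char[];
      ret varchar;
      val int;
    BEGIN
      -- A negative remainder would index outside the array and yield NULL.
      IF digits < 0 THEN
        RAISE EXCEPTION 'base36_encode: negative values are not supported';
      END IF;

      chars := ARRAY[
                '0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h',
                'i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'
              ];

      val := digits;
      ret := '';

      -- Peel off the least significant base-36 digit on each pass.
      WHILE val != 0 LOOP
        ret := chars[(val % 36)+1] || ret;
        val := val / 36;
      END LOOP;

      -- The loop never runs for 0, so return '0' rather than an empty string.
      IF ret = '' THEN
        ret := '0';
      END IF;

      RETURN ret;
    END;
  $$;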


Makefile

EXTENSION = base36        # the extension's name
DATA = base36--0.0.1.sql  # script files to install

# PostgreSQL build infrastructure (PGXS)
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
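
Once `make install` has copied the control file and the SQL script into the PostgreSQL extension directory, the extension can be loaded and tried out from a database session. A minimal sketch, assuming the three files above are installed unchanged:

```sql
-- Load the extension into the current database.
CREATE EXTENSION base36;

-- Encode an integer as base 36.
SELECT base36_encode(123456789);  -- 21i3v9

-- Remove it again when no longer needed.
DROP EXTENSION base36;
```
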

-- Train a logistic regression model with MADlib
SELECT madlib.logregr_train(
    'patients',                                 -- source table
    'patients_logregr',                         -- output table
    'second_attack',                            -- dependent variable (labels)
    'ARRAY[1, treatment, trait_anxiety]',       -- independent variables (features)
    NULL,                                       -- grouping columns
    20,                                         -- maximum number of iterations
    'irls'                                      -- optimizer
    );
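
After training, the fitted model sits in the output table and can be queried like any other table. The sketch below assumes the `patients` table has an `id` column (not shown in the slides) and that the output table exposes `coef`, `log_likelihood`, and `num_iterations` columns, as described in the MADlib logistic regression documentation.

```sql
-- Inspect the fitted coefficients and convergence information.
SELECT coef, log_likelihood, num_iterations
FROM patients_logregr;

-- Compare predictions against the observed labels.
-- The feature array must match the one passed to logregr_train.
SELECT p.id,
       madlib.logregr_predict(m.coef,
                              ARRAY[1, p.treatment, p.trait_anxiety]) AS predicted,
       p.second_attack::boolean AS actual
FROM patients p, patients_logregr m
ORDER BY p.id;
```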

Demo

Resources

  • http://madlib.apache.org
  • Blog post coming soon
  • Jupyter Notebooks

Thanks
