Custom Pentaho Plugins

You Are Here

Why Build Plugins?

  • Our default position is not to!

  • We needed functionality not offered by Pentaho

    • or... too complex to implement as N steps

  • Why not use a User Defined Java/JavaScript step?

    • Source Control

    • Reuse

      • Duplicate code -> More maintenance!

      • Can't publish as Open Source (for greater RoI)

    • Testing

      • Unit and Integration Tests... CI?

    • We do use a few very small User Defined JavaScript steps!

Starting a new Step Plugin

Anatomy of a Step Plugin

processRow is where your business logic happens! It is equivalent to a User Defined Java/JavaScript step

glues everything

StepData holds state from row to row during the full transformation
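
The split between per-row logic and row-to-row state can be sketched in plain Java. This is a self-contained stand-in and does not use the Kettle API: `SketchStep`, `SketchStepData`, the queue-based input, and the running-total logic are all hypothetical names chosen for illustration. In a real plugin the step class would extend Kettle's `BaseStep`, reading rows with `getRow()` and emitting them with `putRow(...)`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for a StepData class: state that survives from row to row. */
final class SketchStepData {
    long rowsSeen = 0;
    long runningTotal = 0;
}

/** Hypothetical stand-in for a step class; a real one would extend Kettle's BaseStep. */
final class SketchStep {
    private final Queue<Object[]> input;                      // stands in for the previous step
    private final List<Object[]> output = new ArrayList<>();  // stands in for the next step

    SketchStep(List<Object[]> rows) {
        this.input = new ArrayDeque<>(rows);
    }

    /** Called once per row; returns false when there is no more input. */
    boolean processRow(SketchStepData data) {
        Object[] row = input.poll();          // a real step would call getRow()
        if (row == null) {
            return false;                     // a real step would also call setOutputDone()
        }
        data.rowsSeen++;
        data.runningTotal += (Long) row[0];
        // Emit the original field plus a new field holding the running total so far.
        output.add(new Object[] { row[0], data.runningTotal });  // a real step would call putRow(...)
        return true;
    }

    List<Object[]> output() {
        return output;
    }

    /** Drives the step the way the engine would: call processRow until it returns false. */
    static List<Object[]> run(List<Object[]> rows) {
        SketchStep step = new SketchStep(rows);
        SketchStepData data = new SketchStepData();
        while (step.processRow(data)) {
            // keep going until the input is exhausted
        }
        return step.output();
    }

    public static void main(String[] args) {
        List<Object[]> out = run(List.of(new Object[] {1L}, new Object[] {2L}, new Object[] {3L}));
        for (Object[] row : out) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

Keeping the counters in a separate data object, rather than in local variables, is what lets the value computed for one row depend on the rows that came before it.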

Apache Jena Step Plugins

  1. Create a Jena Model (RDF Graph) per-row

    • Maps fields in a row into RDF Properties

  2. Combine Jena Models per-row

    • Merges one-or-more Jena Models within the same row

  3. Group and Merge Jena Models per-column

    • Merges Models from consecutive rows within the same column

  4. Serialize Jena Model per-column per-transformation

    • Merges all Jena Models (from a column), and writes a file

  5. SHACL Validation
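
Conceptually, an RDF graph is a set of triples, so merging models is a set union. The sketch below illustrates steps 1 and 3 in plain Java without Jena, representing each triple as an N-Triples-like string. The `RdfRowSketch` class, the `rowToModel` and `groupAndMerge` helpers, the `http://example.org/` namespace, and the choice to group consecutive rows by a key field are all assumptions made for illustration; the plugins themselves operate on Jena Model objects, and literal escaping is ignored here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class RdfRowSketch {
    static final String NS = "http://example.org/";  // hypothetical namespace

    /** Step 1 (sketch): map each field of a row to one triple about the row's subject. */
    static Set<String> rowToModel(String subjectField, Map<String, String> row) {
        String subject = "<" + NS + "id/" + row.get(subjectField) + ">";
        Set<String> model = new LinkedHashSet<>();
        for (Map.Entry<String, String> field : row.entrySet()) {
            if (field.getKey().equals(subjectField)) {
                continue;  // the subject field identifies the resource, it is not a property
            }
            model.add(subject + " <" + NS + field.getKey() + "> \"" + field.getValue() + "\" .");
        }
        return model;
    }

    /** Step 3 (sketch): merge the models of consecutive rows that share the same key. */
    static List<Set<String>> groupAndMerge(String keyField, List<Map<String, String>> rows) {
        List<Set<String>> merged = new ArrayList<>();
        String previousKey = null;
        for (Map<String, String> row : rows) {
            String key = row.get(keyField);
            if (merged.isEmpty() || !Objects.equals(key, previousKey)) {
                merged.add(new LinkedHashSet<>());  // key changed: start a new group
            }
            merged.get(merged.size() - 1).addAll(rowToModel(keyField, row));  // set union
            previousKey = key;
        }
        return merged;
    }

    /** Small helper to build a two-field row for the demo. */
    static Map<String, String> row(String id, String field, String value) {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put(field, value);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
            row("1", "title", "Letter"), row("1", "creator", "Smith"), row("2", "title", "Map"));
        for (Set<String> model : groupAndMerge("id", rows)) {
            model.forEach(System.out::println);
            System.out.println("--");
        }
    }
}
```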

Transforming Relational to RDF

  • Demo...

Synchronisation Step Plugins

  1. Compare and Set Atomic per-row

    • Conditionally initialise or CaS an Atomic Value

  2. Await Atomic per-row

    • Wait for an Atomic Value and conditionally branch

  • Allows us to perform several steps as one Atomic Operation

    • Uses Java's Atomic values

    • Concurrency - Can be Tricky to get right!

    • Remember - Every Step in Pentaho is a distinct Thread!

  • Our Use Case - Get or Create (and Calculate) an Identifier
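
The get-or-create pattern can be sketched with Java's `java.util.concurrent.atomic` classes. This is a minimal illustration, not the plugins' implementation: the `GetOrCreateSketch` class, the polling wait, and the use of a random UUID as the "calculated" identifier are assumptions. One thread wins the compare-and-set and creates the identifier; every other thread waits until it is published.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

final class GetOrCreateSketch {
    private final AtomicBoolean claimed = new AtomicBoolean(false);
    private final AtomicReference<String> identifier = new AtomicReference<>();
    final AtomicInteger creations = new AtomicInteger();  // counts how often we created an id

    /** Returns the shared identifier, creating it exactly once across all threads. */
    String getOrCreate() throws InterruptedException {
        if (claimed.compareAndSet(false, true)) {
            // Compare and Set: only one thread can flip false -> true, so only one creates.
            creations.incrementAndGet();
            String id = UUID.randomUUID().toString();  // placeholder for the real calculation
            identifier.set(id);                        // publish the value to the waiters
            return id;
        }
        // Await: everyone else polls until the winner has published the identifier.
        String id;
        while ((id = identifier.get()) == null) {
            Thread.sleep(1);
        }
        return id;
    }

    /** Runs getOrCreate on several threads and returns the distinct identifiers observed. */
    static Set<String> demo(int threads, GetOrCreateSketch shared) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<String> task = shared::getOrCreate;
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(task));
            }
            Set<String> seen = new HashSet<>();
            for (Future<String> future : futures) {
                seen.add(future.get());
            }
            return seen;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        GetOrCreateSketch shared = new GetOrCreateSketch();
        Set<String> seen = demo(8, shared);
        System.out.println("distinct identifiers: " + seen.size());
        System.out.println("creations: " + shared.creations.get());
    }
}
```

Because each step runs on its own thread, a plain "check, then create" sequence spread over several steps could let two threads both see "not created yet"; the single compare-and-set is what closes that window.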

Synchronising Transformation Steps

  • Demo...

Enhancing Pentaho Itself

  • We chose Pentaho because it is Open Source

    • We have a mandate to evaluate Open Source first

  • Pentaho (like all software) has Issues!

  • We have contributed fixes for:

(not) Enhancing Pentaho Itself

  • Only two of our most minor fixes have been incorporated

  • In reality - Pentaho is only technically Open Source

    • There is no Open Source Community

    • Contributing to Pentaho is (almost) Impossible!

      • We have submitted high-quality code with tests and a 100% passing test suite

      • Developers are difficult to reach

      • Pull-Requests (or issues) can go unanswered "For Ever"

      • Pull-Requests can be closed without a working solution

      • Opening JIRA Tickets doesn't result in progress

  • Hitachi Sales / Support

    • We would consider a contract... if we get the fixes we need!

Sharing is Caring!

  • We are currently maintaining our fork of Pentaho Kettle 9.1

    • Not Practical for us

    • Updating is tricky... 9.2 is out now

    • Have to maintain skilled staff, GitHub, CI, etc.

  • Not Sustainable for the Future... What are our options?

    • ...Would we choose Pentaho again?


Adam Retter
Director of Evolved Binary

(Consultant) Technical Architect for Project Omega, The National Archives