Multi-Dimensional Climate Applications

George Kierstein

Challenges and Solutions

Common Approaches

One Dimensional Solutions

Vertical Stack

Single-Use

'Reports' not applications

Incommensurable Data

(Multi-Generational Data Sets)

Period of record longer than any one person's career

Format changes over time

Custom formats that are typically poorly documented.

Relationships between data sets opaque

Alternative Approach

Climate Applications will need to leverage:

Distributed architectures designed for end-user applications

Modern Data Visualization best practices

Multiple data sets

Modern Toolchains and languages

Multi-dimensional Climate Data visualization using Clojure/ClojureScript

What is Clojure?

(Don't worry this won't hurt a bit)

It's a functional LISP that runs on the JVM (immutable by default, though not strictly pure)

'Pure Functional Language' ?

  • Basically it's just Math

- Lambda Calculus, Alonzo Church 1936

G( F(X) )
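
The composition G( F(X) ) can be made concrete in a few lines. This is a minimal Python sketch (Clojure's built-in `comp` expresses the same idea); the helper `compose` and the functions `to_celsius` and `round_tenth` are hypothetical names chosen for illustration.

```python
from functools import reduce


def compose(*fns):
    """Compose functions right to left: compose(g, f)(x) == g(f(x))."""
    return reduce(lambda g, f: lambda x: g(f(x)), fns)


def to_celsius(fahrenheit):
    """F: convert a Fahrenheit reading to Celsius."""
    return (fahrenheit - 32.0) * 5.0 / 9.0


def round_tenth(value):
    """G: round a reading to one decimal place."""
    return round(value, 1)


# G( F(X) ): convert first, then round.
fahrenheit_to_rounded_celsius = compose(round_tenth, to_celsius)
```

Because neither function touches shared state, the composed function can be called from any thread or machine and give the same answer for the same input.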

So What?

Garbage collection was invented by John McCarthy around 1959 to abstract away manual memory management in Lisp

The REPL comes from Lisp too: it dates back to interactive Lisp systems of the 1960s, well before the Lisp machines of the 70s

It's Expressive

Code Survivability

The Imperative Model is breaking now more than ever:

  • Even an older-model laptop has multiple cores, memory caches, optimization strategies, etc.
  • Distributed computing is a hard break
  • Quantum computing is based on *qubits* (photons are one way to build them) and right around the corner
  • We can't stop ourselves

Code Survivability

LISPS

  • Foundations in Mathematics
  • Would survive even if the JVM went poof

LISPs are an excellent fit for scientific computing and, perhaps, the best fit for generational-scale code survivability

The Data

CRN (U.S. Climate Reference Network)

SWDI (Severe Weather Data Inventory) - Hail

One Library of Note:

Mathbox

Steven Wittens

http://acko.net

(Screenshot)

Mathbox Usage

(Screenshot)

(Screenshot)

Dynamic Subsetting

Finding correlated subsets can be challenging

Typically done by custom code

Correlations calculated by hand

Adding new dimensions is time-consuming and requires a domain expert

Testing and maintenance in production challenging
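
To make the pain point concrete, here is a minimal Python sketch of the kind of hand-written subsetting code described above. The record layouts, the field names (`station`, `date`, `temp_c`, `county`, `hail_size_in`), the station-to-county lookup, and all values are invented for illustration; they are not the actual CRN or SWDI formats.

```python
# Hypothetical records; real CRN and SWDI layouts differ.
crn_obs = [
    {"station": "A1", "date": "2014-05-01", "temp_c": 21.5},
    {"station": "A1", "date": "2014-05-02", "temp_c": 24.0},
    {"station": "B7", "date": "2014-05-01", "temp_c": 18.2},
]
hail_events = [
    {"county": "Buncombe", "date": "2014-05-02", "hail_size_in": 1.25},
    {"county": "Wake", "date": "2014-05-01", "hail_size_in": 0.75},
]
# The relationship between the two data sets lives in a hand-kept table.
station_county = {"A1": "Buncombe", "B7": "Wake"}


def temps_on_hail_days(observations, events, lookup):
    """Pair each observation with hail events in the same county on the same date."""
    matches = []
    for obs in observations:
        county = lookup.get(obs["station"])
        for event in events:
            if event["county"] == county and event["date"] == obs["date"]:
                matches.append(
                    (obs["station"], obs["date"], obs["temp_c"], event["hail_size_in"])
                )
    return matches


subset = temps_on_hail_days(crn_obs, hail_events, station_county)
```

Every new data set or dimension means another lookup table and another nested loop, each of which has to be tested and maintained separately.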

System: Distributed Logic Subsetting Engine

Declarative

Data set relationships described *once*

Logic-Solver finds the data you want on demand.

Batch/Stream engine fits modern end-user application pipelines
(Lambda Architectures)
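
The declarative idea can be sketched in miniature. This is not the engine presented here, only a brute-force Python illustration under stated assumptions: the data sets, field names, and values are invented, and `RELATIONS` and `solve` are hypothetical names. A real logic solver would search far more cleverly than trying every combination of rows.

```python
from itertools import product

# Hypothetical data sets, keyed by name. Fields and values are invented.
DATASETS = {
    "crn": [
        {"station": "A1", "date": "2014-05-01", "temp_c": 21.5},
        {"station": "A1", "date": "2014-05-02", "temp_c": 24.0},
        {"station": "B7", "date": "2014-05-01", "temp_c": 18.2},
    ],
    "stations": [
        {"station": "A1", "county": "Buncombe"},
        {"station": "B7", "county": "Wake"},
    ],
    "hail": [
        {"county": "Buncombe", "date": "2014-05-02", "hail_size_in": 1.25},
        {"county": "Wake", "date": "2014-05-01", "hail_size_in": 0.75},
    ],
}

# Relationships are described once, as data:
# (data set, field) must equal (data set, field).
RELATIONS = [
    (("crn", "station"), ("stations", "station")),
    (("stations", "county"), ("hail", "county")),
    (("crn", "date"), ("hail", "date")),
]


def solve(dataset_names, datasets=DATASETS, relations=RELATIONS):
    """Return every combination of rows, one per requested data set,
    that satisfies all declared relations among those data sets."""
    wanted = set(dataset_names)
    # Only relations whose two sides are both requested apply.
    active = [r for r in relations if r[0][0] in wanted and r[1][0] in wanted]
    results = []
    for rows in product(*(datasets[name] for name in dataset_names)):
        binding = dict(zip(dataset_names, rows))
        if all(binding[a][fa] == binding[b][fb] for (a, fa), (b, fb) in active):
            results.append(binding)
    return results


hail_with_temps = solve(["crn", "stations", "hail"])
```

Adding a new data set in this style means adding rows to `DATASETS` and one or two entries to `RELATIONS`; the query code itself does not change.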

(Advantages)

System: Distributed Logic Subsetting Engine

Clojure application using:

Onyx  (declarative stream/batch pipeline)

Custom logic-solver as plugin

Proprietary equivalence engine for correlations between data sets
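
Onyx describes a job as plain data rather than code. The Python sketch below mimics that idea only; it is not Onyx's API, and the task names, the `workflow` and `tasks` structures, and the `build_pipeline` helper are hypothetical. It handles a simple linear chain of tasks, and the same pipeline runs over a batch (a list) or a stream (any iterator).

```python
# The job as data: edges between named tasks.
workflow = [("read", "to-celsius"), ("to-celsius", "keep-warm")]

# Each task takes an iterable of records and lazily yields records.
tasks = {
    "read": lambda segs: segs,
    "to-celsius": lambda segs: (
        {**s, "temp_c": (s["temp_f"] - 32.0) * 5.0 / 9.0} for s in segs
    ),
    "keep-warm": lambda segs: (s for s in segs if s["temp_c"] >= 20.0),
}


def build_pipeline(workflow, tasks):
    """Follow the edges from the single source task and chain the tasks lazily."""
    successors = dict(workflow)
    targets = set(successors.values())
    sources = [name for name in successors if name not in targets]
    if len(sources) != 1:
        raise ValueError("expected exactly one source task")
    order = [sources[0]]
    while order[-1] in successors:
        order.append(successors[order[-1]])

    def run(segments):
        for name in order:
            segments = tasks[name](segments)
        return segments

    return run


run = build_pipeline(workflow, tasks)

# Batch: a finite list, fully consumed.
batch_result = list(run([{"temp_f": 68.0}, {"temp_f": 50.0}]))

# Stream: records are pulled through one at a time.
stream = run(iter([{"temp_f": 86.0}, {"temp_f": 41.0}]))
first_from_stream = next(stream)
```

Keeping the job description as data is what lets a plugin, such as a logic solver, be slotted in as just another named task.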

(Implementation)

Many Thanks!

GST Big Data Presentation

By gatewayspectacle
