Scaling Astroinformatics:

Python + Automatic Parallelization

Goal

a tool that allows scientists to perform their work in parallel with minimal changes to their code.

# variables
a = 5

# swapping requires no extra variable, yay!
b, c = 5, 6
b, c = c, b

# for loops (note similarity to MATLAB)
for i in range(0, 10): 
    print(i ** 2) 

# defines a function
def foo():
    return "bar"

an interpreted, dynamically typed language 

commonly used to glue C libraries together

Problem

why isn't code parallel by default?

implicit state

Problem

race conditions

two actors modify a resource without coordination
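A minimal Python sketch of the problem, not taken from the original slides. The first half replays a lost update by hand so the outcome is deterministic; the second half shows one common form of coordination, a lock. The names `account`, `shared`, and `add_many` are made up for illustration.

```python
import threading

# Deterministic replay of a lost update:
# both actors read the value before either one writes it back.
account = {"value": 0}
read_a = account["value"]        # actor A reads 0
read_b = account["value"]        # actor B reads 0
account["value"] = read_a + 1    # actor A writes 1
account["value"] = read_b + 1    # actor B writes 1, overwriting A's update
lost_update_result = account["value"]

# Coordinated version: the lock makes each read-modify-write step atomic.
shared = {"total": 0}
lock = threading.Lock()

def add_many(n):
    for _ in range(n):
        with lock:
            shared["total"] += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(lost_update_result)  # 1, although two increments were issued
print(shared["total"])     # 20000
```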

Solution I

ownership

let v = vec![1, 2, 3];

let v2 = v;

println!("v[0] is: {}", v[0]);
error: use of moved value: `v`
println!("v[0] is: {}", v[0]);
                        ^

a name owns its value. another name can borrow the value (get a reference to it) but never owns it. a shared borrow cannot modify the value, and a mutable borrow excludes all other access while it lasts.
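For contrast, a small Python sketch (an illustration added here, not part of the original slides): assignment in Python neither moves nor copies, so two names can silently share and modify one value. This is the kind of implicit sharing that ownership rules out.

```python
import copy

v = [1, 2, 3]
v2 = v            # no move, no copy: both names refer to the same list
v2.append(4)      # a change made through v2 ...
print(v)          # [1, 2, 3, 4]  ... is visible through v

v3 = copy.copy(v) # an explicit copy is independent of the original
v3.append(5)
print(v)          # [1, 2, 3, 4]
print(v3)         # [1, 2, 3, 4, 5]
```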

Solution II

purity

// C
// read from stdin
int c = getchar();

// print to stdout
printf(...);                 

# Python
# get a random value
random.randint(1, 6)

# get the current time
datetime.now()

impure

pure

// C

sqrt(25.0f);              

# Python

json.dumps({"key" : "value"})

a pure function depends only on its inputs to produce its output, and has no side effects
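A minimal sketch of the distinction in Python. The function names `impure_add` and `pure_add` are made up for illustration. The impure version reads and mutates state outside its arguments, so the same call gives different answers; the pure version always maps the same inputs to the same output.

```python
running = {"total": 0}

def impure_add(x):
    # Impure: reads and mutates state that is not an argument.
    running["total"] += x
    return running["total"]

def pure_add(total, x):
    # Pure: the result depends only on the arguments.
    return total + x

impure_first = impure_add(5)    # 5
impure_second = impure_add(5)   # 10: same call, different answer

pure_first = pure_add(0, 5)     # 5
pure_second = pure_add(0, 5)    # 5: same call, same answer

print(impure_first, impure_second)
print(pure_first, pure_second)
```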

Pure IO

-- lazy, linear time fibonacci
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- main is a pure function
main :: IO () 
main = do 
    print $ fibs !! 1000

monads! keep an eye out for these

Purity and parallelism

  • time is irrelevant
  • location is irrelevant
  • system is irrelevant
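The "time is irrelevant" point can be sketched in a few lines of Python: because a pure function ignores everything except its argument, the calls can run in any order and still produce the same results. `flux_to_magnitude` is a hypothetical helper chosen for illustration, not something from the talk.

```python
import math
import random

def flux_to_magnitude(flux):
    # Pure: the result depends only on the argument (hypothetical helper).
    return -2.5 * math.log10(flux)

fluxes = [1.0, 10.0, 100.0, 1000.0]

# Reference: evaluate in the original order.
in_order = [flux_to_magnitude(f) for f in fluxes]

# Evaluate in a scrambled order, then put each result back in its slot.
indices = list(range(len(fluxes)))
random.Random(0).shuffle(indices)
scrambled = [None] * len(fluxes)
for i in indices:
    scrambled[i] = flux_to_magnitude(fluxes[i])

print(in_order == scrambled)  # True
```

The same reasoning is what lets a scheduler move each call to another core or another machine without changing the answer.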

# magic inbound 
def analyze(exposures):
    x = [] # x is an empty list 
    for e in exposures:
        x.extend(measure(e)) # extend x with all elements of measure(e)

    plot(x)

# more magic
def measure(exposure):
    ... # the ellipsis is valid syntax
    return list(...)

Pydron: an example


import pydron

@pydron.schedule
def analyze(exposures):
    x = []
    for e in exposures:
        x.extend(measure(e))

    plot(x)

@pydron.functional
def measure(exposure):
    ... # the ellipsis is valid syntax
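To see what the decorators spare the scientist from writing, here is a hand-parallelized sketch of the same loop using only the standard library. It is not how Pydron works internally, and it makes several assumptions: `measure` is a made-up stand-in, exposures are plain lists of numbers, and `analyze` returns `x` instead of plotting it. Threads are used only to keep the example self-contained; CPU-bound work in CPython would need processes or a cluster to see a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def measure(exposure):
    # Hypothetical stand-in for the real measurement; pure, so calls are independent.
    return [pixel * 2 for pixel in exposure]

def analyze_sequential(exposures):
    x = []
    for e in exposures:
        x.extend(measure(e))
    return x

def analyze_parallel(exposures):
    # Each measure(e) call is independent, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map yields results in input order, whatever order they finish in.
        per_exposure = list(pool.map(measure, exposures))
    x = []
    for part in per_exposure:
        x.extend(part)
    return x

exposures = [[1, 2], [3], [4, 5, 6]]
sequential_result = analyze_sequential(exposures)
parallel_result = analyze_parallel(exposures)
print(sequential_result == parallel_result)  # True
```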

Parallelism gains

Conclusion

parallelism will eat the world