Scaling Astroinformatics:
Python + Automatic Parallelization
Goal
a tool that allows scientists to perform their work in parallel with minimal changes to their code.
# variables
a = 5
# swapping requires no extra variable, yay!
b, c = 5, 6
b, c = c, b
# for loops (note similarity to MATLAB)
for i in range(0, 10):
    print(i ** 2)
# defines a function
def foo():
    return "bar"
an interpreted, dynamically typed language
commonly used to glue C libraries together
Problem
why isn't code parallel by default?
implicit state
Problem
race conditions
two actors modify a resource without coordination
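A minimal Python sketch of the race described above. The names `balance`, `SafeCounter` and `count_in_parallel` are illustrative assumptions, not part of the talk. The first part replays one unlucky interleaving by hand so the outcome is deterministic; the second part shows one common remedy, a lock around the read-modify-write step.

```python
import threading

# Part 1: replay a bad interleaving by hand (deterministic).
balance = 100
a_read = balance          # actor A reads 100
b_read = balance          # actor B reads 100 before A writes back
balance = a_read + 10     # A writes 110
balance = b_read + 20     # B writes 120, silently discarding A's update
lost_update_result = balance  # 120, although 130 was intended


# Part 2: coordinate access with a lock.
class SafeCounter:
    """Hypothetical counter whose increments are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # only one thread at a time may read-modify-write
            self.value += 1


def count_in_parallel(n_threads=4, n_increments=10_000):
    """Run n_threads workers that each increment a shared counter."""
    counter = SafeCounter()

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place the total is always `n_threads * n_increments`; without it, updates may be lost exactly as in the hand-replayed case.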
Solution I
ownership
let v = vec![1, 2, 3];
let v2 = v;
println!("v[0] is: {}", v[0]);
error: use of moved value: `v`
println!("v[0] is: {}", v[0]);
                        ^
a name owns its value. another name can borrow the value (get a reference to it) without owning it, and a shared borrow cannot modify it.
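For contrast, a small Python sketch of what happens without ownership: assignment never moves a value, it only adds another name for the same object. The variable names are illustrative assumptions. This is the kind of shared, mutable aliasing that the Rust compiler rejects in the example above.

```python
v = [1, 2, 3]
v2 = v            # no move: v and v2 are two names for the same list
v2.append(4)      # a change made through v2 ...
alias_sees_change = (v == [1, 2, 3, 4])  # ... is visible through v

# An explicit copy gives each name its own value.
w = [1, 2, 3]
w2 = list(w)      # shallow copy of the list
w2.append(4)
copy_is_independent = (w == [1, 2, 3])
```

Both flags end up `True`: the alias observes the mutation, the copy does not.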
Solution II
purity
// C
// read from stdin
getchar();
// print to stdout
printf(...);
# Python
# get a random value
random.randint(1, 6)
# get the current time
datetime.now()
impure
pure
// C
sqrt(25.0f);
# Python
json.dumps({"key" : "value"})
a pure function depends only on its inputs to produce its output, and has no side effects
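A short sketch of the distinction in Python. The function names `hypotenuse` and `next_label` and the module-level `_counter` are hypothetical, chosen only for illustration.

```python
import math


def hypotenuse(a, b):
    """Pure: the result depends only on a and b, and nothing else changes."""
    return math.sqrt(a * a + b * b)


_counter = 0  # hidden module-level state


def next_label(prefix):
    """Impure: reads and modifies _counter, so equal inputs give different outputs."""
    global _counter
    _counter += 1
    return f"{prefix}-{_counter}"
```

Calls to `hypotenuse` can be reordered, repeated, or run concurrently without changing any result. Calls to `next_label` cannot, because each call depends on how many came before it.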
Pure IO
-- constant space, linear time fibonacci
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
-- main is pure: a value of type IO ()
main :: IO ()
main = do
  print $ fibs !! 1000
monads! keep an eye out for these
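For readers more at home in Python, a rough analogue of the lazy `fibs` list is a generator. This is a sketch, not a translation: the names `fibs` and `nth_fib` are assumptions, and unlike the Haskell list the generator does not memoize earlier elements.

```python
from itertools import islice


def fibs():
    """Yield the Fibonacci numbers 0, 1, 1, 2, ... lazily, one at a time."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def nth_fib(n):
    """Return the element at zero-based index n, like `fibs !! n` in Haskell."""
    return next(islice(fibs(), n, None))
```

The generator itself keeps only two numbers alive at any time, and `nth_fib(n)` takes `n` steps to reach its answer.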
Purity and parallelism
# magic inbound
def analyze(exposures):
    x = []  # x is an empty list
    for e in exposures:
        x.extend(measure(e))  # extend x with all elements of measure
    plot(x)
# more magic
def measure(exposure):
    ...  # the ellipsis is valid syntax
    return list(...)
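If `measure` is pure, the iterations of the loop in `analyze` are independent, so they can run concurrently and be combined in order afterwards. The sketch below parallelizes the loop by hand with the standard library. It makes several assumptions: `measure` is replaced by a trivial stand-in, `plot` is dropped and the list is returned instead, and threads are used only to keep the example self-contained (CPU-bound work in CPython would more typically use `ProcessPoolExecutor`).

```python
from concurrent.futures import ThreadPoolExecutor


def measure(exposure):
    """Stand-in for the talk's measure(): pure, returns a list."""
    return [exposure, exposure ** 2]


def analyze_serial(exposures):
    """The original loop, returning x instead of plotting it."""
    x = []
    for e in exposures:
        x.extend(measure(e))
    return x


def analyze_parallel(exposures, max_workers=4):
    """Same result, but the measure() calls may run concurrently."""
    x = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so x matches the serial loop
        for result in pool.map(measure, exposures):
            x.extend(result)
    return x
```

Writing this by hand for every loop is the kind of code change the goal of the talk aims to avoid.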
Pydron: an example
@pydron.schedule
def analyze(exposures):
    x = []
    for e in exposures:
        x.extend(measure(e))
    plot(x)
@pydron.functional
def measure(exposure):
    ...  # the ellipsis is valid syntax
Parallelism gains
Conclusion
parallelism will eat the world