In a local installation, the machine's cores serve as both master and slaves
Recall: Spatial organization
A stage ends when the RDD needs to be materialized
.cache() is the same as .persist(StorageLevel.MEMORY_ONLY)
http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
http://takwatanabe.me/pyspark/generated/generated/pyspark.RDD.checkpoint.html
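
The notes on materialization and .cache() can be illustrated with a small toy model in plain Python. This is only a sketch of the idea, not Spark's implementation: ToyRDD, load, and the reads counter are hypothetical names invented here, and the real RDD API is far richer. The point it tries to show is that transformations only record lineage, an action materializes the data, and marking the dataset as cached lets later actions reuse the materialized result instead of recomputing it.

```python
# Toy model only: ToyRDD is a hypothetical class, not part of PySpark.
class ToyRDD:
    def __init__(self, source, transforms=()):
        self._source = source          # callable that produces the raw records
        self._transforms = transforms  # recorded lineage, not yet applied
        self._use_cache = False        # set by cache(); nothing is computed there
        self._cached = None            # filled by the first action when caching is on

    def map(self, fn):
        # Transformations are lazy: they only extend the lineage.
        return ToyRDD(self._source, self._transforms + (fn,))

    def cache(self):
        # Mark the dataset for in-memory reuse and return it for chaining.
        self._use_cache = True
        return self

    def collect(self):
        # Action: this is where the data is actually materialized.
        if self._cached is not None:
            return list(self._cached)
        data = list(self._source())
        for fn in self._transforms:
            data = [fn(x) for x in data]
        if self._use_cache:
            self._cached = data
        return list(data)


# Count how often the (hypothetical) source is read.
reads = {"n": 0}

def load():
    reads["n"] += 1
    return range(5)

# Without cache(): every action recomputes the whole lineage.
plain = ToyRDD(load).map(lambda x: x * x)
plain.collect()
plain.collect()
reads_without_cache = reads["n"]

# With cache(): the first action materializes, later actions reuse the result.
reads["n"] = 0
cached = ToyRDD(load).map(lambda x: x * x).cache()
first = cached.collect()
second = cached.collect()
reads_with_cache = reads["n"]
```

In this toy, the uncached dataset reads its source once per action, while the cached one reads it only on the first action. In real Spark the cached partitions live in executor memory and may be evicted and recomputed, which the sketch does not model.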