Spark on YARN and Zeppelin on Docker

@jpizarrom

  • Docker
  • Deis
  • CoreOS
  • fleet
  • etcd

YARN

spark-defaults.conf

spark.driver.port            7001
spark.fileserver.port        7003
spark.broadcast.port         7004
spark.replClassServer.port   7005
spark.blockManager.port      7006
spark.executor.port          7007
spark.ui.port                4040
spark.broadcast.factory      org.apache.spark.broadcast.HttpBroadcastFactory
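
The settings above pin Spark's otherwise random ports to fixed values. A plausible reason, assumed here rather than stated on the slide, is that fixed ports can be published from a Docker container. The following is a minimal sketch of that idea: it parses the configuration text and derives `docker run -p` flags. The helper names (`parse_spark_defaults`, `pinned_ports`, `docker_publish_flags`) are hypothetical and are not part of Spark or Docker.

```python
# Sketch only: helper names are hypothetical, not part of Spark or Docker.
# The configuration text is copied from the slide.
SPARK_DEFAULTS = """\
spark.driver.port            7001
spark.fileserver.port        7003
spark.broadcast.port         7004
spark.replClassServer.port   7005
spark.blockManager.port      7006
spark.executor.port          7007
spark.ui.port                4040
spark.broadcast.factory      org.apache.spark.broadcast.HttpBroadcastFactory
"""


def parse_spark_defaults(text):
    """Parse whitespace-separated `key value` lines, skipping blanks and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on any run of spaces or tabs
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def pinned_ports(conf):
    """Return the sorted integer values of every setting whose key ends in `.port`."""
    return sorted(int(v) for k, v in conf.items() if k.endswith(".port"))


def docker_publish_flags(ports):
    """Map each port to a `-p host:container` flag using the same number on both sides."""
    return ["-p {0}:{0}".format(p) for p in ports]


conf = parse_spark_defaults(SPARK_DEFAULTS)
flags = docker_publish_flags(pinned_ports(conf))
print(" ".join(flags))
```

The printed flags could be appended to a `docker run` command for the container that hosts the Spark driver. Whether every port needs publishing depends on the network setup; with an overlay network such as Flannel, containers may reach each other directly.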

Env

Flannel

Interactive Shell

Zeppelin

  • A web-based notebook for interactive analytics
  • Deeply integrated with Spark and Hadoop
  • Supports multiple language backends
  • Currently incubating at Apache

Zeppelin

  • Apache Spark Integration
  • Supports Scala, PySpark, and Spark SQL
  • SparkContext injected automatically
  • Supports 3rd party dependencies
  • Spark-on-YARN and Spark standalone modes
  • Full Spark interpreter configuration
  • Multiple Spark interpreter profiles

References

  • github.com/jpizarrom/docker-spark
  • github.com/weaveworks/weave
  • github.com/coreos/flannel
  • www.slideshare.net/Hadoop_Summit/sparkonyarn-empower-spark-applications-on-hadoop-cluster
  • www.slideshare.net/vinnies12/data-science-with-spark-zeppelin

By Juan Pizarro
