Spark.. Scala
Tools setup
- JDK
$ java -version
- sbt (Scala Build Tool)
$ sbt about
$ sbt compile
$ sbt run
- VS Code
Extensions: Scala Syntax (official) and Scala (Metals)
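
For `sbt compile` and `sbt run` to do anything useful, the project needs a build definition. A minimal `build.sbt` sketch follows. The project name and both version numbers are assumptions chosen for illustration, not values from the slides; pick versions that match your cluster.

```scala
// build.sbt: a minimal sketch. Project name and versions are assumptions.
ThisBuild / scalaVersion := "2.12.18"

lazy val root = (project in file("."))
  .settings(
    name := "spark-scala-demo", // hypothetical project name
    // spark-sql pulls in spark-core and provides SparkSession, DataFrame and Dataset
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1"
  )
```

With this file in the project root, `sbt about` should report the project and `sbt compile` downloads the Spark dependency on first use.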


Tools setup
- Databricks Community Edition
https://databricks.com/product/faq/community-edition
https://community.cloud.databricks.com
You will be getting emails!
- Jupyter Notebooks, run locally
- IntelliJ IDEA Community Edition
Scala plugin
Free Book
Concurrent work
- Split the data
- Concurrent work routine (threads)
- Combine back when done
- (Deal with errors)
one machine (or microservice)
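
The four steps above can be sketched on one machine with plain Scala `Future`s. This is only an illustration under assumptions: `ConcurrentSum`, `sumChunk` and `parallelSum` are hypothetical names, summing integers stands in for the real work routine, and the 30-second timeout is arbitrary.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object ConcurrentSum {
  // Hypothetical work routine: sum one chunk of the data.
  def sumChunk(chunk: Seq[Int]): Long = chunk.map(_.toLong).sum

  def parallelSum(data: Seq[Int], chunkSize: Int): Try[Long] = {
    // 1. Split the data into chunks.
    val chunks = data.grouped(chunkSize).toVector
    // 2. Run the work routine concurrently on the global thread pool.
    val partials: Seq[Future[Long]] = chunks.map(c => Future(sumChunk(c)))
    // 3. Combine back when done.
    val combined: Future[Long] = Future.sequence(partials).map(_.sum)
    // 4. Deal with errors: a failed chunk or a timeout surfaces as a Failure.
    Try(Await.result(combined, 30.seconds))
  }

  def main(args: Array[String]): Unit =
    parallelSum(1 to 1000, chunkSize = 100) match {
      case Success(total) => println(s"total = $total") // prints "total = 500500"
      case Failure(err)   => println(s"failed: ${err.getMessage}")
    }
}
```

Everything here shares one JVM and one thread pool, which is what separates concurrent work from the distributed case.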

Distributed work
- Split the data (or not) at Ingress
- Distribute workflow
- Produce desired results at Egress
- (Deal with errors)
microservices / (serverless)

Distributed Data
- Data split across nodes
- Nodes operate on data shards
- Nodes work concurrently (Spark nodes)
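
Sharded data can be tried out locally with Spark partitions. The sketch below assumes the `spark-sql` dependency is on the classpath and uses `local[4]`, which simulates four workers as threads on one machine rather than real nodes. `ShardDemo` and `partialSums` are hypothetical names.

```scala
import org.apache.spark.sql.SparkSession

object ShardDemo {
  // Returns one partial sum per partition (shard) of the numbers 1 to 1000.
  def partialSums(spark: SparkSession, shards: Int): Array[Long] = {
    // Split the data into the requested number of partitions.
    val numbers = spark.sparkContext.parallelize(1 to 1000, numSlices = shards)
    // Each task sees only its own shard and emits a single value for it.
    numbers.mapPartitions(it => Iterator(it.map(_.toLong).sum)).collect()
  }

  def main(args: Array[String]): Unit = {
    // local[4] is an assumption for the demo; on a cluster the master is set externally.
    val spark = SparkSession.builder()
      .appName("shard-demo")
      .master("local[4]")
      .getOrCreate()

    val sums = partialSums(spark, shards = 4)
    println(s"shards = ${sums.length}, total = ${sums.sum}")

    spark.stop()
  }
}
```

Only the per-shard results travel back to the driver through `collect()`; the elements themselves stay where the partitions live.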

Spark API
- RDD: low level
- DataFrame: not typed, Dataset[Row]
- Dataset: typed, Dataset[Something] for a case class Something
- Catalyst optimizer: applies to DataFrame and Dataset
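
The three APIs can be compared side by side in a short sketch. It assumes the `spark-sql` dependency and a local master; the fields of `Something` and the names `ApiDemo` and `typedRoundTrip` are made up for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type; it must be defined at top level so Spark can derive an encoder.
case class Something(id: Long, name: String)

object ApiDemo {
  // Goes typed -> untyped -> typed and returns the names column as a local array.
  def typedRoundTrip(spark: SparkSession, data: Seq[Something]): Array[String] = {
    import spark.implicits._

    // RDD: low-level API, not optimized by Catalyst.
    val rdd: RDD[Something] = spark.sparkContext.parallelize(data)

    // Dataset: typed, so `_.name` is checked at compile time.
    val ds: Dataset[Something] = rdd.toDS()

    // DataFrame: an alias for Dataset[Row]; column names are resolved at runtime.
    val df: DataFrame = ds.toDF()
    val rows: Dataset[Row] = df.select("id", "name")

    // Back to a typed Dataset by naming the case class.
    val typedAgain: Dataset[Something] = rows.as[Something]
    typedAgain.map(_.name).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("api-demo").master("local[*]").getOrCreate()
    val names = typedRoundTrip(spark, Seq(Something(1L, "a"), Something(2L, "b")))
    println(names.sorted.mkString(", "))
    spark.stop()
  }
}
```

A misspelled column in `df.select(...)` fails only when the job runs, while a misspelled field in `ds.map(_.name)` fails at compile time, which is the practical meaning of "typed" versus "not typed".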
Spark Docs
- DataFrame / Dataset
- Functions
By Cosmin P