{Spark DataBricks}

April 5, 2022

  • {Spark

  • DataBricks

  • Azure DataBricks

  • Spark Essentials

  • Demo

  • DataBricks}

  • {Load {Files}
  • Transform}
  • {RDD, DataFrame, DataSet
  • Data Sources
  • Transformations}
  • {Repository
  • CI/CD
  • Streaming
  • Performance}

{Spark}

  • Unified Engine

  • Large-scale Data

{Spark Docs}

  • https://spark.apache.org/docs/latest/sql-programming-guide.html

  • https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html

  • https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html

{DataBricks}

https://databricks.com/

https://community.cloud.databricks.com/login.html

{Azure}

https://portal.azure.com

  • {RDD

  • DataFrame

  • Dataset}
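
As a rough illustration of how the three abstractions relate, the sketch below builds the same small dataset as an RDD, a DataFrame, and a Dataset, then applies the same filter to each. The `Person` case class, the sample rows, and the local `SparkSession` settings are assumptions made for this example. In a Databricks notebook or `spark-shell` the `spark` session already exists and `getOrCreate()` simply returns it.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical record type used only for this sketch
case class Person(name: String, age: Int)

// Local session for experimentation; on Databricks the existing session is reused
val spark = SparkSession.builder()
  .appName("rdd-dataframe-dataset-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// RDD: a distributed collection of JVM objects, no schema known to Spark
val rdd = spark.sparkContext.parallelize(Seq(Person("Ana", 34), Person("Bob", 45)))

// DataFrame: rows with a schema (columns "name" and "age"), untyped at compile time
val df = rdd.toDF()

// Dataset: the same schema plus a compile-time type
val ds = df.as[Person]

// The same filter expressed against each abstraction
val adultsRdd = rdd.filter(_.age > 40)        // plain Scala lambda
val adultsDf  = df.filter(col("age") > 40)    // column expression
val adultsDs  = ds.filter(_.age > 40)         // typed lambda

adultsDs.show()
```

The column-expression form is visible to Spark's optimizer, while the lambda forms are opaque functions applied to each object; that difference is one reason DataFrame-style expressions are often preferred when both are available.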

{Spark Essentials}

{Data Sources}

{Files: text, json, csv, parquet,...}

{JDBC, Database}

{https://spark.apache.org/docs/latest/sql-data-sources.html}
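
The file formats listed above all go through the same `spark.read` / `DataFrame.write` entry points. The following is a minimal sketch, assuming a local session and a small made-up CSV file written to a temporary directory; the file names and sample rows are hypothetical. The JDBC part is left as a comment because it needs a real database, and every value in it is a placeholder.

```scala
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("data-sources-sketch")
  .master("local[*]")
  .getOrCreate()

// Hypothetical sample data written to a temporary directory
val dir = Files.createTempDirectory("spark-sources").toString
val csvPath = s"$dir/people.csv"
Files.write(Paths.get(csvPath), "name,age\nAna,34\nBob,45\n".getBytes("UTF-8"))

// CSV: treat the first line as a header and let Spark infer column types
val people = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(csvPath)

// Write the same data out as Parquet and JSON
people.write.mode("overwrite").parquet(s"$dir/people.parquet")
people.write.mode("overwrite").json(s"$dir/people.json")

// Read both formats back; Parquet carries its schema, JSON is inferred
val fromParquet = spark.read.parquet(s"$dir/people.parquet")
val fromJson = spark.read.json(s"$dir/people.json")

fromParquet.printSchema()
fromJson.show()

// JDBC uses the same reader API. URL, table, and credentials are placeholders,
// and the matching JDBC driver must be on the classpath.
// val orders = spark.read.format("jdbc")
//   .option("url", "jdbc:postgresql://<host>:5432/<database>")
//   .option("dbtable", "<schema.table>")
//   .option("user", "<user>")
//   .option("password", "<password>")
//   .load()
```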

{Transformations}

  • {DataFrames, Columns, Expressions

  • Joins

  • Aggregates}
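
The transformation topics listed above can be sketched on two tiny in-memory DataFrames. The table names, column names, and values below are invented for illustration, and the local `SparkSession` is an assumption; on Databricks the notebook's existing `spark` session would be used instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col, count}

val spark = SparkSession.builder()
  .appName("transformations-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data
val employees = Seq(
  (1, "Ana", 10, 5200.0),
  (2, "Bob", 10, 4800.0),
  (3, "Cid", 20, 6100.0)
).toDF("id", "name", "dept_id", "salary")

val departments = Seq(
  (10, "Engineering"),
  (20, "Finance")
).toDF("dept_id", "dept_name")

// Columns and expressions: derive a new column, then filter on a column predicate
val withBonus = employees
  .withColumn("bonus", col("salary") * 0.1)
  .filter(col("salary") > 5000)

// Join: inner join on the shared "dept_id" column
val joined = employees.join(departments, Seq("dept_id"), "inner")

// Aggregates: headcount and average salary per department
val byDept = joined
  .groupBy("dept_name")
  .agg(count("*").as("headcount"), avg("salary").as("avg_salary"))
  .orderBy("dept_name")

withBonus.show()
byDept.show()
```

Each step returns a new DataFrame and nothing is computed until an action such as `show()` or `collect()` runs.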

{DataBricks Demo}

  • {Load {Files}

  • DataFrame

  • Transform}

{DataBricks}

  • {Code Repo

  • CI/CD

  • Streaming

  • Performance (Optimization & Tuning)}

By Cosmin P
