Cloud Composer

Airflow as a Service

by Joaquín Menchaca

Exploring an Airflow + Spark + Kubernetes architecture and implementation


Extract Data

Identity-Aware Proxy

Creating a Cluster

gcloud composer environments create demo-ephemeral-dataproc \
  --location us-central1 \
  --zone us-central1-b \
  --machine-type n1-standard-2 \
  --disk-size 20

Enabling service [] on project [348697763275]...
Operation "operations/acf.de22b478-5152-435f-8288-bde889dddac9" finished successfully.
Waiting for [projects/evenflow/locations/us-central1/environments/demo-ephemeral-dataproc] to be created with [projects/evenflow/locations/us-central1/operations/276589b0-579f-4377-830d-6271aea419d4]...done.


gcloud composer environments run demo-ephemeral-dataproc \
  --location us-central1 \
  variables -- --set gcs_bucket "$PROJECT"

kubeconfig entry generated for us-central1-demo-ephemeral--276589b0-gke.
Executing within the following kubectl namespace: composer-1-7-2-airflow-1-9-0-276589b0
[2019-07-12 22:54:29,566] {} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2019-07-12 22:54:29,588] {} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[2019-07-12 22:54:29,832] {} INFO - Using executor CeleryExecutor
[2019-07-12 22:54:29,859] {} INFO - Reading the config from /etc/airflow/airflow.cfg
[2019-07-12 22:54:29,878] {} WARNING - Using default Composer Environment Variables. Overrides have not been applied.

Trigger the DAG

python \
 --url="https://$AIRFLOW_URL/api/experimental/dags/average-speed/dag_runs" \
 --iapClientId=$ \
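
The trigger command above posts to the Airflow experimental REST API through Identity-Aware Proxy. As a rough sketch of what that request looks like, the following builds (but does not send) the POST using only the Python standard library. The function name `build_trigger_request`, the host `example-tp.appspot.com`, and the token value are placeholders invented for illustration; obtaining a real OpenID Connect token for the IAP client ID is assumed to happen elsewhere, and the accepted payload fields may differ between Airflow versions.

```python
import json
import urllib.request


def build_trigger_request(airflow_url, dag_id, id_token, payload=None):
    """Build (without sending) a POST that asks Airflow to create a DAG run.

    airflow_url: web server host name only, no scheme (hypothetical value below).
    id_token:    OpenID Connect token issued for the IAP client ID; assumed to be
                 obtained separately, for example with a Google auth library.
    payload:     optional dict sent as the JSON body; empty by default.
    """
    endpoint = "https://{}/api/experimental/dags/{}/dag_runs".format(
        airflow_url, dag_id
    )
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            # IAP checks this bearer token before the request reaches Airflow.
            "Authorization": "Bearer {}".format(id_token),
            "Content-Type": "application/json",
        },
    )


# Placeholder host and token, for illustration only.
request = build_trigger_request(
    "example-tp.appspot.com", "average-speed", "PLACEHOLDER_TOKEN"
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here so the sketch can be run without a live environment.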


    create_cluster >> submit_pyspark
    submit_pyspark >> [delete_cluster, bq_load]

    bq_load >> delete_transformed_files

    move_failed_files << [bq_load, submit_pyspark]
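
To make the ordering above easier to read, here is a small sketch that mimics the `>>` and `<<` operators with a toy `Task` class and prints the resulting edges. The `Task` class is a hypothetical stand-in written for this illustration, not Airflow's operator implementation, and it models ordering only: trigger rules, which the snippet above does not show, are left out.

```python
class Task:
    """Toy stand-in for an Airflow operator; it only records dependencies."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = set()  # ids of tasks that run after this one

    @staticmethod
    def _as_list(other):
        return other if isinstance(other, list) else [other]

    def __rshift__(self, other):
        # self >> other: every task in `other` runs after self.
        for task in self._as_list(other):
            self.downstream.add(task.task_id)
        return other

    def __lshift__(self, other):
        # self << other: self runs after every task in `other`.
        for task in self._as_list(other):
            task.downstream.add(self.task_id)
        return other


create_cluster = Task("create_cluster")
submit_pyspark = Task("submit_pyspark")
delete_cluster = Task("delete_cluster")
bq_load = Task("bq_load")
delete_transformed_files = Task("delete_transformed_files")
move_failed_files = Task("move_failed_files")

# Same four statements as in the DAG snippet.
create_cluster >> submit_pyspark
submit_pyspark >> [delete_cluster, bq_load]
bq_load >> delete_transformed_files
move_failed_files << [bq_load, submit_pyspark]

all_tasks = [
    create_cluster, submit_pyspark, delete_cluster,
    bq_load, delete_transformed_files, move_failed_files,
]
edges = sorted(
    (task.task_id, child) for task in all_tasks for child in task.downstream
)
for upstream, downstream in edges:
    print(upstream, "->", downstream)
```

Read this way, `submit_pyspark` fans out to three tasks, and `move_failed_files` sits downstream of both `bq_load` and `submit_pyspark`.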
