Amazon EMR Hands-On Demo

Agenda

In this demo, we will:
- Create an S3 bucket for input and output data
- Launch an EMR cluster
- Prepare and upload a Spark job
- Submit and run the Spark job on the EMR cluster
- Monitor job progress and view results
- Test the setup
- Clean up resources

Demo Overview
Create S3 Bucket

Bucket name: emr-demo-401201
Create S3 bucket
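The same step can be scripted. A minimal sketch, assuming boto3 is used: S3's `create_bucket` needs a `CreateBucketConfiguration` in every Region except us-east-1, where an explicit location constraint is rejected, so the hypothetical helper `create_bucket_kwargs` builds the right arguments. Bucket names are globally unique across AWS, so pick your own name rather than reusing `emr-demo-401201`.

```python
def create_bucket_kwargs(bucket: str, region: str) -> dict:
    """Build keyword arguments for boto3's S3 create_bucket call.

    us-east-1 is the default location and rejects an explicit
    LocationConstraint, so the configuration is only added elsewhere.
    """
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


# Usage (requires AWS credentials; the Region is an assumption):
# import boto3
# boto3.client("s3", region_name="us-east-1").create_bucket(
#     **create_bucket_kwargs("emr-demo-401201", "us-east-1"))
```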

Create folder: scripts
spark_job.py

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Initialize the Spark session
spark = SparkSession.builder.appName("EMR Demo Job").getOrCreate()

# Read the input CSV from S3 (first row is the header; infer column types)
input_path = "s3://emr-demo-401201/input/sample_data.csv"
df = spark.read.csv(input_path, header=True, inferSchema=True)

# Total sales (price * quantity) per category, highest first
result = (
    df.groupBy("category")
    .agg(F.sum(F.col("price") * F.col("quantity")).alias("total_sales"))
    .orderBy(F.col("total_sales").desc())
)

# Write the results back to S3 as Parquet, replacing any previous run
output_path = "s3://emr-demo-401201/output/results"
result.write.mode("overwrite").parquet(output_path)

# Print the results to the driver log (for demo purposes)
result.show()

# Stop the Spark session
spark.stop()

Upload

Create folder: input
sample_data.csv
category,product,price,quantity,date
Electronics,Smartphone,599.99,2,2024-01-01
Home & Kitchen,Coffee Maker,79.99,1,2024-01-01
Clothing,T-Shirt,19.99,3,2024-01-01
Books,Science Fiction Novel,14.99,1,2024-01-02
Electronics,Laptop,1299.99,1,2024-01-02
Home & Kitchen,Blender,49.99,1,2024-01-02
Clothing,Jeans,59.99,2,2024-01-03
Sports & Outdoors,Yoga Mat,29.99,1,2024-01-03
Electronics,Wireless Earbuds,129.99,1,2024-01-03
Home & Kitchen,Toaster,34.99,1,2024-01-04
Clothing,Dress,89.99,1,2024-01-04
Books,Cookbook,24.99,2,2024-01-04
Electronics,Tablet,399.99,1,2024-01-05
Home & Kitchen,Air Fryer,99.99,1,2024-01-05
Clothing,Sweater,49.99,2,2024-01-05
Sports & Outdoors,Dumbbells Set,79.99,1,2024-01-06
Electronics,Smart Watch,199.99,1,2024-01-06
Home & Kitchen,Electric Kettle,39.99,1,2024-01-06
Clothing,Jacket,129.99,1,2024-01-07
Books,Mystery Novel,12.99,3,2024-01-07
Electronics,Digital Camera,549.99,1,2024-01-07
Home & Kitchen,Food Processor,89.99,1,2024-01-08
Clothing,Shorts,29.99,2,2024-01-08
Sports & Outdoors,Tennis Racket,119.99,1,2024-01-08
Electronics,Gaming Console,499.99,1,2024-01-09
Home & Kitchen,Microwave,149.99,1,2024-01-09
Clothing,Socks,9.99,5,2024-01-09
Books,Biography,19.99,2,2024-01-10
Electronics,Bluetooth Speaker,79.99,1,2024-01-10
Home & Kitchen,Slow Cooker,59.99,1,2024-01-10
Clothing,Scarf,24.99,2,2024-01-11
Sports & Outdoors,Bicycle,299.99,1,2024-01-11
Electronics,Monitor,249.99,1,2024-01-11
Home & Kitchen,Vacuum Cleaner,199.99,1,2024-01-12
Clothing,Gloves,14.99,3,2024-01-12
Books,Self-Help Book,16.99,2,2024-01-12
Electronics,Printer,159.99,1,2024-01-13
Home & Kitchen,Rice Cooker,49.99,1,2024-01-13
Clothing,Hat,19.99,2,2024-01-13
Sports & Outdoors,Camping Tent,199.99,1,2024-01-14
Electronics,External Hard Drive,89.99,1,2024-01-14
Home & Kitchen,Dish Set,69.99,1,2024-01-14
Clothing,Swimsuit,39.99,1,2024-01-15
Books,Travel Guide,22.99,1,2024-01-15
Electronics,Wireless Mouse,29.99,2,2024-01-15
Home & Kitchen,Cutlery Set,44.99,1,2024-01-16
Clothing,Belt,24.99,1,2024-01-16
Sports & Outdoors,Soccer Ball,19.99,2,2024-01-16
Electronics,Fitness Tracker,99.99,1,2024-01-17
Home & Kitchen,Stand Mixer,249.99,1,2024-01-17

Upload
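Because S3 folders are only key prefixes, the console's Create folder steps are optional when scripting: uploading to `input/sample_data.csv` creates the prefix. A minimal sketch, assuming boto3 and that both files sit in the current directory; `upload_targets` and `s3_uri` are hypothetical helpers, and the resulting input URI matches the `input_path` in spark_job.py.

```python
BUCKET = "emr-demo-401201"

# Local file -> S3 "folder" (key prefix), following the layout in this demo.
UPLOADS = {
    "spark_job.py": "scripts",
    "sample_data.csv": "input",
}


def upload_targets(bucket: str = BUCKET, uploads: dict = UPLOADS) -> list:
    """Return (local_file, bucket, key) tuples in boto3 upload_file order."""
    return [(name, bucket, f"{prefix}/{name}") for name, prefix in uploads.items()]


def s3_uri(bucket: str, key: str) -> str:
    """Format a bucket/key pair as the s3:// URI Spark reads from."""
    return f"s3://{bucket}/{key}"


# Usage (requires AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# for local_file, bucket, key in upload_targets():
#     s3.upload_file(local_file, bucket, key)
```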
EMR Demo Cluster

Create cluster
Cluster name: EMR-Demo-Cluster

Cluster configuration










EC2 instance profile for Amazon EMR
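For reference, the console choices can also be expressed as a `run_job_flow` request. A sketch under stated assumptions: the release label, instance type, and instance count are placeholders not taken from the slides, and `EMR_EC2_DefaultRole` (the EC2 instance profile) and `EMR_DefaultRole` (the service role) are the names created by `aws emr create-default-roles`; a console-created cluster may use differently named roles. `cluster_request` is a hypothetical helper.

```python
def cluster_request(name: str = "EMR-Demo-Cluster",
                    release_label: str = "emr-7.0.0",  # assumption: choose a current release
                    instance_type: str = "m5.xlarge",  # assumption
                    instance_count: int = 3,           # assumption: 1 primary + 2 core nodes
                    log_uri=None) -> dict:
    """Build keyword arguments for boto3's EMR run_job_flow call."""
    request = {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster up after start-up so steps can be added later.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # EC2 instance profile for Amazon EMR, and the EMR service role.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
    if log_uri:
        request["LogUri"] = log_uri
    return request


# Usage (requires AWS credentials; the log path is hypothetical):
# import boto3
# cluster_id = boto3.client("emr").run_job_flow(
#     **cluster_request(log_uri="s3://emr-demo-401201/logs/"))["JobFlowId"]
```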



Wait 5 to 20 minutes, then check the cluster status until it shows Waiting
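Waiting can also be automated. A minimal polling sketch, assuming the state comes from `describe_cluster` (`["Cluster"]["Status"]["State"]`); `wait_for_cluster` is a hypothetical helper, and `get_state` and `sleep` are injected so the loop can be tested without AWS. WAITING means the cluster is up and idle, RUNNING means it is up and executing a step. boto3 also ships a built-in `cluster_running` waiter that does much the same.

```python
import time

# WAITING: ready and idle; RUNNING: ready and executing a step.
READY_STATES = {"WAITING", "RUNNING"}
FAILED_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_for_cluster(get_state, poll_seconds=30, timeout_seconds=20 * 60, sleep=time.sleep):
    """Poll get_state() until the cluster is ready, has failed, or time runs out."""
    waited = 0
    while True:
        state = get_state()
        if state in READY_STATES:
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"cluster ended in state {state}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"cluster still {state} after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (requires AWS credentials):
# import boto3
# emr = boto3.client("emr")
# wait_for_cluster(lambda: emr.describe_cluster(ClusterId=cluster_id)
#                  ["Cluster"]["Status"]["State"])
```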

EC2 Instances

EC2 Instance Dashboard

UI - 1

UI - 2
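EMR's on-cluster web UIs are served from the primary node. A small sketch, assuming the default ports 8088 (YARN ResourceManager) and 18080 (Spark History Server) and a made-up primary-node DNS name; in practice the name comes from `describe_cluster` (`["Cluster"]["MasterPublicDnsName"]`), and the ports are only reachable once an inbound security group rule allows them. `ui_urls` is a hypothetical helper.

```python
# Default web UI ports on the EMR primary node (assumed defaults).
UI_PORTS = {
    "YARN ResourceManager": 8088,
    "Spark History Server": 18080,
}


def ui_urls(primary_dns: str) -> dict:
    """Map each UI name to its URL on the primary node."""
    return {name: f"http://{primary_dns}:{port}/" for name, port in UI_PORTS.items()}


for name, url in ui_urls("ec2-198-51-100-7.compute-1.amazonaws.com").items():
    print(name, url)
```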
EC2 Instance Connect

Open Security Group Rule

Open Security Group Rule
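The inbound rule opened here can also be added with EC2's `authorize_security_group_ingress`. A sketch, assuming SSH on port 22 and a documentation-range address standing in for your own IP; limit the CIDR to your address rather than 0.0.0.0/0. `ingress_rule` is a hypothetical helper; the primary node's group ID is under `["Cluster"]["Ec2InstanceAttributes"]["EmrManagedMasterSecurityGroup"]` in `describe_cluster`.

```python
import ipaddress


def ingress_rule(port: int, cidr: str, description: str) -> dict:
    """Build one IpPermissions entry allowing TCP traffic on a single port."""
    network = ipaddress.ip_network(cidr)  # raises ValueError on a malformed CIDR
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 ranges")
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": str(network), "Description": description}],
    }


# Usage (requires AWS credentials; sg_id is the primary node's security group):
# import boto3
# boto3.client("ec2").authorize_security_group_ingress(
#     GroupId=sg_id, IpPermissions=[ingress_rule(22, "203.0.113.10/32", "SSH from my IP")])
```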

Log in as user: hadoop
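Outside the browser, the same login works with plain SSH as the `hadoop` user. A sketch, assuming the cluster was launched with an EC2 key pair (`Ec2KeyName`); the key file name and DNS name are placeholders, and `ssh_command` is a hypothetical helper that only builds the argument list.

```python
def ssh_command(key_file: str, primary_dns: str, remote_command=None, user: str = "hadoop") -> list:
    """Build an ssh argv for the EMR primary node, optionally with a remote command."""
    argv = ["ssh", "-i", key_file, f"{user}@{primary_dns}"]
    if remote_command:
        argv.extend(remote_command)
    return argv


# Usage (placeholders; the key file must not be world-readable, e.g. chmod 400):
# import subprocess
# subprocess.run(ssh_command("emr-demo-key.pem", "<primary-node-dns>",
#                            ["spark-submit", "--version"]), check=True)
```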

Add Step

Step name: My-Spark-App
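The step added here corresponds to one entry in EMR's `add_job_flow_steps`. A sketch, assuming the script sits at `s3://emr-demo-401201/scripts/spark_job.py` and cluster deploy mode; `ActionOnFailure="CONTINUE"` keeps the cluster alive if the job fails so the logs can be read. `spark_step` is a hypothetical helper; `command-runner.jar` is the EMR utility for running commands such as `spark-submit` as a step.

```python
def spark_step(script_uri: str, name: str = "My-Spark-App", deploy_mode: str = "cluster") -> dict:
    """Build one Steps entry for EMR add_job_flow_steps that runs spark-submit."""
    return {
        "Name": name,
        # Keep the cluster running if the step fails, so logs can be inspected.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar runs the given command on the cluster.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", deploy_mode, script_uri],
        },
    }


# Usage (requires AWS credentials):
# import boto3
# step_id = boto3.client("emr").add_job_flow_steps(
#     JobFlowId=cluster_id,
#     Steps=[spark_step("s3://emr-demo-401201/scripts/spark_job.py")])["StepIds"][0]
```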





Wait for the step status to change to Completed
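To check a step from code, `describe_step` returns its state under `["Step"]["Status"]`. A sketch with a hypothetical `step_outcome` helper that reports the state, whether it is final, and, for a failed step, the failure message and log location when EMR provides them; boto3's `step_complete` waiter can handle the polling itself.

```python
TERMINAL_STEP_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def step_outcome(response: dict):
    """Summarise an EMR describe_step response as (state, finished, detail)."""
    status = response["Step"]["Status"]
    state = status["State"]
    detail = None
    failure = status.get("FailureDetails")
    if state == "FAILED" and failure:
        # Failure details are optional; fall back to placeholders if absent.
        detail = f'{failure.get("Message", "no message")} (log: {failure.get("LogFile", "n/a")})'
    return state, state in TERMINAL_STEP_STATES, detail


# Usage (requires AWS credentials):
# import boto3
# emr = boto3.client("emr")
# print(step_outcome(emr.describe_step(ClusterId=cluster_id, StepId=step_id)))
```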

Spark History Server


Check the results in the S3 bucket under output/results/
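To sanity-check the Parquet output, the expected totals can be recomputed locally from sample_data.csv. A sketch using only the standard library; `total_sales_by_category` is a hypothetical helper. It uses `Decimal`, whereas Spark's `inferSchema` reads price as a double, so compare against the S3 results with a small tolerance rather than exact equality.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def total_sales_by_category(csv_text: str) -> list:
    """Recompute total_sales (price * quantity) per category, highest first."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["category"]] += Decimal(row["price"]) * int(row["quantity"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Usage, with sample_data.csv in the current directory:
# with open("sample_data.csv", newline="") as f:
#     for category, total in total_sales_by_category(f.read()):
#         print(category, total)
```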

Clean Up

Terminate EMR Cluster


Empty Bucket
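A bucket must be empty before it can be deleted. A scripted sketch, assuming boto3 and an unversioned bucket (a versioned bucket also needs its object versions and delete markers removed): `DeleteObjects` accepts at most 1,000 keys per request, so the hypothetical `delete_batches` helper splits the key list. The same approach works for the EMR logs bucket.

```python
def delete_batches(keys, batch_size: int = 1000) -> list:
    """Split object keys into DeleteObjects payloads of at most batch_size keys."""
    if not 1 <= batch_size <= 1000:
        raise ValueError("batch_size must be between 1 and 1000")
    keys = list(keys)
    return [
        {"Objects": [{"Key": k} for k in keys[i:i + batch_size]], "Quiet": True}
        for i in range(0, len(keys), batch_size)
    ]


# Usage (requires AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# pages = s3.get_paginator("list_objects_v2").paginate(Bucket="emr-demo-401201")
# keys = [obj["Key"] for page in pages for obj in page.get("Contents", [])]
# for payload in delete_batches(keys):
#     s3.delete_objects(Bucket="emr-demo-401201", Delete=payload)
```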


Delete Bucket


Empty EMR Logs Bucket


Delete EMR Logs Bucket

EMR Cluster Terminated

Delete Any EMR IAM Roles Created for the Demo
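IAM refuses to delete a role that still has policies attached or belongs to an instance profile, so those links go first. A sketch with a hypothetical `role_cleanup_plan` that orders the boto3 IAM calls; its inputs would come from `list_instance_profiles_for_role`, `list_attached_role_policies`, and `list_role_policies`. Only delete roles that nothing else in the account still uses.

```python
def role_cleanup_plan(role_name: str,
                      instance_profiles=(),
                      attached_policy_arns=(),
                      inline_policy_names=()) -> list:
    """Return (iam_method_name, kwargs) pairs in an order IAM accepts."""
    plan = []
    for profile in instance_profiles:
        # Detach the role from the profile, then delete the now-empty profile.
        plan.append(("remove_role_from_instance_profile",
                     {"InstanceProfileName": profile, "RoleName": role_name}))
        plan.append(("delete_instance_profile", {"InstanceProfileName": profile}))
    for arn in attached_policy_arns:
        # Managed policies are detached from the role, not deleted.
        plan.append(("detach_role_policy", {"RoleName": role_name, "PolicyArn": arn}))
    for name in inline_policy_names:
        # Inline policies belong to the role and are deleted outright.
        plan.append(("delete_role_policy", {"RoleName": role_name, "PolicyName": name}))
    plan.append(("delete_role", {"RoleName": role_name}))
    return plan


# Usage (requires AWS credentials):
# import boto3
# iam = boto3.client("iam")
# for method, kwargs in role_cleanup_plan("EMR_EC2_DefaultRole",
#                                         instance_profiles=["EMR_EC2_DefaultRole"]):
#     getattr(iam, method)(**kwargs)
```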

🙏 Thanks for Watching
Amazon EMR - Hands-On Demo
By Deepak Dubey