AWS Data Exchange
Hands-On
Demo

In this demo, we will:
- Explore the AWS Data Exchange catalog
- See how to Subscribe to a dataset
- Check out the Covid Data
- Set up S3 bucket for data storage
- Download and save data from AWS Data Exchange to S3
- Access and analyze the data
- Clean up resources
Agenda

AWS Data Exchange





COVID-19 Data Lake

Resources on AWS

Usage examples

S3 Bucket

Download Under csv/ folder

Enigma-JHU.csv.gz
head -n 1000 Enigma-JHU.csv > Enigma-JHU-1000.csv
fips,admin2,province_state,country_region,last_update,latitude,longitude,confirmed,deaths,recovered,active,combined_key
,,Anhui,China,2020-01-22T17:00:00,31.826,117.226,1,,,,"Anhui, China"
,,Beijing,China,2020-01-22T17:00:00,40.182,116.414,14,,,,"Beijing, China"
,,Chongqing,China,2020-01-22T17:00:00,30.057,107.874,6,,,,"Chongqing, China"
,,Fujian,China,2020-01-22T17:00:00,26.079,117.987,1,,,,"Fujian, China"
,,Gansu,China,2020-01-22T17:00:00,36.061,103.834,,,,,"Gansu, China"
code Enigma-JHU-1000.csv

Create S3 bucket
covid19-data-lake-demo-917212

Upload CSV File

CREATE EXTERNAL TABLE covid_data (
fips STRING,
admin2 STRING,
province_state STRING,
country_region STRING,
last_update STRING,
latitude STRING,
longitude STRING,
confirmed STRING,
deaths STRING,
recovered STRING,
active STRING,
combined_key STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "\"",
"escapeChar" = "\\"
)
STORED AS TEXTFILE
LOCATION 's3://covid19-data-lake-demo-917212/'
TBLPROPERTIES ('skip.header.line.count'='1');
Create Table in Athena


SELECT * FROM covid_data LIMIT 10;
Verify Table Data

Results

SELECT
country_region,
SUM(CASE
WHEN confirmed = '' THEN 0
ELSE CAST(confirmed AS BIGINT)
END) as total_cases,
SUM(CASE
WHEN deaths = '' THEN 0
ELSE CAST(deaths AS BIGINT)
END) as total_deaths
FROM covid_data
GROUP BY country_region
ORDER BY total_cases DESC
LIMIT 10;
- This query summarizes COVID-19 data by country, showing the top 10 countries with the highest number of confirmed cases.
- It calculates total cases and deaths for each country, handling empty values by treating them as zeros.
- The query groups the data by country, sums up the case and death counts, and then sorts the results to show the most affected countries first.

Clean Up

DROP TABLE IF EXISTS covid_data;

Delete File from S3 Bucket

permanently delete

Delete S3 Bucket

🙏
Thanks
for
Watching
AWS Data Exchange - Hands-On Demo
By Deepak Dubey
AWS Data Exchange - Hands-On Demo
AWS Data Exchange - Hands-On Demo
- 267