AWS Data Exchange
Hands-On
Demo
In this demo, we will:
- Explore the AWS Data Exchange catalog
- See how to Subscribe to a dataset
- Check out the Covid Data
- Set up S3 bucket for data storage
- Download and save data from AWS Data Exchange to S3
- Access and analyze the data
- Clean up resources
Agenda
AWS Data Exchange
COVID-19 Data Lake
Resources on AWS
Usage examples
S3 Bucket
Download Under csv/ folder
Enigma-JHU.csv.gz
head -n 1000 Enigma-JHU.csv > Enigma-JHU-1000.csv
fips,admin2,province_state,country_region,last_update,latitude,longitude,confirmed,deaths,recovered,active,combined_key
,,Anhui,China,2020-01-22T17:00:00,31.826,117.226,1,,,,"Anhui, China"
,,Beijing,China,2020-01-22T17:00:00,40.182,116.414,14,,,,"Beijing, China"
,,Chongqing,China,2020-01-22T17:00:00,30.057,107.874,6,,,,"Chongqing, China"
,,Fujian,China,2020-01-22T17:00:00,26.079,117.987,1,,,,"Fujian, China"
,,Gansu,China,2020-01-22T17:00:00,36.061,103.834,,,,,"Gansu, China"
code Enigma-JHU-1000.csv
Create S3 bucket
covid19-data-lake-demo-917212
Upload CSV File
CREATE EXTERNAL TABLE covid_data (
fips STRING,
admin2 STRING,
province_state STRING,
country_region STRING,
last_update STRING,
latitude STRING,
longitude STRING,
confirmed STRING,
deaths STRING,
recovered STRING,
active STRING,
combined_key STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "\"",
"escapeChar" = "\\"
)
STORED AS TEXTFILE
LOCATION 's3://covid19-data-lake-demo-917212/'
TBLPROPERTIES ('skip.header.line.count'='1');
Create Table in Athena
SELECT * FROM covid_data LIMIT 10;
Verify Table Data
Results
SELECT
country_region,
SUM(CASE
WHEN confirmed = '' THEN 0
ELSE CAST(confirmed AS BIGINT)
END) as total_cases,
SUM(CASE
WHEN deaths = '' THEN 0
ELSE CAST(deaths AS BIGINT)
END) as total_deaths
FROM covid_data
GROUP BY country_region
ORDER BY total_cases DESC
LIMIT 10;
- This query summarizes COVID-19 data by country, showing the top 10 countries with the highest number of confirmed cases.
- It calculates total cases and deaths for each country, handling empty values by treating them as zeros.
- The query groups the data by country, sums up the case and death counts, and then sorts the results to show the most affected countries first.
Clean Up
DROP TABLE IF EXISTS covid_data;
Delete File from S3 Bucket
permanently delete
Delete S3 Bucket
🙏
Thanks
for
Watching
AWS Data Exchange - Hands-On Demo
By Deepak Dubey
AWS Data Exchange - Hands-On Demo
AWS Data Exchange - Hands-On Demo
- 73