Amazon Athena - Hands On Demo

Create S3 Bucket

 

  1. In the AWS Management Console, locate and click on the Services menu at the top.
  2. Select S3: Find and select S3 under the Storage section or use the search bar to find it.
  3. Open S3 Dashboard: You will be directed to the S3 dashboard.
  4. Create Bucket: Click on the Create bucket button.
  5. Bucket Name: Enter a unique name for your bucket. Remember, the name must be globally unique across all AWS users.
  6. Region Selection: Choose the AWS Region where you want the bucket to reside.
  7. Copy Settings (Optional): If you're replicating settings from an existing bucket, you can choose to copy settings.
  8. Block Public Access: AWS sets block all public access to new buckets by default. You can change these settings based on your requirements.
  9. Bucket Policy (Optional): You can add a bucket policy for fine-grained access control.
  10. Versioning: You can enable versioning to keep multiple versions of an object in one bucket.
  11. Server Access Logging: Enables logging of access requests to the bucket.
  12. Tags: Add tags for easier identification and management.
  13. Object Lock: Enable this to prevent objects from being deleted or overwritten for a fixed amount of time or indefinitely.
  14. Review Settings: Go through the settings to ensure they are as per your requirements.
  15. Create Bucket: Click on the Create bucket button at the bottom of the page.

 

Create an Athena Database

 

  1. Open the Athena console.

  2. If this is your first time to visit the Athena console in your current AWS Region, choose Explore the query editor to open the query editor. Otherwise, Athena opens in the query editor.

  3. Choose Edit Settings to set up a query result location in Amazon S3.

  4. For Manage settings, do one of the following:

  5. In the Location of query result box, enter the path to the bucket that you created in Amazon S3 for your query results. Prefix the path with s3://.

  6. Choose Browse S3, choose the Amazon S3 bucket that you created for your current Region, and then choose Choose.

  7. Choose Save.

  8. Choose Editor to switch to the query editor.

  9. On the right of the navigation pane, you can use the Athena query editor to enter and run queries and statements.

 

CREATE DATABASE mydatabase

Create an Athena Table

 

  1. In the navigation pane, for Database, make sure that mydatabase is selected.

  2. To give yourself more room in the query editor, you can choose the arrow icon to collapse the navigation pane.

  3. To create a tab for a new query, choose the plus (+) sign in the query editor. You can have up to ten query tabs open at once.

  4. To close one or more query tabs, choose the arrow next to the plus sign. To close all tabs at once, choose the arrow, and then choose Close all tabs.

  5. In the query pane, enter the following CREATE EXTERNAL TABLE statement. The regex breaks out the operating system, browser, and browser version information from the ClientInfo field in the log data.

  6. In the LOCATION statement, replace myregion with the AWS Region that you are currently using (for example, us-east-1).

  7. Choose Run.

    The table cloudfront_logs is created and appears under the list of Tables for the mydatabase database.

CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs (
  `Date` DATE,
  Time STRING,
  Location STRING,
  Bytes INT,
  RequestIP STRING,
  Method STRING,
  Host STRING,
  Uri STRING,
  Status INT,
  Referrer STRING,
  os STRING,
  Browser STRING,
  BrowserVersion STRING
  ) 
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
  WITH SERDEPROPERTIES (
  "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$"
  ) LOCATION 's3://athena-examples-myregion/cloudfront/plaintext/';

Query Data

SELECT os, COUNT(*) count 
FROM cloudfront_logs 
WHERE date BETWEEN date '2014-07-05' AND date '2014-08-05' 
GROUP BY os

Download Results and Download Recent Queries

 

  1. To save the results of the query to a .csv file, choose Download results.

  2. To view or run previous queries, choose the Recent queries tab.

  3. To download the results of a previous query from the Recent queries tab, select the query, and then choose Download results. Queries are retained for 45 days.

  4. To download one or more recent SQL query strings to a CSV file, choose Download CSV.

Thanks

For

Watching

Amazon Athena - Hands On Demo

By Deepak Dubey

Amazon Athena - Hands On Demo

Amazon Athena - Hands On Demo

  • 50