Amazon Athena Tutorial- The New Era Of Data Analysis

Data analysis is a very complex process and there has always been attempts to ease it. There are many tools for analytics, and even the popular tech giant Amazon provides an AWS service named Amazon Athena. This Amazon Athena tutorial will guide you through the basics and advance usage of Amazon Athena.

Amazon Athena is an interactive data analysis tool used to process complex queries in relatively less time. It is server-less hence, there is no hassle of setting up and doesn't require managing the infrastructure. It is not a Database service hence, you just pay for the queries you run. You just point your data in S3, define the schema required and with a standard SQL you are good to go.

The topics covered in this article are as follows:

Introduction To Amazon Athena

On November 20, 2016, Amazon launched Athena as one of its services. As described earlier, Amazon Athena is a serverless query service that makes analysis of data, using standard SQL, stored in Amazon S3 simpler. With few clicks in the AWS Management Console, customers can point Amazon Athena at their data stored in Amazon S3 and run queries using standard SQL to get results in seconds.

With Amazon Athena, there is no infrastructure to set up or manage, and the customer pays only for the queries they run. Amazon Athena scales automatically, executing queries in parallel, which gives fast results, even with a large dataset and complex queries. Now, that you what is Amazon Athena let me take you through the difference it has compared to SQL Server.

Difference Between Microsoft SQL Server And Amazon Athena

If you are a Data Analyst and have an experience of analyzing data stored on S3, you will relate to this,

Data Analysts/Developers: Do you offer Storage?AWS: Yes.

Data Analysts/Developers: Do you have tools for Analytics?
AWS: Not sure."

Amazon worked on this and came up with Amazon Athena. Now, you have a tool to play with your data. Athena helps you analyze unstructured, semi-structured and structured data that is stored in Amazon S3. Using Athena you can create dynamic queries for your dataset. Athena also works with AWS Glue to give you a better way to store the metadata in S3.

Using AWS CloudFormation and Athena, you can use named queries. Named query allows you to name your query and then call it using the name.

This interactive service from AWS can be used by Data Scientists, developers to take a sneak peak into the table instead of running the complete query. It is also used to fetch data from S3, load it to different data stores using Athena JDBC driver, for log storage/analysis and Data Warehousing events.

Now that you know Athena is an interesting tool, let's find out in this Amazon Athena tutorial how to get your hands on this amazing service from Amazon.

Accessing Athena is very easy and it can be done by either:

These are few of the ways to access Amazon Athena. By now, you pretty much know everything important about Amazon Athena. Let's me walk you through the different features of Athena.

Out of the many services provided by Amazon, Athena is one of the services. It has many features that makes it suitable for Data Analysis. Let's take a look at the different features one by one.

  1. Easy Implementation: Athena doesn't require installation. It can be accessed directly from the AWS Console also directly by AWS CLI.
  2. Serverless: It is serverless, so the end-user doesn't need to worry about infrastructure, configuration, scaling or failure. Athena takes care of everything on its own.
  3. Pay per query: Athena charges you only for the query you run, i.e. the amount of data that is managed per query. You can save a lot if you can compress them and format your dataset accordingly.
  4. Fast: Athena is a very fast analytics tool. It can perform complex queries in less time by breaking the complex queries into simpler ones and run them parallelly, then combine the results to give the desired output.
  5. Secure: With the help of IAM policies and AWS Identity, Athena gives you complete control over the data set. As the data is stored in S3 buckets, IAM policies can help you manage control to users.
  6. Highly available: With the assurance of AWS, Athena is highly available and the user can execute queries round the clock. As AWS is 99.999% available, so is Athena.
  7. Integration: The best feature of Athena is that it can be integrated with AWS Glue. AWS Glue will help the user to create a better-unified data repository. This helps you create better versioning of data, better tables, views, etc.

Great isn't it? Athena provides many features at the same time, it is cost-efficient.

By now you must be impressed by AWS Athena. Now that you know quite a lot about Athena. Let's roll our sleeves and understand the working of Athena by performing a small demo. In this Amazon Athena tutorial, we will work on two Demos, let's find out what are they.

Demo - I (Creating Tables In Athena)

As you know all about Amazon Athena, let's take a dive on how to query your data stored as .json file in Amazon S3 using Athena.

  1. Create multiple JSON files containing entries
  2. Store the files to S3 bucket
  3. Create an external table for the files stored in S3
  4. Write a Query for accessing the data

Let's understand how to do the above-said tasks one by one.

  1. Create JSON Files. (Create the data without using newline character)
    JSON File- Amazon Athena Tutorial

  2. We will access S3 bucket using AWS CLI
    1. Configure IAM User
    2. Create S3 Bucket
    3. Copy files to S3 Bucket
  3. Create External Table in Athena. There are two ways of doing this:
    1. Using AWS Glue Crawler
    2. Manually
  4. We will create it manually:
    1. Create table.
      Amazon Athena Console
    2. Create a new database if you don't have one. Give a table name. Give the location of your file.
      Amazon Athena Console
    3. Select the type of file you will be working with. S elect the architecture of the data in your file.
      Amazon Athena Console
    4. As the entered data is not that complex, we don't need a partition. Click on "Create Table".
      Amazon Athena Console - Amazon Athena Tutorial
    5. Athena will auto-generate the Query for creating External Table and run it.
      Amazon Athena Console
      You have your external table ready.
  5. We write a query to select all data from the table.
    1. select * from testdb;
    2. Click on Run Query and you have all the information in your table. Amazon Athena Tutorial

Demo - II (Comparison Between Amazon Athena And MySQL)

In this Amazon Athena tutorial, now we will compare MySQL and Athena and understand as to how even simple queries take less time to execute in Athena.

  1. Loading CSV file to MySQL took around 1 hour but in Athena, it took just 3 mins to upload the CSV file to S3 and 0.42 seconds to create a table for the same.
  2. Select query. select * from table.Select query in Athena. Select query in MySQL.
  3. Selecting a specific column from the table.Select a specific column in Athena
    Select a specific column in MySQL.
  4. Getting the count of a specific column.Count of a specific column in Athena. Count of a specific column in MySQL.
  5. Counting number of records in the table.Count all records in Athena.C
    ount all records in MySQL.
  6. Select query with a specified range.Select query within the said range in Athena.S
    elect query within the said range in MySQL.

That was a brief comparison on basic SQL commands between MySql and Amazon Athena.

I hope this blog was informative and helped in gaining an idea about AWS Athena. Choose a career in AWS and get certified inwhich will boost your professional career. Take a look at the different use case which are revolutionizing business. All The Best!

If you wish to learn more about AWS and the amazing services like Athena, then check out our AWS Architect Course which comes with instructor-led live training and real-life project experience. This training will help you understand the different services offered by AWS in depth and help you achieve mastery over the subject.

Got a question for us? Please mention it in the comments section of "Amazon Athena" and we will get back to you.

Priyaj Kumar is an AWS aspirant working as a Research Analyst at Edureka. He is working on different AWS services and has good knowledge in Python and Java. He is from Dumka, Jharkhand. He loves playing Badminton. You can reach him at www.linkedin.com/in/priyajkumar/


Comments

Close