During query execution, Athena uses this information to project the partition values instead of retrieving them from the AWS Glue Data Catalog or an external Hive metastore. I then use an AWS Glue crawler to create partitions that make the data easier to query from Athena. Athena is one of the best AWS services for building data lake solutions and running analytics on flat files stored in S3. Anything you can do to reduce the amount of data that’s being scanned will help reduce your Amazon Athena query costs. Partition projection makes Athena queries faster because there is no need to query the metadata catalog. You can get faster results at a lower cost by restricting the volume of data scanned by a query using filters based on the partition.

I tried to use partition projection here. It's always better to have one additional day's partition in place, so we don’t need to wait for the Lambda function to trigger on that particular date. Queries that constrain on the partitioning column(s) run substantially faster because the system scans only the matching partitions. AWS Athena supports Apache Hive partitioning. Partition projection in AWS Athena is a recently added feature that speeds up queries by defining the available partitions as part of the table configuration instead of retrieving the metadata from the Glue Data Catalog. NOTE: I have created this script to add a partition for the current date + 1 (that is, tomorrow’s date).

I'm using AWS Athena to query an S3 bucket that holds data partitioned by day only; the partitions look like day=yyyy/mm/dd. In our previous article, Getting Started with Amazon Athena, JSON Edition, we stored JSON data in Amazon S3, then used Athena to query that data. You can partition your data by a key, for example, and you can also partition based on time, which leads to a multi-level partitioning scheme.
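To make the "current date + 1" idea concrete, here is a minimal sketch of how such a script might build the DDL for tomorrow's partition. It is not the original script. The function name `tomorrow_partition_sql`, the table name, and the S3 prefix are hypothetical, and the layout assumes the `day=yyyy/mm/dd` folders described above. It only builds the statement; submitting it to Athena (for example from a Lambda function) is left out.

```python
from datetime import date, timedelta


def tomorrow_partition_sql(table, s3_prefix, today):
    """Build the DDL that registers tomorrow's partition ahead of time.

    Assumption: data lives under <s3_prefix>/day=yyyy/mm/dd/ and the table
    has a single string partition column named `day`.
    """
    target = today + timedelta(days=1)          # current date + 1
    day_value = target.strftime("%Y/%m/%d")     # e.g. 2021/01/01
    location = f"{s3_prefix.rstrip('/')}/day={day_value}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (day = '{day_value}') LOCATION '{location}'"
    )
```

Calling `tomorrow_partition_sql("my_table", "s3://my-bucket/data", date.today())` returns a single `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statement. Because of `IF NOT EXISTS`, running it more than once for the same day should be harmless.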
You can get significant cost savings and performance gains by compressing, partitioning, or converting your data to a columnar format, because each of those operations reduces the amount of data that Athena needs to scan to execute a query. Partitions act like virtual columns that help the system scan less data per query. The main function creates the Athena partition daily. Behind the scenes, Athena actually runs on Presto clusters. Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using ... and alter tables and partitions.

To add a partition to the catalog, choose New Query and execute the following statement: MSCK REPAIR TABLE partitiondatetable. The partitions are now registered in the Athena catalog. General use cases: queries that take a significant amount of time to run against highly partitioned tables. Here I'm going to explain how to automatically create AWS Athena partitions for CloudTrail between two dates. Don't worry too much about the 128 MB file size rule of thumb. It wouldn't be very different from partitions in a table, but could be faster depending on how Athena determines which partitions to query.

With Amazon Athena, you only pay for the queries that you run. When I tried to use Glue to update the partitions every day, it created a new table for each day (since 2017, around 1,500 tables). Now you can query the Amazon S3 data directly to get the results. I have a pipeline that loads daily records into S3. Partitions created by the above query need to be added to the catalog so that we can query them later. You are charged based on the amount of data scanned by each query. In this article, we will partition the data and compare the results.
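As a rough illustration of creating CloudTrail partitions between two dates, the sketch below generates one `ALTER TABLE ... ADD PARTITION` statement per day. The function name `cloudtrail_partition_statements` and the partition columns `region`, `year`, `month`, and `day` are assumptions about how the table was defined; the S3 path follows the standard CloudTrail layout `AWSLogs/<account-id>/CloudTrail/<region>/yyyy/mm/dd/`. The bucket, account ID, and table name are placeholders.

```python
from datetime import date, timedelta


def cloudtrail_partition_statements(table, bucket, account_id, region, start, end):
    """Return one ALTER TABLE statement per day from start to end, inclusive.

    Assumption: the table is partitioned by (region, year, month, day),
    all stored as strings.
    """
    if end < start:
        raise ValueError("end must not be before start")
    statements = []
    current = start
    while current <= end:
        y = current.strftime("%Y")
        m = current.strftime("%m")
        d = current.strftime("%d")
        location = (
            f"s3://{bucket}/AWSLogs/{account_id}/CloudTrail/{region}/{y}/{m}/{d}/"
        )
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (region = '{region}', year = '{y}', month = '{m}', day = '{d}') "
            f"LOCATION '{location}'"
        )
        current += timedelta(days=1)  # move on to the next day
    return statements
```

Each returned string can then be submitted to Athena as a separate query. Unlike MSCK REPAIR TABLE, this approach does not depend on Hive-style `key=value` folder names, which the CloudTrail layout does not use.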