Partitioning CloudTrail Logs in Athena

Alex Smolen
4 min readJan 15, 2018

CloudTrail logs provide information about AWS API calls and are useful in a variety of scenarios:

While the information they contain is undoubtedly useful, interacting with CloudTrail logs can be difficult.

CloudTrail logs are delivered to S3 as JSON by default, so you could download the files and parse them locally with jq for exploration, or write a script for more complex tasks. While handy in a pinch, it takes time and bandwidth to download large log files. There’s no set way to distribute the analysis results, and it’s painful to write out the commands to get after what you’re looking for in a particular use case.

Alternatively, if you have the CloudTrail logs forward to CloudWatch logs, you could search via the CloudWatch Logs interface. I find the CloudWatch logs query syntax to be limited, and the results, again, aren’t easy to forward on for other processing.

You could put the CloudTrail logs in CloudSearch, but this requires creating a new AWS not-serverless resource, with the associated management overhead and costs. You could forward CloudTrail logs to other search services, like Splunk, but what if you don’t have that infrastructure at your fingertips?

Instead, you could use Athena. Athena lets you query data in S3 easily, without managing any server-like resources, using Presto under the covers.

One drawback of Athena is that you’re charged by the amount of data searched. By partitioning data, you can easily limit the scope of a query and reduce the cost of querying CloudTrail logs over time.

I used the following approach to generate Athena partitions for a CloudTrail logs S3 bucket. It assumes you have already set up CloudTrail logs in your account.

AWS Athena

In Athena, you need to create tables to query based on S3 locations. You can create a table in Athena pointing to S3 CloudTrail logs with the following query: cloudtrail_create_athena_table.gist. This is based off AWS Documentation, but note that this table includes partitions:

PARTITIONED BY (region string, year string, month string, day string)