Use AWS Glue to make CloudTrail Parquet partitions

Alex Smolen
Jul 7, 2020

You can turn on CloudTrail logging with a single command, but how do you use the data for audits and automation? In this post, I’ll describe cloudtrail-parquet-glue, which uses AWS Glue to make CloudTrail logs efficiently searchable in Athena with minimal custom code (because “the best code is no code”).

First, why use Athena for CloudTrail logs?

  • Athena uses familiar, expressive SQL and lets you search large volumes of data efficiently. I find other CloudTrail data stores awkward to use (e.g. the CloudWatch Logs Insights query language).
  • CloudTrail Athena queries can be automated and chained with other serverless technologies like Lambda to create security automation tailored to your AWS usage. You can build alerts that produce fewer false positives and false negatives, are enriched with additional data, and are presented to end users in a way that makes sense to your organization.
  • Athena tables are integrated with other AWS data tools like Amazon QuickSight.
  • Athena can cross-reference CloudTrail logs with other forms of logs (e.g. IPs in access logs), making complicated data analysis scenarios easier.
  • Having CloudTrail logs in the AWS data platform lets you tap into the benefits of a data lake architecture, as opposed to having your logs isolated in a third-party service.
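
To make the Lambda automation idea above concrete, here is a minimal sketch of a handler that submits an Athena query over CloudTrail data. It is an illustration, not code from cloudtrail-parquet-glue: the table name cloudtrail_logs, the default database, the results bucket, and all function names are assumptions, and the column names follow the table layout in the AWS documentation for querying CloudTrail logs with Athena.

```python
import re
from typing import Any, Dict

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def build_console_login_query(table: str, since: str) -> str:
    """Build SQL listing console logins at or after an ISO 8601 UTC timestamp."""
    # Validate the inputs because they are interpolated into the SQL text.
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not _TIMESTAMP.fullmatch(since):
        raise ValueError(f"unexpected timestamp: {since!r}")
    return (
        "SELECT eventtime, useridentity.arn, sourceipaddress "
        f"FROM {table} "
        "WHERE eventname = 'ConsoleLogin' "
        f"AND eventtime >= '{since}'"
    )


def build_athena_request(query: str, database: str, output_location: str) -> Dict[str, Any]:
    """Assemble keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def handler(event, context):
    """Hypothetical Lambda entry point that starts the query."""
    import boto3  # imported lazily so the helpers above need no AWS SDK

    request = build_athena_request(
        build_console_login_query("cloudtrail_logs", "2020-07-01T00:00:00Z"),
        database="default",  # assumed database name
        output_location="s3://example-athena-results/",  # assumed bucket
    )
    response = boto3.client("athena").start_query_execution(**request)
    return response["QueryExecutionId"]
```

Athena runs queries asynchronously, so a real automation would poll for completion (or react to a state-change event) before reading results and raising an alert.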

If you want to use Athena to query CloudTrail, there are several options. This post has two parts: a survey of existing “CloudTrail logs queryable in Athena” projects, and a description of my novel cloudtrail-parquet-glue project, which builds a Glue workflow to connect Athena to CloudTrail.

Existing approaches for querying CloudTrail logs with Athena

Several projects have tried to make Athena and CloudTrail work better together. In 2018, I demonstrated how Athena could query CloudTrail logs in S3 with Lambda-created partitions. In August 2019, GorillaStack published “Query your CloudTrail like a pro with Athena”, which describes the athena-cloudtrail-partitioner project. In October 2019, Duo published “Introducing CloudTrail-Partitioner”, which describes the cloudtrail-partitioner project. The projects have a few differences (AWS CDK rather than CloudFormation, different Lambda error handling) but are otherwise similar.
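
For readers unfamiliar with the Lambda-created-partitions pattern these projects share, the core step is generating an ALTER TABLE statement for each new day of logs. The sketch below is my own illustration and is not taken from any of the projects above; the partition column names (region, year, month, day), the table name, and the function name are assumptions. The S3 prefix follows CloudTrail's standard AWSLogs/<account-id>/CloudTrail/<region>/YYYY/MM/DD/ layout.

```python
from datetime import date


def add_partition_statement(
    table: str, bucket: str, account_id: str, region: str, day: date
) -> str:
    """Build Athena DDL that registers one day of CloudTrail logs as a partition."""
    # CloudTrail delivers log files under this fixed, date-based prefix.
    location = (
        f"s3://{bucket}/AWSLogs/{account_id}/CloudTrail/"
        f"{region}/{day:%Y/%m/%d}/"
    )
    # IF NOT EXISTS keeps the statement safe to re-run for the same day.
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (region='{region}', year='{day:%Y}', "
        f"month='{day:%m}', day='{day:%d}') "
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    print(
        add_partition_statement(
            "cloudtrail_logs",  # assumed table name
            "example-bucket",  # assumed bucket
            "123456789012",
            "us-east-1",
            date(2020, 7, 7),
        )
    )
```

A scheduled Lambda would run a statement like this through Athena once per day for each account and region, so that queries filtered on the partition columns scan only the matching S3 prefixes.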