Partitioning CloudTrail Logs in Athena
CloudTrail logs provide information about AWS API calls and are useful in a variety of scenarios:
- Determining least privilege IAM policies
- Investigating security incidents
- Summarizing access for compliance
- Plain ole’ debugging
While the information they contain is undoubtedly useful, interacting with CloudTrail logs can be difficult.
CloudTrail logs are delivered to S3 as JSON by default, so you could download the files and parse them locally with jq for exploration, or write a script for more complex tasks. While handy in a pinch, downloading large log files takes time and bandwidth. There’s no set way to distribute the analysis results, and it’s painful to write out the commands to get at what you’re looking for in each use case.
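To make the "write a script" option concrete, here is a minimal Python sketch that tallies API calls in a single log file that has already been downloaded. It assumes the usual CloudTrail layout, a top-level `Records` array whose entries carry `eventSource` and `eventName`; the function names `load_records` and `count_api_calls` are made up for illustration and are not part of any AWS tooling.

```python
import gzip
import json
import sys
from collections import Counter


def load_records(path):
    """Return the list of event records from one downloaded CloudTrail log file."""
    # Log files usually arrive gzip-compressed; fall back to plain JSON otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f).get("Records", [])


def count_api_calls(records):
    """Count how often each (eventSource, eventName) pair appears."""
    return Counter((r.get("eventSource"), r.get("eventName")) for r in records)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_calls.py <downloaded-log-file>.json.gz
    counts = count_api_calls(load_records(sys.argv[1]))
    for (source, name), n in counts.most_common(10):
        print(f"{n:6d}  {source}  {name}")
```

Even this small script shows the friction: it handles one file at a time, so covering a whole account or a long time range means downloading and looping over every object first.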
Alternatively, if you have the CloudTrail logs forwarded to CloudWatch Logs, you could search via the CloudWatch Logs interface. I find the CloudWatch Logs query syntax limited, and the results, again, aren’t easy to forward on for other processing.
You could put the CloudTrail logs in CloudSearch, but this requires creating a new, non-serverless AWS resource, with the associated management overhead and costs. You could forward CloudTrail logs to another search service, like Splunk, but what if you don’t have that infrastructure at your fingertips?