Vulnerability Inbox Zero
This is a summary of my LocoMocoSec 2022 and QCon SF 2022 conference talks — thanks to co-author Jake Mertz and the LaunchDarkly Security team!
The LaunchDarkly Security team has a mission to help our customers chill out. We show our work when we solve security problems so they can trust our service. Seems easy enough, but we had a security problem that was decidedly un-chill.
If you’ve run vulnerability scanners, you know the story. There are a lot of vulnerability reports that you have to triage. As if that wasn’t bad enough, you have so many scanners — network, OS, web app, DB, code, containers, cloud configurations, and so on. Overwhelmed by the flood and unable to separate signal from noise, security teams tune out.
Our team also has to meet the strict vulnerability management requirements for our FedRAMP environment. NIST 800-53 RA-5 describes implementing vulnerability scanning at each architectural layer. Each scan result must be mitigated or marked as accepted within a defined set of timelines.
How do we deal with the huge volume of scanner results while keeping our chill level to the max? We found inspiration with Inbox Zero.
Inbox Zero is about processing every email you receive until the inbox is empty, with zero messages. Why would you do this?
- You can be less reactive and focus on priorities
- Things don’t slip through the cracks
The most important part of Inbox Zero is automatically deciding whether a given message can be deleted. Delete, delete, delete. Think in shovels not in teaspoons.
So what does Inbox Zero mean for vulnerability management? Merlin Mann’s original Inbox Zero blog posts quote Bruce Schneier — “security is a process, not a product”. He later states that:
Like digital security and sustainable human love — smart email filtering is a process
No guarantees of sustainable human love, but if you want to be as chill as our team, you might want to consider how you can Inbox Zero your vulnerability scans.
It started in 2020. We needed to scan all of our resources and respond to the results within a defined timeframe. But trying to respond to every scan result is the recipe for a bad time. There’s no central inbox and there’s no way to determine if it’s zero. Just a bunch of Slack messages, emails, CSV files, Excel spreadsheets, blood, sweat and tears.
How do you build a single source of truth, filled with only the items that merit our time and attention? We needed to get rid of the inevitable:
- Out-of-date code dependencies where vulnerable functions weren’t being called
- Unused resources (e.g. previous container images) that needed to be cleaned up
- CVEs that we were alerted about but hadn’t fixed yet
These are equivalent to the spam and the “fwd:” emails from your in-laws in your inbox. We wanted to spend our time, attention, and energy with the few findings that actually mattered. The smarter our processing, the happier our responders.
Our team got together and brainstormed what an Inbox Zero vulnerability processing pipeline should look like. First, we knew that we wanted to automatically suppress (delete, delete, delete) known noisy items. Next, we needed someone to come in and triage the result to see if it’s a false positive. If so, don’t ignore it! Write a suppression so that the next time we won’t see it. Is it critical? If it’s not, file a ticket and we’ll work on it with our regular work stream of vulnerability scan results. For critical issues (think log4j), ring the alarm and declare an incident.
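The triage flow we brainstormed can be sketched as a small decision function. This is a simplified illustration with names of our own invention, not actual LaunchDarkly code:

```python
# Hypothetical sketch of the triage decision flow: suppress false
# positives, escalate criticals, ticket everything else.
from dataclasses import dataclass

@dataclass
class Finding:
    id: str
    severity: str          # e.g. "LOW", "MEDIUM", "HIGH", "CRITICAL"
    is_false_positive: bool

def triage(finding: Finding) -> str:
    """Return the action a responder should take for a new finding."""
    if finding.is_false_positive:
        return "write-suppression"   # delete, delete, delete
    if finding.severity == "CRITICAL":
        return "declare-incident"    # think log4j
    return "file-ticket"             # regular work stream
```

The ordering matters: a suppression check comes first so known noise never reaches the criticality decision at all.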
We clarified our principles to turn these ideas into software:
- We wanted to build our own system, using AWS services. Existing products either didn’t fit our requirements or would require significant work to integrate into our infrastructure.
- We wanted filters to be code-based, so our rules could be expressive and we could better track, blame, and change-control our suppression logic
- We wanted to support our FedRAMP requirements, and we didn't want to spend a lot of time in Excel
The vulnerability processing pipeline we built can be viewed as three separate sections: scan, process, and respond.
First is the scanning section, with some of our many scanning tools raining down results to ruin our day and our moods.
Next is the processing section, where we standardize and suppress findings before sending them to our inbox. At the center of our processing is AWS Security Hub.
The final section is for responding. This is where our team members get an alert to triage, with all the information they need to make an informed decision and take an appropriate response.
Scanning architecture
We feed results from many scanners into the same pipeline. Inspector is an AWS service that scans our EC2 instances for CVEs and CIS benchmark checks. Trivy is an open source tool that scans some of our container images. Tenable tests our web applications, our APIs, and our databases. Dependabot looks for out-of-date dependencies in our code.
We built Forwarders — Lambda functions that take the findings from these external vulnerability scanners and import them into AWS Security Hub. They convert findings into ASFF (the AWS Security Finding Format), decorating them with some contextual information.
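A Forwarder might look roughly like the sketch below. The input shape and field choices are our assumptions for illustration, not LaunchDarkly's actual code; the `BatchImportFindings` call and ASFF fields are real Security Hub constructs:

```python
# Sketch of a Forwarder Lambda: convert an external scanner's results
# to ASFF and import them into Security Hub in batches.
import datetime

def to_asff(raw: dict, account_id: str, region: str) -> dict:
    """Map one hypothetical scanner result to an ASFF finding."""
    now = datetime.datetime.utcnow().isoformat() + "Z"
    return {
        "SchemaVersion": "2018-10-08",
        "Id": f"{raw['scanner']}/{raw['resource']}/{raw['cve']}",
        # Custom-integration product ARN format for self-owned findings
        "ProductArn": f"arn:aws:securityhub:{region}:{account_id}"
                      f":product/{account_id}/default",
        "GeneratorId": raw["scanner"],
        "AwsAccountId": account_id,
        "Types": ["Software and Configuration Checks/Vulnerabilities/CVE"],
        "CreatedAt": now,
        "UpdatedAt": now,
        "Severity": {"Label": raw["severity"]},
        "Title": raw["cve"],
        "Description": raw["description"],
        "Resources": [{"Type": "Other", "Id": raw["resource"]}],
    }

def handler(event, context):
    import boto3  # imported here so to_asff() stays testable without AWS
    hub = boto3.client("securityhub")
    findings = [to_asff(r, event["account_id"], event["region"])
                for r in event["results"]]
    # BatchImportFindings accepts at most 100 findings per call
    for i in range(0, len(findings), 100):
        hub.batch_import_findings(Findings=findings[i:i + 100])
```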
We chose Security Hub to help us forward and centralize Inspector results from different regions and different accounts. This only required us to deploy the Inspector agents on our EC2 hosts and let AWS handle the routing of those findings into Security Hub.
Processing architecture
The second section of our pipeline is where we process findings and prepare them for our response. The most important part is Suppressor, which takes the stream of new scanning results and cuts the noise out. Whenever a new finding is reported to Security Hub, Suppressor runs it through a set of rules that recategorize and suppress known false positives.
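The skeleton of such a handler might look like this. The rule-registration scheme and event shape are our assumptions (the EventBridge "findings imported" event does carry a `detail.findings` list, and `BatchUpdateFindings` with a `SUPPRESSED` workflow status is a real Security Hub API):

```python
# Sketch of a Suppressor Lambda: run each new finding through the
# rule set and mark matches as suppressed in Security Hub.
RULES = []  # populated from the rules repo; each rule is a callable

def rule(func):
    """Decorator registering a suppression rule."""
    RULES.append(func)
    return func

def evaluate(finding: dict):
    """Return the name of the first rule suppressing this finding, else None."""
    for r in RULES:
        if r(finding):
            return r.__name__
    return None

def handler(event, context):
    import boto3  # imported here so evaluate() stays testable without AWS
    for finding in event["detail"]["findings"]:
        matched = evaluate(finding)
        if matched:
            boto3.client("securityhub").batch_update_findings(
                FindingIdentifiers=[{"Id": finding["Id"],
                                     "ProductArn": finding["ProductArn"]}],
                Workflow={"Status": "SUPPRESSED"},
                Note={"Text": f"Suppressed by rule {matched}",
                      "UpdatedBy": "suppressor"},
            )
```

Recording the matched rule name in the finding's note makes it easy to trace any suppression back to a line of reviewed code.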
When Suppressor finishes running, it reports the results to S3, where they’re picked up by our SIEM for alerting.
Suppressor rules are written in Python, and live in a GitHub repository. We have a CI pipeline where we lint, test, and deploy our rules.
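For illustration, a rule might look roughly like this sketch. The function name and CVE list are hypothetical, standing in for the kind of will-not-fix advisories we review and sign off on:

```python
# Hypothetical suppression rule: the OS maintainer has marked these
# CVEs will-not-fix, and we determined they don't impact our systems.
UNPATCHABLE_NO_IMPACT = {
    "CVE-2016-2781",  # example: a long-standing will-not-fix coreutils CVE
}

def suppress_unpatched_no_impact(finding: dict) -> bool:
    """Suppress findings for reviewed CVEs with no vendor patch."""
    return finding.get("Title", "") in UNPATCHABLE_NO_IMPACT
```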
One such rule suppresses a CVE that doesn't have a patch available from the operating system maintainer, and which we determined did not impact our systems.
There are some other supporting Lambda functions that sit in this section. Requeue runs anytime we update our rule sets in GitHub via CI. It requeues all of our findings to Suppressor for re-evaluation. This ensures the findings in Security Hub are accurate. Terminator removes findings when resources are terminated or deleted.
Reporting architecture
A summary of unsuppressed findings is forwarded from Suppressor to our SIEM. It annotates the findings with additional info and helps us de-duplicate vulnerabilities across groups of hosts or similar resources. This ensures that whether a finding was discovered on one host or 200, we only get one alert.
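The de-duplication step amounts to grouping per-host findings by vulnerability. A minimal sketch of the idea (our SIEM's actual logic is more involved):

```python
# Sketch of host-level de-duplication: collapse per-host findings so
# one vulnerability yields one alert, however many hosts it touches.
from collections import defaultdict

def dedupe(findings: list[dict]) -> list[dict]:
    """Group findings by CVE, returning one alert per vulnerability."""
    groups = defaultdict(list)
    for f in findings:
        groups[f["cve"]].append(f["resource"])
    return [{"cve": cve, "affected": hosts, "count": len(hosts)}
            for cve, hosts in groups.items()]
```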
Inbox Zero is not just about deleting messages. For responding, we have the rotating Security Commander role on our team, augmented by a Slack Triage Bot.
The Security Commander is responsible for being the human Inbox Zero responder. Their job is to quickly triage — but not fix — any new findings that come in. If an alert is a false positive, they ship a suppression PR and make sure that finding is removed from Security Hub. If the finding is legit, they determine its criticality. If it's not critical, they file a ticket. If it is, they create an incident.
Security Commanders do spend time writing suppressions or filing tickets, but can now mostly do regular sprint work. They also enable the rest of the team to stay focused.
Our Triage Bot scans the summary messages as they come into Slack from our SIEM. It keeps us honest by tracking metrics about the kinds of alerts we’re seeing and how quickly we’re responding to them.
Asset inventory
Vulnerability Inbox Zero feels great. But how do we know that we're actually scanning all of our important resources?
We have Asset Inventory Lambda functions that track our cloud infrastructure, code repositories, and scanning configurations. For EC2 instances — do they have Inspector running? For GitHub repositories — are Dependabot and static code analysis enabled? We also have an Endpoint Checker Lambda that scans all of our domains to determine whether or not they're publicly accessible.
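One such coverage check can be sketched by diffing running EC2 instances against the hosts reporting to SSM (which Inspector v2 relies on). The structure is an assumption; the paginated `DescribeInstances` and `DescribeInstanceInformation` calls are real boto3 APIs:

```python
# Sketch of an asset-inventory check: which running instances aren't
# reporting to the scanner's agent?
def find_unscanned_instances() -> list[str]:
    import boto3  # imported here so coverage_gap() stays testable without AWS
    ec2 = boto3.client("ec2")
    ssm = boto3.client("ssm")
    running = {
        i["InstanceId"]
        for page in ec2.get_paginator("describe_instances").paginate(
            Filters=[{"Name": "instance-state-name", "Values": ["running"]}])
        for r in page["Reservations"] for i in r["Instances"]
    }
    managed = {
        info["InstanceId"]
        for page in ssm.get_paginator("describe_instance_information").paginate()
        for info in page["InstanceInformationList"]
    }
    return coverage_gap(running, managed)

def coverage_gap(running: set, managed: set) -> list[str]:
    """Instances that exist but aren't covered by scanning."""
    return sorted(running - managed)
```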
FedRAMP vulnerability management
FedRAMP Continuous Monitoring (ConMon) is a set of recurring meetings with specific documents that need to contain accurate data about vulnerability scan results.
We built a few additional Lambdas to automatically generate our monthly continuous monitoring reports. The Asset Inventory Lambdas generate a list of cloud resources and their compliance with many of our security controls, things like disk encryption, running security agents, and so on.
We then query Security Hub to ensure that all open vulnerabilities map to Jira tickets and are associated with POA&Ms (Plans of Action and Milestones).
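That check can be sketched with the Security Hub `GetFindings` API. The filter shapes are real; the convention of stashing a Jira key in `UserDefinedFields` is our assumption for illustration:

```python
# Sketch of the ConMon check: pull open, active findings from
# Security Hub and flag any without a ticket reference.
def untracked_findings() -> list[str]:
    import boto3  # imported here so has_ticket() stays testable without AWS
    hub = boto3.client("securityhub")
    filters = {
        "WorkflowStatus": [{"Value": "NEW", "Comparison": "EQUALS"},
                           {"Value": "NOTIFIED", "Comparison": "EQUALS"}],
        "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
    }
    missing = []
    for page in hub.get_paginator("get_findings").paginate(Filters=filters):
        for f in page["Findings"]:
            if not has_ticket(f):
                missing.append(f["Id"])
    return missing

def has_ticket(finding: dict) -> bool:
    """True if a Jira key was recorded on the finding (assumed convention)."""
    return "jira_ticket" in finding.get("UserDefinedFields", {})
```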
We can then automatically generate the Excel spreadsheets that we need to provide to the Federal government that show we’re ready to handle Federal data.
Lessons learned
Security Hub has some rough spots:
- The ASFF format for Security Hub is pretty rigid, and we had to squeeze some of the external findings into the schema
- Everything needs a unique finding ID, even similar findings differing only by environment or account
- The rate limits are pretty low — we had to re-architect around them
Inspector is also a source of problems:
- v1 doesn’t work in all of the regions we needed
- v1 requires a separate agent (v2 uses the SSM agent)
- v2 doesn’t support CIS benchmarks
Another thing we learned is that our Ubuntu EC2 instances require restarts even with unattended updates. We recommend tracking reverse uptime and making sure that old hosts get rebooted frequently.
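Tracking reverse uptime can be as simple as flagging hosts that have been up too long. A sketch, using launch time as a stand-in for last reboot and an assumed 30-day threshold:

```python
# Sketch of a reverse-uptime check: flag hosts that haven't been
# replaced or rebooted within the allowed window.
import datetime

MAX_AGE = datetime.timedelta(days=30)  # assumed threshold

def stale_hosts(instances: list[dict], now=None) -> list[str]:
    """Return IDs of instances older than MAX_AGE."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return [i["InstanceId"] for i in instances
            if now - i["LaunchTime"] > MAX_AGE]
```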
Next steps
Looking forward, we’re excited to make some improvements to this pipeline.
- Upgrade to Inspector v2 once it supports CIS benchmarks
- Add more vulnerability scanners
- Expire suppressions on a regular cadence
- Automatically delegate findings to teams who can best fix them
- Create team scorecards where we could incentivize teams to update their own infrastructure
- Put vulnerability data in a data warehouse for analysis and visualization
Love them or hate them, vulnerability scanners aren’t going anywhere. We recommend you tame the avalanche of findings with a noise-suppressing processing pipeline. Think in shovels, not in teaspoons. Join us in the practice of Inbox Zeroing your way to vulnerability scan tranquility.