We have a web application that has been running on AWS for several years. As application load balancers and the AWS WAF service was not available, we utilised and external classic ELB point to a pool of EC2 instances running mod_security as our WAF solution. Mod_security was using the OWASP Mod_security core rule set.
Now that Application Load Balancers and AWS WAFs are available, we would like to remove the CPU bottleneck which stems from using EC2 instances with mod security as the current WAF.
Step 1 – Base-lining performance with EC2 WAF solution.
The baseline was completed using https://app.loadimpact.com where we ran 1000 concurrent users, with immediate rampup. On our test with 2 x m5.large EC2 instances as the WAF, the WAFs became CPU pinned within 2mins 30 seconds.
This test was repeated with the EC2 WAFs removed from the chain and we averaged 61ms across the loadimpact test with 1000 users. So – now we need to implement the AWS WAF solution so that can be compared.
Step 2 – Create an ‘equivalent’ rule-set and start using AWS WAF service.
We used terraform for this environment so the CloudFormation web ACL and rules are not being used and I will start be testing out the terraform code upload by traveloka. After having a look at the code in more detail I decided I need to get a better understanding of the terraform modules (and the AWS service) so I will write some terraform code from scratch.
So – getting started with the AWS WAF documentation we read, ‘define your conditions, combine your conditions into rules, and combine the rules into a web ACL.
- Conditions: Request strings, source IPs, Country/Geo location of request IP, Length of specified parts of the requests, SQL code (SQL injection), header values (i.e.: User-Agent). Conditions can be multiple values and regex.
- Rules: Combinations of conditions along with an ACTION (allow/block/count). There are Regular rules whereby conditions can be and/or chained. Rate-based rules where by the addition of a rate-based condition can be added.
- Web ACLs: Whereby the action for rules are defined. Multiple rules can have the same action, thus be grouped in the same ACL. The WAF uses Web ACLs to assess requests against rules in the order which the rules are added to the ACL, whichever/if any rules is matched first defines which action is taken.
Starting simple: To get started I will implement a rate limiting rule which limits 5 requests per minute to our login page from a specified IP along with the basic OWASP rules from terraform code upload by traveloka . Below is our main.tf with the aws_waf_owasp_top_10_rules created for this test.
- main.tf which references our newly created aws_waf_owasp_top_10_rules module
- The aws_waf_owasp_top_10_rules module
Step 3 – Validate functions of AWS WAF
To confirm blocking based on the rate limiting rule I am using Apache’s Benchmarking tool, ab.
ab -v 3 -n 2000 -c 100 https://<my_target.com.au>/login > ab_2000_100_waf_test.log
This command logs request headers (-v 3 for verbosity of output), makes 2000 requests (-n 2000) and conducts those request 100 concurrently (-c 100). I can then see failed requests by tailing the output:
tail -f ./ab_2000_100_waf_test.log | grep -i response
All looks good for the rate limiting based blocking, though it appears that blocking does not occur are exactly 2000 requests in the 5 minute period. It also appears that there is a significant (5-10min) delay on metrics coming through to the WAF stats in the AWS console.
The blocks are HTTP 403 responses from the ELB:
WARNING: Response code not 2xx (403) LOG: header received: HTTP/1.1 403 Forbidden Server: awselb/2.0 Date: Mon, 01 Jul 2019 22:39:11 GMT Content-Type: text/html Content-Length: 134 Connection: close
After success on the rate limiting rule, the OWASP Top 10 mitigation rules need to be tested. I will use Owasp Zap to generate some malicious traffic and see when happen!
So it works – which is good, but I am not really confident about the effectiveness of the OWASP rules (as implemented on the AWS WAF). For now, they will do… but some tuning will probably be desirable as all of the requests OWASP ZAP made contained (clearly) malicious content but only 7% (53 / 755) of the requests were blocked by the WAF service. It will be interesting to see if there are false positives (valid requests that are blocked) when I conduct step 4, performance testing.
Step 4 – Conduct performance test using AWS WAF service, and
Conducting a load test with https://app.loadimpact.com demonstrated that the AWS WAF service is highly unlikely to become a bottleneck (though this may differ for other applications and implementations).
Step 5 – Migrate PROD to the AWS WAF service.
Our environment is fully ‘terraformed’, implementing the AWS WAF service as part of our terraform code was working within an hour or so (which is good time for me!).
Next Steps
Security Automatons: https://aws.amazon.com/solutions/aws-waf-security-automations/, is this easy to do with Terraform? https://github.com/awslabs/aws-waf-security-automations has:
- waf-reactive-blacklist
- waf-bad-bot-blocking
- waf-block-bad-behaving
- waf-reputation-lists