This post is related to How to block 404 attacks using fail2ban.
If you have a fail2ban rule and notice that Google bots are being jailed, there is a way to keep legitimate Google bots out of jail.
Google Tidbits
An obvious thought might be to whitelist Google’s IPs in fail2ban so that Google bots can safely crawl your site. The problem here is that Google doesn’t share its IP ranges and has stated that they can change at any time. However, Google does recommend verifying a bot by doing a reverse DNS lookup, as described in this article: https://developers.google.com/search/blog/2006/09/how-to-verify-googlebot.
Knowing this, we can leverage fail2ban’s ignorecommand option, found in the jail.local file.
This operation lets you point to a script to run some checks to determine if the provided IP should be jailed or ignored.
For the steps below, I’ll SSH into the server and make the updates from the command line.
Step 1: Add script
Navigate to local bin, where we’ll add the script:
cd /usr/local/bin
Here, we’ll create a new script file and make it executable.
touch ignore_ip_check.sh && chmod +x ./ignore_ip_check.sh
Edit the file and add the following contents:
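#!/bin/bash
# fail2ban passes the IP under review as the first argument.
IP="$1"
# Reverse DNS lookup, waiting at most 1 second for a reply.
HOSTRESULT="$(host -W 1 "${IP}")"
# Match only names that END in .googlebot.com. or .google.com.
REGEX='\.(googlebot|google)\.com\.($|[[:space:]])'
# Exit 0 tells fail2ban to ignore the IP; exit 1 lets the ban proceed.
if [[ "$HOSTRESULT" =~ $REGEX ]]; then exit 0; else exit 1; fi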
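The script above relies on the reverse lookup alone. Google's verification article also describes a second step: a forward lookup on the returned name that should resolve back to the original IP. Here is a rough sketch of that stricter variant, in case you want it. The function names is_google_hostname and verify_googlebot are my own placeholders, and it assumes the usual "domain name pointer" and "has address" wording in the output of host.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the name ends in .googlebot.com or .google.com.
is_google_hostname() {
  local name="${1%.}"                      # drop the trailing dot, if any
  local re='\.(googlebot|google)\.com$'
  [[ "$name" =~ $re ]]
}

# Hypothetical helper: forward-confirmed reverse DNS check for one IP.
# Assumes `host` prints "... domain name pointer <name>." for PTR records.
verify_googlebot() {
  local ip="$1" name
  # Step 1: reverse lookup, keep the first PTR name.
  name="$(host -W 1 "$ip" | awk '/domain name pointer/ {print $NF; exit}')"
  is_google_hostname "$name" || return 1
  # Step 2: forward lookup, the original IP must appear in the answer.
  host -W 1 "$name" | grep -qwF -- "$ip"
}

# Usage as an ignorecommand script would be:
#   verify_googlebot "$1"; exit $?
```

The second lookup adds a little latency per check, so whether it is worth it depends on how often your jail fires.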
To edit, I typically use vim. To use vim, just run vim ignore_ip_check.sh.
Once vim launches, tap i on your keyboard, paste in the above contents, then tap esc, followed by :wq! and then tap enter. This will save the new file.
And that’s how you’d use vim 80% of the time.
Step 2: Update jail.local
Now, in /etc/fail2ban, edit the jail.local file.
There is a section for ignorecommand =. This will need to be updated as follows:
ignorecommand = /usr/local/bin/ignore_ip_check.sh <ip>
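If you want to double-check the edit before restarting, a quick grep does the job. This is only a small sketch: show_ignorecommand is a made-up helper name, and the default path assumes your jail.local lives in /etc/fail2ban as described above.

```shell
#!/bin/bash
# Hypothetical helper: print active (uncommented) ignorecommand lines
# with their line numbers. Defaults to /etc/fail2ban/jail.local.
show_ignorecommand() {
  local conf="${1:-/etc/fail2ban/jail.local}"
  grep -nE '^[[:space:]]*ignorecommand[[:space:]]*=' "$conf"
}

# Usage:
#   show_ignorecommand
#   show_ignorecommand /path/to/another/jail.local
```

You should see exactly one active line pointing at the script. If your fail2ban version supports it, running fail2ban-client -t is another way to check that the configuration still parses.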
Step 3: Restart fail2ban and test
Lastly, in Cleavr, in server > services, select the option to restart fail2ban.
To test, the easiest route is probably Cleavr 2.0 (app.cleavr.io): go to the server > logs section and view the fail2ban logs. I’d also recommend doing this on a different device or via a VPN where you can change IPs, since you’ll be triggering a ban on the IP you’re using.
Now, open up a browser, go to a site on the server, and generate enough 404 errors to get jailed. Check the logs and make sure there are no fail2ban errors. If there is an error with the script, you’ll see it in the logs.
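If you'd rather check from the SSH session than from the Cleavr UI, you can filter the fail2ban log for the IP you tested with. A minimal sketch, assuming the log is at fail2ban's default location of /var/log/fail2ban.log (yours may differ). log_lines_for_ip is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print every log line that mentions the given IP.
# -w avoids partial matches (e.g. 203.0.113.7 will not match 203.0.113.70).
log_lines_for_ip() {
  local ip="$1" log="${2:-/var/log/fail2ban.log}"
  grep -wF -- "$ip" "$log"
}

# Usage (replace with the IP you tested from):
#   log_lines_for_ip 203.0.113.7
```

Reading the log usually requires root, so you may need to run this with sudo.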
To give credit where credit is due, I based this approach on this article https://deeb.me/20180320/how-not-to-ban-googlebot, but I made a few updates to iron out the kinks.