This article deals with the problem of how to detect that a request to your website was made by Googlebot. Based on that, you can, for example, display different content to the Google crawler than to a human visitor. This practice is generally called cloaking, and it is usually penalized by search engines. But there are exceptions where cloaking is allowed and convenient; one good example is allowing Googlebot to index content that sits behind a paywall.
My solution is written for the Django framework but can be adapted to any web application written in Python. I use several PIP packages, including the user-agents library for User-Agent classification.
You can install the packages using a command like this:
Google recommends verifying its crawlers by performing DNS lookups: a reverse DNS lookup on the request's IP address must return a hostname within the googlebot.com or google.com domain, and a forward DNS lookup on that hostname must resolve back to the original IP address. My solution follows this recommendation. However, I don't want to perform DNS lookups for every request coming to my website, which is why I check the agent type first. Here comes the solution:
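A minimal sketch of this approach, assuming a Django request object; the `get_client_ip` helper and the exact structure are a reconstruction, using the user-agents package for the cheap first check and the stdlib socket module for the DNS lookups:

```python
import socket

GOOGLE_BOT_DOMAINS = ('.googlebot.com', '.google.com')


def get_client_ip(request):
    """Hypothetical helper: extract the client IP from a Django request."""
    forwarded = request.META.get('HTTP_X_FORWARDED_FOR')
    if forwarded:
        # Behind a proxy, the left-most entry is the original client.
        return forwarded.split(',')[0].strip()
    return request.META.get('REMOTE_ADDR')


def has_google_hostname(ip):
    """Reverse DNS lookup followed by a confirming forward lookup."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return False
    if not hostname.endswith(GOOGLE_BOT_DOMAINS):
        return False
    try:
        # Forward-confirm: the hostname must resolve back to the same IP.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False


def is_google_bot(request):
    # Cheap check first, so ordinary browser requests never trigger DNS lookups.
    from user_agents import parse  # third-party: pip install user-agents
    user_agent = parse(request.META.get('HTTP_USER_AGENT', ''))
    if not user_agent.is_bot:
        return False
    return has_google_hostname(get_client_ip(request))
```

Note that the DNS results could be cached per IP address to avoid repeating the lookups for consecutive crawler requests.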
You may decide to solve this without the external libraries I've used, but check the code of those libraries first: the problems they deal with are not as straightforward as they might seem at first.
Be aware that the is_google_bot function will return False for requests coming from tools like Google's structured data testing tool, for two reasons:
- the user-agents library does not detect them as bots
- the detected IP address belongs to the user of the tool, not to the tool itself
Therefore, testing this function can be tricky. Here is the unit test skeleton I used; I don't recommend keeping such a test in your codebase, though, because it hardcodes a Google IP address and performs real DNS queries.