The battle against bots: How to detect and stop them
Bots are everywhere. How can you distinguish the bad from the good, and how can you stop them? Read our guide for practical steps on how to stop bots and protect your app.
In today’s digital environment, bots are everywhere, performing tasks that can be either beneficial or malicious. While some bots streamline processes by automating repetitive jobs—like searching, indexing, or customer service—others can disrupt services, steal data, and inflate metrics.
In this guide, we’ll walk through technical strategies and practical implementations to help you stop bots and protect your systems from unwanted activity.
What are bots?
A bot (short for robot) is an automated script designed to perform specific tasks over the internet. Bots typically operate without human intervention and carry out repetitive actions more efficiently than a human could. They vary widely in their functionality and use. Some examples are:
- Chatbots: These are bots designed to simulate conversation with human users. They are often used in customer service, tech support, and personal assistants like Siri or Alexa.
- Web Crawlers: These bots scan the web and index websites, which is what search engines like Google use to gather information and provide search results.
- Social Media Bots: Bots that automate tasks on platforms like Twitter, Instagram, or Facebook, such as liking, posting, following, or even creating content.
- E-commerce Bots: These bots are used in online shopping to purchase limited-edition items or monitor prices automatically.
The problems start when bots are used for malicious reasons, like spamming, spreading misinformation, or stealing information.
How to detect bots
Before you implement measures to stop bots, it helps to understand common bot behavior. Many bots interact with applications through ordinary HTTP requests and can mimic human browsing. Still, some telltale signs make them easier to spot:
- High request rates.
- Unusual navigation or browsing patterns.
- Access attempts to restricted URLs or areas not typically viewed by legitimate users.
- Ignoring robots.txt directives that guide well-behaved bots.
Monitor traffic patterns
Bots often exhibit patterns that differ from human users (e.g., high-frequency requests from the same IP, predictable access patterns, or lack of session cookies). Tracking these patterns allows you to apply additional rate limits or CAPTCHAs when suspicious behavior is detected.
Look out for these patterns:
- Unusually fast or slow interactions: Bots can fill out forms, click buttons, and navigate websites far faster than a human can. For instance, if a user submits a form in less than a second, it's likely a bot. Conversely, if interactions take too long or are erratic, it could be indicative of a bot.
- High volume of requests in a short period: Bots can send multiple requests in a short time frame (e.g., brute-force login attempts), which would be impossible for a human to do at that speed.
- Absence of natural mouse movement: Human users typically move the cursor along curved, slightly irregular paths, while bots may produce linear or rigid movements. Bots often jump straight to specific points without moving around the screen.
- Lack of mouse hover or scrolling: Bots rarely hover over elements or scroll through pages like humans. If a bot is interacting with a page, it might click on buttons or links directly without engaging with content in a natural way.
- Unusual IP locations: Bots might come from unexpected or suspicious geographic locations, like a large number of requests from an IP range that's associated with data centers or proxies (often used by bots to disguise their real location).
- Use of proxy servers: Bots often use proxies to mask their real IP addresses, leading to traffic from unusual or identical IP addresses in a short period.
- Inconsistent or missing user agent strings: Bots may send HTTP requests with non-standard or empty user-agent strings (or use a user-agent that mimics a browser). Many bots disguise their identities by pretending to be browsers or other legitimate software.
- Unusual browser or operating system combinations: Humans typically use a combination of popular browsers (e.g., Chrome, Firefox) and operating systems (e.g., Windows, macOS). Bots might use combinations that don't align with common user patterns (e.g., a Windows machine with an old mobile browser).
- Lack of expected headers: HTTP requests made by bots might lack common headers that a browser typically sends, such as `Accept-Language`, `Accept-Encoding`, and `Connection`.
- No referrer or suspicious referrer headers: Bots often don't send referrer information or may use false or empty referrer headers.
- Missing or abnormal cookies: Humans generally have cookies from past visits (like session cookies). Bots, however, may not handle cookies the same way or might not retain session information across requests.
- Non-human interaction patterns: Bots usually don't exhibit behaviors like moving between pages, interacting with a variety of elements on a page, or waiting for content to load. Instead, they may simply perform specific actions quickly and repetitively, such as filling out forms or submitting queries.
- Unusual click sequences: A bot might click on links or buttons in a manner that doesn't follow the usual progression of a user journey. For example, clicking on an "Add to Cart" button without first viewing the product details.
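To make a few of these signals concrete, here is a toy scoring sketch in Python. The weights, cutoff, and header checks are illustrative assumptions, not a production detector; real systems combine far more signals:

```python
# Toy heuristic: score a request using a few of the signals listed above.

def bot_score(headers: dict, requests_last_minute: int) -> int:
    score = 0
    if not headers.get("User-Agent"):
        score += 2  # missing or empty user-agent string
    # Browsers almost always send these; scripted clients often don't.
    for expected in ("Accept-Language", "Accept-Encoding", "Connection"):
        if expected not in headers:
            score += 1
    if "Referer" not in headers:
        score += 1  # no referrer information
    if requests_last_minute > 100:
        score += 3  # high volume of requests in a short period
    return score

def looks_like_a_bot(headers: dict, requests_last_minute: int) -> bool:
    # Above an arbitrary cutoff, flag the client for extra scrutiny
    # (e.g., a CAPTCHA) rather than blocking it outright.
    return bot_score(headers, requests_last_minute) >= 3
```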
Employ bot detection tools
Monitoring and analyzing traffic is an essential part of bot prevention and website security. There are several tools and services available that use strategies like machine learning and traffic pattern analysis to detect and block unwanted bots. Below is a list of popular tools that can assist in bot detection and traffic monitoring:
- Cloudflare Bot Management: Uses machine learning to identify known and unknown bots in real time, analyzes human vs. bot behavior patterns to identify automated traffic, and provides detailed metrics and insights, such as bot detection rates and impacted devices.
- Distil Networks (now part of Imperva): Uses machine learning and AI to identify and block malicious traffic. It helps protect websites from a variety of bots, including scrapers, account takeover bots, and credential-stuffing bots.
- PerimeterX Bot Defender: A bot mitigation platform that leverages device fingerprinting, behavioral analysis, and a global threat intelligence database to detect and mitigate bot traffic.
Other tools include SiteLock, DataDome, ThreatX, and BotGuard.
Use honeypots
Honeypots are fake or decoy elements on your website or app that attract bots. Because legitimate users shouldn’t normally find these elements, any activity on them likely comes from automated scripts.
A honeypot is typically a hidden form field or link designed to deceive bots into interacting with it while human users remain unaware of it. Bots tend to fill out all available fields in a form or click on any links they find, while real users ignore the honeypot because they never see it.
To implement a honeypot:
- Deploy honeypots: You add a hidden field to your form or page that humans cannot see (usually via CSS), but bots that automatically crawl and fill out forms will detect and interact with it.
- Monitor interactions: Track hits on these decoy endpoints to identify and learn from bot behavior. Bots, which do not render the page like humans, will attempt to interact with the honeypot field, while humans, who won’t see it, will not.
- Flag malicious activity: If the honeypot field is filled in, the request is flagged as suspicious or bot-driven, and it can be discarded or blocked.
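As a concrete illustration, here is a minimal honeypot sketch. It assumes Flask as the framework; the field name `website` and the route are arbitrary choices:

```python
from flask import Flask, abort, request

app = Flask(__name__)

# The form served to users includes a field hidden via CSS, e.g.:
#   <input type="text" name="website" style="display:none"
#          tabindex="-1" autocomplete="off">

@app.route("/contact", methods=["POST"])
def contact():
    # Real users never see the hidden field, so it should arrive empty.
    if request.form.get("website"):
        # A filled honeypot strongly suggests an automated submission:
        # log it, flag the client, or simply reject the request.
        abort(400)
    # ...process the legitimate submission here...
    return "Thanks for your message!"
```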
If you prefer not to implement honeypots manually, there are tools and libraries that can automate the process:
- Spam Protection Plugins: For CMS platforms like WordPress, plugins like Antispam Bee or WP Armour provide built-in honeypot mechanisms.
- Bot Detection Services: Services like Cloudflare, reCAPTCHA, or PerimeterX also offer advanced bot management features that incorporate honeypot-like tactics.
How to stop bots
Once you detect bots, you need to stop them. There are several proven strategies for keeping unwanted bots out of your environment, and each approach can be tailored to your specific infrastructure and risk profile.
Implement rate limiting
Rate limiting is an effective method of preventing bots from overwhelming your website or application. It controls the number of requests a user (or bot) can make within a specific time frame, helping mitigate excessive bot traffic.
By combining rate limiting with other measures, such as CAPTCHAs, bot detection algorithms, and behavior analysis, you can significantly reduce bots' impact on your website or application.
Here’s how you can use rate limiting to stop bots:
- Choose a rate-limiting strategy: There are several strategies for implementing rate limiting:
- Fixed Window: Allows a set number of requests in a specific time window (e.g., 100 requests per minute). After the limit is reached, further requests are denied until the window resets.
- Sliding Window: A more flexible approach that counts requests over a moving time window. For example, with a limit of 100 requests per 10 minutes, a user who made 50 requests in the past 10 minutes can make only 50 more until their older requests age out of the window.
- Token Bucket: Requests are granted if there’s a token available. Tokens are refilled at a certain rate (e.g., 1 token per second). If no tokens are available, the request is denied.
- Leaky Bucket: Requests accumulate in a "bucket." The bucket empties at a constant rate, and requests are allowed as long as the bucket isn’t full.
- Configure rate limits: Set thresholds based on IP address, users, or endpoints to accommodate legitimate traffic.
- Requests per IP Address: Limit the number of requests a single IP address can make in a given time period (e.g., 100 requests per minute).
- Requests per Account/User: For authenticated users, limit requests based on their account or user ID (useful for logged-in sessions).
- Requests per Endpoint: Some endpoints may be more vulnerable to bot attacks, so you can set stricter limits for high-risk or sensitive operations (e.g., login, password reset, or payment processing).
- Track and store requests: Use in-memory stores like Redis or Memcached to track the number of requests from each user, IP address, or session. These systems can efficiently store and expire counts for rate-limiting purposes. Redis is a good fit because it offers fast access to request counts and automatic key expiration when the time window ends (see the sketch after this list).
- Block or throttle requests: Once the rate limit is exceeded, you can either block further requests or throttle them (slow down the responses).
- Blocking: Return a status code such as `429 Too Many Requests` to inform the user or bot that they've exceeded the rate limit.
- Throttling: Introduce delays between responses to slow down bots and make it harder for them to continue operating.
- CAPTCHAs and Challenges: If you detect that a user is making an abnormally high number of requests, you can introduce a CAPTCHA challenge or other bot-detection mechanisms to ensure the request is coming from a human.
- Rate limit on specific actions: Apply stricter rate limiting for specific actions that are prone to bot abuse, such as:
- Login Attempts: Prevent brute-force attacks by limiting the number of login attempts from a single IP or account.
- Form Submissions: Limit submissions of forms (e.g., sign-up, contact forms) to prevent spam.
- Content Scraping: If your website contains valuable data (e.g., pricing, product listings), you can limit the frequency of data scraping by implementing rate limits on content pages.
- Use middleware: Implement rate limiting logic in your web application framework for centralized, consistent control.
- Use Web Application Firewalls (WAFs): A WAF can automatically apply rate-limiting rules to protect your website from bot traffic. Some WAFs include built-in bot protection features that can identify and block malicious bots in real-time.
- Logging and alerts: Log requests that exceed rate limits and monitor for abnormal activity. If a bot is consistently hitting your rate limits, you can analyze the patterns and take further action, such as blocking that IP permanently or adding it to a blacklist.
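To tie the steps above together, here is a minimal fixed-window rate limiter sketch backed by Redis; the limit, window length, and key format are illustrative assumptions:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

RATE_LIMIT = 100     # max requests allowed per window
WINDOW_SECONDS = 60  # fixed window length

def is_allowed(client_ip: str) -> bool:
    key = f"ratelimit:{client_ip}"
    # INCR atomically creates the counter at 1 on first use.
    count = r.incr(key)
    if count == 1:
        # First request in this window: start the expiry clock so the
        # counter resets automatically when the window ends.
        r.expire(key, WINDOW_SECONDS)
    return count <= RATE_LIMIT
```

In a request handler, you would return `429 Too Many Requests` (or throttle the response) whenever `is_allowed()` returns `False`. Note that fixed windows permit bursts at window boundaries; the sliding-window or token-bucket strategies above smooth this out at the cost of more bookkeeping.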
Use CAPTCHA challenges
CAPTCHAs (short for Completely Automated Public Turing test to tell Computers and Humans Apart) are designed to distinguish bots from human users by requiring tasks that are difficult for automated scripts to complete. Here's how you can use CAPTCHA to stop bots:
- Select a CAPTCHA provider: Google reCAPTCHA is the most widely used. Alternatives include hCaptcha, FunCaptcha, and Solve Media.
- Integrate CAPTCHA on forms or login pages: Insert CAPTCHA challenges into parts of your application where you suspect automated abuse. The most typical places to use CAPTCHA are:
- Sign-up or login forms: To prevent bots from automating the account creation or login process.
- Contact forms: To ensure messages are sent by humans.
- Comment sections: To avoid spam from bots posting automated comments.
- Checkout pages: To prevent bots from making fraudulent purchases.
- Verify CAPTCHA responses: When the form is submitted, verify the CAPTCHA response in your server-side code using your secret key to protect against spoofing. The user's response token is sent to the provider (e.g., Google) for verification, as in the sketch below.
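Here is a minimal server-side verification sketch for Google reCAPTCHA v2 in Python. The endpoint and field names come from Google's documented siteverify API; error handling is intentionally minimal:

```python
import requests

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def verify_recaptcha(token: str, secret_key: str, remote_ip: str = "") -> bool:
    # "token" is the g-recaptcha-response value posted with your form.
    resp = requests.post(
        VERIFY_URL,
        data={"secret": secret_key, "response": token, "remoteip": remote_ip},
        timeout=5,
    )
    result = resp.json()
    # "success" is false when the token is invalid, expired, or reused.
    return result.get("success", False)
```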
Implement IP blacklisting
When you identify addresses used by malicious bots, consider blacklisting them so they can’t reach your infrastructure. This can be done either on your server (e.g., via web server configurations) or by using firewalls and security services.
You can add IPs to your blacklist manually, block them dynamically, or use third-party IP blacklisting services.
Manual blacklisting
If you use Apache as your web server, you can block an IP using the `.htaccess` file.
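For example, with Apache 2.4+ you could add something like the following (the IP address is a placeholder):

```
# .htaccess: allow everyone except the blocked address
<RequireAll>
    Require all granted
    Require not ip 123.45.67.89
</RequireAll>
```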
If you're using Nginx, you can block an IP by modifying the Nginx configuration file (`nginx.conf` or a domain-specific configuration file).
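A minimal example, placed inside the relevant server or location block (again with a placeholder IP):

```
# Deny a single address; all other clients remain allowed
deny 123.45.67.89;
```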
If you're managing the server yourself, you can use firewall rules to block IP addresses.
- Using `iptables` (Linux):

  ```
  sudo iptables -A INPUT -s 123.45.67.89 -j DROP
  ```

- Using `ufw` (Ubuntu Firewall):

  ```
  sudo ufw deny from 123.45.67.89
  ```
Dynamic blacklisting
You can use logging or real-time monitoring tools to block IPs that meet certain criteria automatically:
- Fail2Ban is a popular tool that blocks offending IP addresses automatically by monitoring logs for signs of malicious activity. It's commonly used for brute-force attack prevention, but it can also be configured to block bots or other suspicious activity (see the snippet after this list).
- Web Application Firewalls (WAF) like Cloudflare, Sucuri, and Imperva provide IP blacklisting features that block malicious traffic, including bots, before it even reaches your server. They can analyze traffic patterns, monitor suspicious behavior, and block IPs automatically.
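As a small illustration of the Fail2Ban approach, a `jail.local` entry like the following enables the stock `sshd` jail; the thresholds shown are illustrative, not recommendations:

```
[sshd]
# Ban an IP for 1 hour after 5 failed attempts within 10 minutes.
enabled  = true
maxretry = 5
findtime = 600
bantime  = 3600
```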
IP blacklisting services
You can integrate third-party IP blacklisting services or bot management tools that maintain databases of known malicious IP addresses. These services automatically detect and block bots and malicious IPs in real time. Some examples are:
- Project Honey Pot: A free service that tracks malicious IPs and helps you block bots.
- Bot Detection Services: Providers like PerimeterX, Distil Networks, and DataDome offer advanced bot detection and IP blacklisting solutions.
Common pitfalls and solutions
- Don’t block legitimate users: Use adaptive rate limiting and carefully tuned CAPTCHA challenges so you don’t hinder genuine traffic.
- Don’t ignore mobile traffic: Many bots target mobile endpoints to exploit gaps in your detection strategy. Monitor all traffic sources.
- Keep your blacklist up-to-date: Inactive or stale blacklists lose effectiveness. Keep them current with the latest threat intelligence.
- Combine methods for best results: While single methods (like rate limiting or honeypots) can be effective, they should be used alongside other techniques. Stopping bots requires a multi-layered defense that combines rate limiting, CAPTCHA challenges, bot detection tools, traffic monitoring, IP blacklisting, and honeypots.
Stop bots with WorkOS Radar
With WorkOS Radar, you can detect, verify, and block harmful behavior in real time. Radar protects your app against AI bots, account abuse, credential theft, and more, using security insights from various sources, including data on user activity, network traffic, and other suspicious patterns.
Some of the features WorkOS Radar offers include:
- Bot detection: WorkOS Radar can determine whether an authentication attempt is coming from a bot and allow or deny that attempt, even if the credentials are correct.
- Anomaly detection: WorkOS Radar can alert you if traffic spikes or if certain IP addresses show abnormal activity, which could indicate bot-driven attacks like scraping or credential stuffing.
- Credential stuffing prevention: In a credential stuffing attack, bots make a large number of login attempts in a short time using stolen credentials. WorkOS Radar detects this suspicious login activity: when a single client or device repeatedly signs in to your app, Radar blocks these attempts for a short period of time.
- Impossible travel: By tracking device geolocation, WorkOS Radar can block or alert when successive authentication requests come from locations too far apart for the same user to have plausibly traveled between them.
- Device fingerprinting: Instead of focusing solely on IP addresses, Radar also identifies clients through device fingerprinting. This creates a persistent identifier that follows an attacker even when they change IPs.
- Progressive rate limiting: Rather than using fixed thresholds, Radar implements progressive rate limiting that becomes stricter as suspicious behavior continues. Initial authentication attempts proceed normally, but Radar issues challenges or complete blocks as failed attempts accumulate. These limits apply to the device fingerprint, not just the IP address. This means an attacker can't reset their limit by simply switching IPs.
- Custom rules: Developers can set custom rules to allow or deny authentication to specific devices, users, domains, or IP ranges. This enables many use cases, such as restricting sign-ins to a corporate IP range or allowing certain users to bypass false positive detections.