Solving AWS WAF CAPTCHA for Web Scraping

import requests
import re
import time

# Your CapSolver API Key
CAPSOLVER_API_KEY = "YOUR_CAPSOLVER_API_KEY"
CAPSOLVER_CREATE_TASK_ENDPOINT = "https://api.capsolver.com/createTask"
CAPSOLVER_GET_TASK_RESULT_ENDPOINT = "https://api.capsolver.com/getTaskResult"

# The URL of the website protected by AWS WAF
WEBSITE_URL = "https://efw47fpad9.execute-api.us-east-1.amazonaws.com/latest" # Example URL

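def solve_aws_waf_captcha(website_url, capsolver_api_key):
    """Fetch the challenge page, extract the AWS WAF parameters, and request a token from CapSolver.

    Returns the aws-waf-token cookie value, or None on failure.
    """
    client = requests.Session()
    # The challenge page may not come back with a 200 status, so the body is parsed regardless.
    response = client.get(website_url, timeout=30)
    script_content = response.text

    # Challenge parameters embedded in the page, plus the URL of the first external script
    key_match = re.search(r'"key":"([^"]+)"', script_content)
    iv_match = re.search(r'"iv":"([^"]+)"', script_content)
    context_match = re.search(r'"context":"([^"]+)"', script_content)
    jschallenge_match = re.search(r'<script.*?src="(.*?)".*?></script>', script_content)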

    key = key_match.group(1) if key_match else None
    iv = iv_match.group(1) if iv_match else None
    context = context_match.group(1) if context_match else None
    jschallenge = jschallenge_match.group(1) if jschallenge_match else None

    if not all([key, iv, context, jschallenge]):
        print("Error: AWS WAF parameters not found in the page content.")
        return None

    task_payload = {
        "clientKey": capsolver_api_key,
        "task": {
            "type": "AntiAwsWafTaskProxyLess",
            "websiteURL": website_url,
            "awsKey": key,
            "awsIv": iv,
            "awsContext": context,
            "awsChallengeJS": jschallenge
        }
    }

    create_task_response = client.post(CAPSOLVER_CREATE_TASK_ENDPOINT, json=task_payload, timeout=30).json()
    task_id = create_task_response.get('taskId')

    if not task_id:
        print(f"Error creating CapSolver task: {create_task_response.get('errorId')}, {create_task_response.get('errorCode')}")
        return None

    print(f"CapSolver task created with ID: {task_id}")

    # Poll for the task result: up to 10 attempts, 5 seconds apart
    for _ in range(10):
        time.sleep(5)
        get_result_payload = {"clientKey": capsolver_api_key, "taskId": task_id}
        get_result_response = client.post(CAPSOLVER_GET_TASK_RESULT_ENDPOINT, json=get_result_payload, timeout=30).json()

        if get_result_response.get('status') == 'ready':
            aws_waf_token_cookie = get_result_response['solution']['cookie']
            print("CapSolver successfully solved the CAPTCHA.")
            return aws_waf_token_cookie
        elif get_result_response.get('status') == 'failed' or get_result_response.get('errorId'):
            # An explicit failed status or a non-zero errorId both end the polling loop
            print(f"CapSolver task failed: {get_result_response.get('errorId')}, {get_result_response.get('errorCode')}")
            return None

    print("CapSolver task timed out.")
    return None

# Example usage:
# aws_waf_token = solve_aws_waf_captcha(WEBSITE_URL, CAPSOLVER_API_KEY)
# if aws_waf_token:
#     print(f"Received AWS WAF Token: {aws_waf_token}")
#     # Use the token in your subsequent requests
#     final_response = requests.get(WEBSITE_URL, cookies={"aws-waf-token": aws_waf_token})
#     print(final_response.text)

Once you obtain the token, attach it to subsequent requests as the aws-waf-token cookie so that they are not challenged again while the token remains valid.
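As a minimal sketch of that step, the token can be stored on a requests.Session so that every later request made through the session carries it. The helper name session_with_waf_token is hypothetical, and scoping the cookie to the host of the target URL is an assumption about how the site expects it.

```python
from urllib.parse import urlparse

import requests


def session_with_waf_token(website_url, token):
    """Return a Session that sends the aws-waf-token cookie to the target host."""
    session = requests.Session()
    # Assumption: the cookie is scoped to the exact host of the protected URL.
    host = urlparse(website_url).hostname
    session.cookies.set("aws-waf-token", token, domain=host, path="/")
    return session
```

Requests made with the returned session (for example session.get(WEBSITE_URL)) then include the cookie without passing cookies= on each call.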

4. Use Cases

Integrating an automated AWS WAF CAPTCHA solver such as CapSolver helps keep data collection running reliably across a variety of development and analytics tasks.

Reliable Data Feeds for Machine Learning

Maintain consistent training datasets by automatically bypassing CAPTCHA challenges. Ensure temporal continuity and improve model accuracy without manual intervention.

Continuous Market Intelligence

Monitor competitor pricing, product availability, and promotions in real time. Prevent interruptions caused by AWS protections and maintain complete market visibility.

Consistent Business Intelligence Reporting

Keep ETL pipelines and dashboards updated with accurate data. Avoid gaps and broken metrics caused by CAPTCHA blocks.

Scalable SEO and Marketing Analytics

Collect keyword rankings, ad placements, and content metrics efficiently. Scale scraping operations without losing coverage due to AWS WAF protections.

Public Data and Research Collection

Preserve reproducible datasets for academic or policy research. Eliminate manual CAPTCHA resolution and maintain regular updates across large-scale data sources.

5. Complementary Techniques to Handle AWS WAF

Proxy Rotation and User-Agent Management

AWS WAF flags repetitive patterns from a single IP or user-agent. Rotating proxies and browser identifiers helps automated traffic resemble organic user behavior.

Simulating Human Behavior

Use browser automation tools (e.g., Selenium, Playwright), which can drive a headless browser, configured with:

Random mouse movements
Delays between clicks
Variable scrolling patterns

These small changes mimic human activity, reducing the likelihood of detection.

Cookie and Session Management

After passing a CAPTCHA, save and reuse cookies for persistent sessions. This prevents repeated CAPTCHA triggers on every new request.
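One way to sketch this is to write the cookies to a JSON file together with the time they were saved, and to ignore the file once it is older than a chosen limit. The function names and the max_age_seconds parameter are hypothetical, and the right limit depends on how long the target site keeps a token valid, which is not specified here.

```python
import json
import time
from pathlib import Path


def save_cookies(path, cookies):
    """Write a name -> value cookie mapping to disk with the time it was saved."""
    payload = {"saved_at": time.time(), "cookies": dict(cookies)}
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_cookies(path, max_age_seconds):
    """Return the saved cookies, or an empty dict if the file is missing or too old."""
    file = Path(path)
    if not file.exists():
        return {}
    payload = json.loads(file.read_text(encoding="utf-8"))
    if time.time() - payload["saved_at"] > max_age_seconds:
        # Treat the token as expired; the caller should solve a fresh challenge.
        return {}
    return payload["cookies"]
```

With requests, session.cookies.get_dict() produces a mapping suitable for save_cookies, and session.cookies.update(load_cookies(path, max_age_seconds)) restores it into a new session.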

Request Throttling

Throttle requests and introduce random delays. AWS WAF monitors activity rates, and consistent request intervals are a common red flag for bots.
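The idea can be sketched as a generator that pauses for a base delay plus a random amount between consecutive items. The names and the default delay values below are illustrative assumptions, not recommended settings; appropriate rates depend on the target site.

```python
import random
import time


def jittered_delay(base_seconds, jitter_seconds, rng=random):
    """Return the base delay plus a random extra between 0 and jitter_seconds."""
    return base_seconds + rng.uniform(0, jitter_seconds)


def throttled(items, base_seconds=2.0, jitter_seconds=1.5, sleep=time.sleep, rng=random):
    """Yield items one at a time, sleeping a randomized interval between them."""
    for index, item in enumerate(items):
        if index > 0:
            # No pause before the first item, a jittered pause before each later one.
            sleep(jittered_delay(base_seconds, jitter_seconds, rng))
        yield item
```

Iterating with for url in throttled(urls): and issuing one request per iteration spaces the requests out. The sleep and rng parameters exist only so the pacing can be tested without waiting.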

HTTP Header Optimization

Match real browser headers (Accept-Language, Referer, Connection). Inconsistent or incomplete headers are often the easiest signal for AWS WAF to use when blocking automated agents.

JavaScript Rendering and Fingerprinting Evasion

AWS WAF CAPTCHA relies on client-side JavaScript. Using a headless browser that can execute JavaScript, and adjusting fingerprint identifiers such as WebGL or screen resolution, can bypass this layer of defense.

6. Conclusion

Handling AWS WAF CAPTCHA effectively requires techniques like proxy rotation, user-agent rotation, session management, and human-like interaction. Automated CAPTCHA solvers, such as CapSolver, provide reliable token generation and integrate directly into scraping workflows. Using these methods helps maintain stable, uninterrupted data collection with minimal manual intervention.
