AppScan

AppScan is a Python tool for tracking SEO poisoning in Google search results. It was written in response to malware being distributed through rogue GitHub Apps. AppScan drives a lightweight browser through Playwright to query selected keywords in a search engine, crawls the search results, and collects data on GitHub Apps that may be used to distribute malware.

It runs with a visible browser window to allow users to solve CAPTCHAs presented by Google and continue automated scanning.

It is also designed to be modular: you can specify your own regex detections, indicators, and search keywords within the script.

Environment Setup

python -m venv .venv
source .venv/bin/activate

Create a .env file for storing your GitHub token for API requests.

Add the following variable:

GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxx

You can create a token by navigating to:

GitHub Profile → Settings → Developer Settings → Personal Access Tokens

A classic token with no additional permissions is sufficient. The token is only used for querying the GitHub API for information related to GitHub Apps.
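As a rough illustration of how the token might be attached to an API request, the sketch below prepares (but does not send) a call to GitHub's public "get an app" endpoint, `GET /apps/{app_slug}`. It assumes that is the endpoint the script queries, which this README does not confirm. The function name `build_app_request` and the `User-Agent` value are hypothetical, and the standard library's `urllib` stands in for `aiohttp` to keep the example dependency-free.

```python
import os
import urllib.request

GITHUB_API = "https://api.github.com"


def build_app_request(slug: str, token: str) -> urllib.request.Request:
    """Prepare (without sending) a request for a GitHub App's public metadata."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
        # Hypothetical value; the GitHub API expects some User-Agent header.
        "User-Agent": "appscan-sketch",
    }
    if token:
        # Only authenticate when a token was actually provided.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{GITHUB_API}/apps/{slug}", headers=headers)


# Read the token from the environment, as the .env approach implies.
request = build_app_request("example-app", os.getenv("GITHUB_TOKEN", ""))
print(request.full_url)
```

In the script itself the token is loaded from `.env` through python-dotenv; here it is read straight from the process environment.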

Installing Dependencies

pip install aiohttp playwright python-dotenv
playwright install chromium

After installing the Python packages, install the system libraries that Playwright's browser requires.

For apt-based systems

sudo playwright install-deps chromium

For RPM/DNF-based systems

sudo dnf install -y nss atk at-spi2-atk cups-libs libdrm gtk3 libXcomposite libXdamage libXfixes libXrandr mesa-libgbm pango alsa-lib

Test the Installation

python -c "import aiohttp, dotenv, playwright; print('OK')"

Running the Crawler

Inside the virtual environment you just created:

python appscan.py

You will likely encounter a CAPTCHA while querying Google. This is expected, given Google's anti-automation protections.

Playwright runs in a visible browser window, allowing you to solve the CAPTCHA manually and continue scraping results without restarting the scan.
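One way such a pause could work is to poll the page URL until the CAPTCHA interstitial is gone. The sketch below is an assumption about the approach, not the script's actual logic: it relies on the heuristic that Google serves its CAPTCHA page under a `/sorry/` path, and the names `looks_like_captcha` and `wait_for_captcha` are hypothetical. The `page` argument only needs a `url` attribute, which a Playwright `Page` provides.

```python
import asyncio


def looks_like_captcha(url: str) -> bool:
    """Heuristic (assumed): Google's CAPTCHA interstitial lives under /sorry/."""
    return "/sorry/" in url


async def wait_for_captcha(page, poll_seconds: float = 2.0,
                           timeout_seconds: float = 300.0) -> bool:
    """Poll page.url until the CAPTCHA page is gone; return False on timeout."""
    waited = 0.0
    while looks_like_captcha(page.url):
        if waited >= timeout_seconds:
            return False
        # Give the analyst time to solve the challenge in the visible window.
        await asyncio.sleep(poll_seconds)
        waited += poll_seconds
    return True
```

With Playwright this would be awaited after each navigation, for example `await wait_for_captcha(page)`, before reading the results.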

About

This was a small proof-of-concept project heavily accelerated with the help of LLMs to assist analysts in tracking malicious sites hosted through GitHub and identifying malware distribution campaigns.

Configuration

Several variables can be customized to fit your research requirements.

General Configuration

import os
import re

GITHUB_TOKEN = os.getenv("GITHUB_TOKEN", "")
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"

SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
RESULTS_FILE = os.path.join(SCRIPT_DIR, "crawled-apps.json")
SLUGS_FILE = os.path.join(SCRIPT_DIR, "app_slugs.txt")

VERBOSE = True

GITHUB_APP_URL_PATTERN = re.compile(
    r'github\.com/apps/([a-zA-Z0-9-]+)',
    re.IGNORECASE
)

TARGET_KEYWORDS = [
    "keyword",
    "etc"
]
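To illustrate how `GITHUB_APP_URL_PATTERN` might be applied to scraped text, the sketch below collects unique app slugs from a string. The helper name `extract_app_slugs` is hypothetical, and lowercasing the slugs for de-duplication is an assumption rather than confirmed script behavior.

```python
import re

GITHUB_APP_URL_PATTERN = re.compile(
    r'github\.com/apps/([a-zA-Z0-9-]+)',
    re.IGNORECASE
)


def extract_app_slugs(text: str) -> list[str]:
    """Return unique app slugs in first-seen order, lowercased (assumed)."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for match in GITHUB_APP_URL_PATTERN.finditer(text):
        seen.setdefault(match.group(1).lower(), None)
    return list(seen)


sample = "Install via https://github.com/apps/Foo-Bar or github.com/apps/foo-bar"
print(extract_app_slugs(sample))
```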

Download Detection Patterns

DOWNLOAD_PATTERNS = {
    "executable_payload": re.compile(...),
    "compressed_archive": re.compile(...),
    "cloud_storage_bucket": re.compile(...),
    "github_raw_delivery": re.compile(...)
}

IOC Detection Patterns

IOC_PATTERNS = {
    "suspicious_tld": re.compile(...),
    "discord_webhook": re.compile(...),
    "telegram_bot": re.compile(...),
    "ngrok_tunnel": re.compile(...),
    "crypto_wallet": re.compile(...)
}

These patterns can be modified or expanded to support your own research objectives and detection logic.
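As a sketch of what extending the patterns could look like, the example below adds one entry to a pattern dictionary and scans a string with it. Both regular expressions are hypothetical stand-ins, since the script's real expressions are elided above. The `pastebin_raw` name and the `scan_text` helper are also assumptions made for illustration.

```python
import re

# Hypothetical stand-in: the script's real expression is not shown in this README.
IOC_PATTERNS = {
    "discord_webhook": re.compile(
        r"discord(?:app)?\.com/api/webhooks/\d+/[\w-]+", re.IGNORECASE
    ),
}

# Adding a detection amounts to adding another named entry (hypothetical example).
IOC_PATTERNS["pastebin_raw"] = re.compile(r"pastebin\.com/raw/\w+", re.IGNORECASE)


def scan_text(text: str, patterns: dict[str, re.Pattern]) -> dict[str, list[str]]:
    """Return {pattern_name: [matched strings]} for every pattern with hits."""
    findings = {}
    for name, pattern in patterns.items():
        hits = [m.group(0) for m in pattern.finditer(text)]
        if hits:
            findings[name] = hits
    return findings
```

Keeping each detection as a named entry means a finding can be reported under the pattern name that triggered it.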

Note: It is strongly recommended to use the .env file for storing your GitHub token rather than hardcoding credentials directly into the script.

Search Scope Configuration

The number of search results processed is controlled by:

RESULTS_PER_PAGE = 100  # Request up to 100 results per page
PAGES_TO_HUNT = 2       # Number of pages to scrape per keyword

With these values, AppScan processes up to 100 × 2 = 200 search results per keyword.
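The arithmetic can be made concrete with a short sketch that builds one search URL per page. It assumes the script paginates using Google's `q`, `num`, and `start` query parameters, which this README does not confirm, and the helper name `build_search_urls` is hypothetical. Google may cap or ignore `num`, which is why the total is an upper bound.

```python
from urllib.parse import urlencode

RESULTS_PER_PAGE = 100
PAGES_TO_HUNT = 2


def build_search_urls(keyword: str) -> list[str]:
    """Build one URL per page; `start` is the zero-based offset of the first result."""
    urls = []
    for page in range(PAGES_TO_HUNT):
        params = {
            "q": keyword,
            "num": RESULTS_PER_PAGE,             # assumed parameter name
            "start": page * RESULTS_PER_PAGE,    # assumed parameter name
        }
        urls.append("https://www.google.com/search?" + urlencode(params))
    return urls


for url in build_search_urls("keyword"):
    print(url)
```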

Output

AppScan writes all discovered queries, GitHub App references, and collected findings to a JSON file:

crawled-apps.json
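The file's schema is not documented in this README, so the sketch below makes no assumptions about it. It loads the JSON and reports only the top-level type, the entry count, and any top-level keys. The helper name `summarize_results` is hypothetical.

```python
import json
from pathlib import Path


def summarize_results(path: str) -> dict:
    """Report the top-level JSON type and size without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    summary = {
        "type": type(data).__name__,
        "entries": len(data) if isinstance(data, (dict, list)) else 1,
    }
    if isinstance(data, dict):
        summary["keys"] = sorted(data)
    return summary
```

After a scan, calling `summarize_results("crawled-apps.json")` from the script directory gives a quick overview before you open the file.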

Example output: see the screenshot output-json.png in the repository.

License

Released under the MIT License.
