This project contains a single script, github.py, that scans GitHub for popular, active TypeScript repositories and keeps only repos that look like Next.js + AI projects.
It then exports:
- matched repositories
- top contributors for each matched repo
- a ranked rollup of contributors across all matched repos
By default, the script:
- searches GitHub repositories with:
  - language:TypeScript
  - at least 300 stars
  - pushed within the last 365 days
  - not archived
- detects Next.js signals (next dependency, Next scripts, Next config, app/pages folders)
- detects AI-related dependencies (for example openai, ai, @ai-sdk/*, langchain, llamaindex)
- keeps only repos that match both Next.js and AI signals
- fetches top contributors for each matched repo
- optionally enriches contributors with profile data (name, company, location, etc.)
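The dependency and script checks described above can be sketched roughly as follows. This is a minimal illustration, not the code in github.py: the function name detect_signals, the AI_PACKAGES set, and the returned keys are assumptions made for this example, and the package list only mirrors the examples named in this README.

```python
import json

# Assumed AI package names, copied from the examples in this README.
AI_PACKAGES = {"openai", "ai", "langchain", "llamaindex"}
AI_PREFIXES = ("@ai-sdk/",)


def detect_signals(package_json_text):
    """Return Next.js and AI signals found in a package.json string (hypothetical helper)."""
    pkg = json.loads(package_json_text)

    # Merge runtime and dev dependencies into one lookup.
    deps = {}
    for key in ("dependencies", "devDependencies"):
        deps.update(pkg.get(key) or {})
    scripts = pkg.get("scripts") or {}

    has_next_dep = "next" in deps
    # A script such as "next dev" or "next build" counts as a Next.js signal.
    has_next_script = any(
        isinstance(cmd, str) and cmd.split()[:1] == ["next"]
        for cmd in scripts.values()
    )
    ai_matches = sorted(
        name for name in deps
        if name in AI_PACKAGES or name.startswith(AI_PREFIXES)
    )
    return {
        "nextjs": has_next_dep or has_next_script,
        "ai": bool(ai_matches),
        "ai_packages": ai_matches,
    }
```

Under these assumptions, a repo would qualify only when both the nextjs and ai values are true.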
Requirements:
- Python 3.9+
- A GitHub token in the GITHUB_TOKEN environment variable
The script exits immediately if GITHUB_TOKEN is missing.
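A fail-fast token check of this kind might look like the sketch below. The helper names require_token and auth_headers and the exact error message are assumptions for illustration; only the GITHUB_TOKEN variable name comes from this README.

```python
import os
import sys


def require_token(env=None):
    """Return the token from GITHUB_TOKEN, or exit if it is missing or empty."""
    env = os.environ if env is None else env
    token = (env.get("GITHUB_TOKEN") or "").strip()
    if not token:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("Missing GITHUB_TOKEN env var")
    return token


def auth_headers(token):
    """Build request headers for authenticated GitHub REST API calls."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real shell environment.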
From the project root:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Set your token:
export GITHUB_TOKEN="your_token_here"
PowerShell:
$env:GITHUB_TOKEN="your_token_here"
Run the script:
python3 github.py
When finished, it prints a summary and writes CSV files to the current directory.
- repos.csv: one row per matched repository with detection signals and matched dependencies
- repo_contributors.csv: contributor rows per matched repository (with optional profile enrichment)
- top_users.csv: contributor leaderboard across all matched repositories
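To make the leaderboard idea concrete, here is one way per-repo contributor rows could be rolled up into a ranking. The row keys (repo, login, contributions) and the sort order are assumptions for this sketch; the actual columns and ranking rule in github.py may differ.

```python
from collections import defaultdict


def rank_contributors(rows):
    """Aggregate per-repo contributor rows into a cross-repo ranking.

    rows: iterable of dicts with assumed keys "repo", "login", "contributions".
    """
    totals = defaultdict(lambda: {"contributions": 0, "repos": set()})
    for row in rows:
        entry = totals[row["login"]]
        entry["contributions"] += int(row["contributions"])
        entry["repos"].add(row["repo"])

    ranked = [
        {
            "login": login,
            "repo_count": len(data["repos"]),
            "total_contributions": data["contributions"],
        }
        for login, data in totals.items()
    ]
    # Assumed ordering: most repos first, then most contributions, then login.
    ranked.sort(key=lambda r: (-r["repo_count"], -r["total_contributions"], r["login"]))
    return ranked
```

Ranking by repo count first favors contributors who appear across several matched projects over those with many commits in a single one.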
There are no CLI flags right now. Configure behavior by editing constants at the top of github.py.
Common settings:
- MAX_REPOS: stop after this many qualifying repos
- MIN_STARS: minimum stars for search
- MIN_LAST_PUSHED_DAYS: recency filter (in days)
- TOP_CONTRIBUTORS: contributors fetched per repo
- ENRICH_CONTRIBUTORS: enable/disable user profile enrichment
- EXCLUDE_BOT_ACCOUNTS: skip likely bot users
- REQUIRE_NEXTJS: require Next.js signal to qualify
- REQUIRE_AI_PACKAGES: require AI dependency signal to qualify
- PACE_SECONDS: delay between API calls (helps with rate limits)
Output file names are also configurable:
- OUT_REPOS
- OUT_CONTRIBS
- OUT_TOP_USERS
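As an illustration of how the search-related constants could feed the GitHub query, the sketch below builds a search string from MIN_STARS and MIN_LAST_PUSHED_DAYS. The function build_search_query is hypothetical, and the default values simply repeat the defaults stated in this README; the qualifiers themselves (language, stars, pushed, archived) are standard GitHub search syntax.

```python
from datetime import date, timedelta

# Defaults as described in this README; edit them at the top of the script.
MIN_STARS = 300
MIN_LAST_PUSHED_DAYS = 365


def build_search_query(min_stars=MIN_STARS, pushed_days=MIN_LAST_PUSHED_DAYS, today=None):
    """Build a GitHub repository search query string (hypothetical helper)."""
    today = today or date.today()
    cutoff = today - timedelta(days=pushed_days)
    return (
        f"language:TypeScript stars:>={min_stars} "
        f"pushed:>={cutoff.isoformat()} archived:false"
    )
```

Lowering MIN_STARS or raising MIN_LAST_PUSHED_DAYS widens the query, which is why those are the first settings to adjust when a run returns too few matches.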
- Missing GITHUB_TOKEN env var:
  - Set GITHUB_TOKEN in your shell before running.
- Very slow runs:
  - This is expected when scanning many repos and enriching users.
  - Try lowering MAX_REPOS or setting ENRICH_CONTRIBUTORS = False.
- Too few or zero matches:
  - Lower MIN_STARS and/or loosen filters by setting REQUIRE_NEXTJS or REQUIRE_AI_PACKAGES to False.
- The script uses GitHub REST API endpoints for search, repository tree/content checks, contributors, and user profiles.
- Some repository trees may be truncated by GitHub. The tree_truncated column in repos.csv flags this.
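The tree-based checks and the truncation flag can be sketched as below, assuming the input is the parsed JSON returned by the Git Trees endpoint (GET /repos/{owner}/{repo}/git/trees/{sha}?recursive=1), which includes a tree list and a truncated boolean. The function name and the returned keys other than tree_truncated are assumptions for this example.

```python
def next_signals_from_tree(tree_response):
    """Derive Next.js path signals from a parsed Git Trees API response."""
    entries = tree_response.get("tree", [])

    # A file named next.config.* anywhere in the tree (js, mjs, ts, ...).
    has_config = any(
        item["type"] == "blob"
        and item["path"].rsplit("/", 1)[-1].startswith("next.config.")
        for item in entries
    )
    # A directory named app or pages at any depth (for example src/app).
    has_app_or_pages = any(
        item["type"] == "tree"
        and item["path"].rsplit("/", 1)[-1] in ("app", "pages")
        for item in entries
    )
    return {
        "next_config": has_config,
        "app_or_pages_dir": has_app_or_pages,
        # When True, GitHub returned a partial tree, so a missing signal is inconclusive.
        "tree_truncated": bool(tree_response.get("truncated", False)),
    }
```

When tree_truncated is true, a negative result from these path checks should be read as "not found in the portion returned" rather than "absent from the repo".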