Lightweight Python prototype for automated market research and competitive landscape analysis using Google's Gemini API (free tier).
Overview

This tool helps you:
- Create Market Taxonomies - Automatically categorize and structure market segments with detailed feature definitions
- Discover Competitors - Find companies and products using Google Search grounding
- Analyze Websites - Extract product features, pricing, and capabilities from company websites

All outputs are saved as CSV files for easy analysis in Excel, Google Sheets, or other tools.

Features
- Market Landscape Creation
  - Generates market taxonomy with divisions and sub-divisions
  - Creates 8-10 structured features with inclusion/exclusion keywords
  - Uses Google Search to discover competitor companies and products
  - Exports taxonomy and competitor lists to CSV
- Website Analysis
  - Scrapes product websites using BeautifulSoup
  - Extracts company/product names and descriptions
  - Identifies specific features with boolean flags (✓/✗)
  - Detects pricing tiers and models
  - Truncates content to 8,000 characters to save token quota
  - Exports analysis to CSV

Installation

Prerequisites
- Python 3.8+
- Google Gemini API key (free tier available at https://aistudio.google.com/apikey)

Setup

Clone the repository:

    git clone https://github.com/AlosedAG/gemini-market-research.git
    cd gemini-market-research

Install dependencies:

    pip install -r requirements.txt

Required packages:
- google-genai
- beautifulsoup4
- requests
- python-dotenv

Set up your Gemini API key:

Option 1: Environment variable

    export GEMINI_API_KEY="your-api-key-here"

Option 2: Create a .env file

    echo "GEMINI_API_KEY=your-api-key-here" > .env

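The lookup order implied by the two options can be sketched with the standard library alone. The helper name `load_api_key` and the hand-rolled `.env` parsing are assumptions made for this illustration. The project lists python-dotenv as a dependency, and its actual loading code is not shown in this README.

```python
import os
from pathlib import Path


def load_api_key(env_path=".env", environ=None):
    """Return GEMINI_API_KEY from the environment, else from a .env file.

    Hypothetical helper: a simplified stand-in for python-dotenv.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("GEMINI_API_KEY")
    if key:
        return key

    path = Path(env_path)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == "GEMINI_API_KEY":
                return value.strip().strip('"').strip("'")

    raise RuntimeError(
        "GEMINI_API_KEY not set: export it or add it to a .env file"
    )
```

In this sketch the environment variable wins over the file, so a key exported in the shell overrides whatever `.env` contains.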
Usage

Quick Start

Run the main script:

    python main.py

You'll be prompted to select a model (recommended: gemini-2.5-flash-lite for best quota management).

Option 1: Create Market Landscape

    Select option: 1
    Enter market topic: CRM software

Process:
- Generates market taxonomy with 8-10 features
- Each feature includes inclusion/exclusion keywords for classification
- Optionally saves taxonomy to CSV
- Uses Google Search to find 10 competitor companies
- Exports competitors with deep product URLs to CSV

Output Files:
- crm_software_taxonomy.csv - Feature definitions and keywords
- crm_software_competitors.csv - Company list with product URLs

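The file names above follow a visible pattern: the topic is lowercased, joined with underscores, and suffixed with the kind of output. A minimal sketch of that pattern follows. The function names, the `out_dir` parameter, and the idea that rows are plain dictionaries are assumptions for illustration, not the project's actual export code.

```python
import csv
import re
from pathlib import Path


def topic_slug(topic):
    """Lowercase the topic and join alphanumeric runs with underscores."""
    return "_".join(re.findall(r"[a-z0-9]+", topic.lower()))


def save_rows(topic, kind, rows, out_dir="outputs"):
    """Write a list of dicts to <out_dir>/<slug>_<kind>.csv and return the path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output folder on demand
    path = out / f"{topic_slug(topic)}_{kind}.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        # Column order follows the keys of the first row.
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path
```

With these helpers, `topic_slug("CRM software")` gives `crm_software`, so `save_rows("CRM software", "taxonomy", rows)` would write `outputs/crm_software_taxonomy.csv`.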
Option 2: Analyze Company Website

    Select option: 2
    Enter company product URL: https://www.salesforce.com/products/sales-cloud/

Process:
- Scrapes website content (removes scripts, styles, nav, footer)
- Truncates to 8,000 characters to save tokens
- Analyzes for 5 predefined features:
  - Mobile App
  - API access
  - SSO (Single Sign-On)
  - Analytics Dashboard
  - Webhooks
- Exports feature flags and pricing data to CSV

Output File:
- salesforce_analysis.csv - Complete product analysis

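The example pairs a `salesforce.com` URL with `salesforce_analysis.csv`, which suggests the file name comes from the hostname. The sketch below shows one way to do that and to turn detection results into ✓/✗ flags. The function names and the "first hostname label" heuristic are assumptions, and the project may derive names differently.

```python
from urllib.parse import urlparse

# The five predefined features listed above.
FEATURES = [
    "Mobile App",
    "API access",
    "SSO (Single Sign-On)",
    "Analytics Dashboard",
    "Webhooks",
]


def company_slug(url):
    """Take the first label of the hostname, ignoring a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0]


def analysis_filename(url):
    """Build the <company>_analysis.csv name for a product URL."""
    return f"{company_slug(url)}_analysis.csv"


def flag_row(detected):
    """Map each predefined feature to a check mark or a cross."""
    return {name: "✓" if detected.get(name) else "✗" for name in FEATURES}
```

The heuristic is deliberately naive: a URL on a subdomain such as `app.example.com` would yield `app`, so a real implementation would likely need a fallback.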
Project Structure

    gemini-market-research/
    │
    ├── main.py              # Main entry point and CLI interface
    ├── core/
    │   ├── __init__.py      # Package initialization
    │   ├── config.py        # API configuration with rate limiting
    │   ├── llm_handler.py   # Gemini API wrapper with quota management
    │   ├── creator.py       # Market taxonomy and competitor discovery
    │   └── updater.py       # Website scraping and feature extraction
    │
    ├── outputs/             # Generated CSV files (auto-created)
    ├── requirements.txt     # Python dependencies
    ├── .env                 # API key configuration (you create this)
    └── README.md            # This file

Key Components

main.py - Interactive CLI with two modes:
- Mode 1: Taxonomy creation + competitor discovery
- Mode 2: Website analysis with feature detection
- CSV export logic for all outputs

config.py - API setup and rate limiting:
- 6-second delay between API calls (10 RPM max)
- Model selection without quota-wasting discovery calls
- Quota error handling and user guidance

llm_handler.py - LLM interaction engine:
- analyze_market() - Generates structured taxonomy
- search_and_analyze() - Uses Google Search tool for competitor discovery
- extract_product_data() - Analyzes website content for features
- Infinite patience retry logic for 429/503 errors (up to 15 attempts)

creator.py - Landscape creation:
- build_taxonomy() - Creates market structure with features
- find_competitors() - Searches for companies using Google Search grounding
- Forces deep product URLs (not homepages)

updater.py - Website analysis:
- scrape_website() - BeautifulSoup-based content extraction
- update_company() - Analyzes scraped content for specific features
- Content truncation to 8,000 chars

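To make the clean-up and truncation step concrete, here is a rough sketch that strips script, style, nav, and footer content and caps the result at 8,000 characters. It uses the standard library's `html.parser` as a stand-in so the example has no third-party dependency. The project's `scrape_website()` is described as BeautifulSoup-based, and its real code is not reproduced here. All names in the block are hypothetical.

```python
from html.parser import HTMLParser

MAX_CHARS = 8000
SKIPPED_TAGS = {"script", "style", "nav", "footer"}


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def clean_and_truncate(html, max_chars=MAX_CHARS):
    """Return whitespace-joined visible text, cut to max_chars characters."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(parser.chunks)[:max_chars]
```

Truncating by characters rather than tokens is a blunt cut: it can stop mid-sentence, and content past the limit (often pricing tables near the bottom of a page) is never seen by the model.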
API Usage & Limitations

Free Tier Constraints

This prototype is intentionally downgraded to work within Gemini's free API tier:
- Rate Limits: ~10-15 requests per minute (RPM)
- Daily Quota: ~1,500 requests per day
- Token Limits: Input truncated to 8,000 characters
- Delay: 6-second minimum between all API calls

Design Decisions for Free Usage

To stay within these limits, the tool makes the following trade-offs.

Includes:
- 6-second rate limiting between every API call
- Single-pass analysis (no refinement loops)
- Content truncation (8,000 char max vs. 15,000)
- Minimal retry logic (waits 65 seconds for quota reset)
- Direct model selection (skips list_models() call)
- Structured JSON outputs (reduces token usage)

Explicitly Removed:
- Batch processing
- Parallel requests
- Deep multi-page website analysis
- Iterative refinement
- Caching or optimization
- Extensive error recovery

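As a rough check on how these numbers fit together, the arithmetic below derives the request ceiling implied by the 6-second delay and how long the approximate daily quota would last at that pace. The constants are the figures quoted in this README. The variable names are illustrative only.

```python
MIN_DELAY_SECONDS = 6.0  # minimum gap between API calls
DAILY_QUOTA = 1500       # approximate free-tier requests per day

# One call every 6 seconds is at most 60 / 6 calls per minute.
max_rpm = 60 / MIN_DELAY_SECONDS

# At that ceiling, the daily quota is used up after this many minutes.
minutes_to_exhaust = DAILY_QUOTA / max_rpm

print(f"Ceiling: {max_rpm:.0f} requests/minute")
print(f"Daily quota lasts {minutes_to_exhaust / 60:.1f} hours of continuous calls")
```

So the delay keeps the tool at the low end of the ~10-15 RPM range, and only a session of roughly two and a half hours of back-to-back calls would reach the daily limit.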
Rate Limiting Mechanism

    # From config.py
    _min_delay_seconds = 6.0  # Safely under 10 RPM

    def rate_limit():
        """Enforces 6-second minimum delay between API calls"""
        # Sleeps if less than 6 seconds since last call

Every API call is preceded by rate_limit() to prevent quota exhaustion.

Quota Errors

If you see RESOURCE_EXHAUSTED or 429 errors:
- Wait 65 seconds - The tool auto-retries with this delay
- Check daily limit - You may have hit 1,500 requests/day
- Reduce scope - Analyze fewer companies or smaller websites
- Upgrade - Consider paid tier for higher limits

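The two behaviours described in this section, a minimum gap between calls and a 65-second pause after a quota error, can be sketched together as follows. This is an illustration under stated assumptions, not the code in `config.py` or `llm_handler.py`: `QuotaError`, `RateLimiter`, and `call_with_retry` are hypothetical names, and the injectable `clock` and `sleep` parameters exist only to make the sketch testable without real waiting.

```python
import time

MIN_DELAY_SECONDS = 6.0
QUOTA_WAIT_SECONDS = 65.0
MAX_ATTEMPTS = 15


class QuotaError(Exception):
    """Hypothetical stand-in for a 429 / RESOURCE_EXHAUSTED API error."""


class RateLimiter:
    """Enforce a minimum delay between consecutive calls."""

    def __init__(self, min_delay=MIN_DELAY_SECONDS,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep until min_delay seconds have passed since the last call."""
        if self._last_call is not None:
            remaining = self.min_delay - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
        self._last_call = self._clock()


def call_with_retry(fn, limiter, max_attempts=MAX_ATTEMPTS,
                    quota_wait=QUOTA_WAIT_SECONDS, sleep=time.sleep):
    """Call fn(), pausing quota_wait seconds after each QuotaError."""
    for attempt in range(1, max_attempts + 1):
        limiter.wait()
        try:
            return fn()
        except QuotaError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(quota_wait)
```

A 65-second pause is slightly longer than a one-minute rate window, which is presumably why that value was chosen. It does not help once the daily quota is exhausted, which is why the attempt count is capped.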
Example Workflow

Complete Market Research Flow

    # 1. Research the project management market
    $ python main.py
    Select model: 1  # gemini-2.5-flash-lite
    Select option: 1
    Enter market topic: project management software
    Save taxonomy? y
    Search for competitors? y

    # Output:
    # - project_management_software_taxonomy.csv
    # - project_management_software_competitors.csv

    # 2. Analyze top 3 competitors
    $ python main.py
    Select option: 2
    Enter URL: https://asana.com/product

    $ python main.py
    Select option: 2
    Enter URL: https://monday.com/products/monday-work-management

    $ python main.py
    Select option: 2
    Enter URL: https://clickup.com/features

    # Output:
    # - asana_analysis.csv
    # - monday_analysis.csv
    # - clickup_analysis.csv

Troubleshooting

"Quota exceeded" / "RESOURCE_EXHAUSTED"
- Cause: Hit rate or daily limit
- Solution: Tool auto-waits 65 seconds. If persistent, wait 1 hour or try next day

"Invalid API Key"
- Cause: API key not configured
- Solution: Set GEMINI_API_KEY environment variable or create .env file

No Competitors Found
- Cause: Google Search returned no results or market too niche
- Solution: Try broader market category (e.g., "CRM" instead of "veterinary CRM for small practices")

Website Scraping Failed
- Cause: Website blocks scraping or requires JavaScript
- Solution: Try alternative product URL or manually copy content

Model Not Found
- Cause: Selected model name not available
- Solution: Choose gemini-2.5-flash-lite (most reliable for free tier)

This is a minimal prototype, intentionally limited compared to what is technically possible.

Enhancement Ideas (Within Free Tier)
- Add local JSON caching to avoid re-analyzing same URLs
- Implement CSV merging for multi-competitor comparisons
- Create data visualization from CSV outputs
- Add more predefined feature templates
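
As an illustration of the first idea, a URL-keyed cache stored in a single JSON file could look like the sketch below. Nothing here exists in the project: the class name, the default file location, and the `analyze` callback are all hypothetical.

```python
import json
from pathlib import Path


class AnalysisCache:
    """Hypothetical URL-keyed cache persisted as one JSON file."""

    def __init__(self, path="outputs/analysis_cache.json"):
        self.path = Path(path)
        if self.path.is_file():
            self._data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self._data = {}

    def get_or_compute(self, url, analyze):
        """Return the cached result for url, calling analyze(url) only on a miss."""
        if url not in self._data:
            self._data[url] = analyze(url)
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self.path.write_text(
                json.dumps(self._data, indent=2, ensure_ascii=False),
                encoding="utf-8",
            )
        return self._data[url]
```

Each cache hit saves one API request, which matters under a daily quota. The sketch never expires entries, so a changed website would keep returning stale results until the file is deleted.
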
Enhancement Ideas (Require Paid Tier)
- Parallel competitor analysis
- Deep multi-page website scraping
- Iterative taxonomy refinement
- Real-time monitoring and alerts
License

MIT License - See LICENSE file for details

Disclaimer
- This tool uses AI for analysis and may produce inaccurate results
- Always manually verify extracted data
- Not intended for production use without significant enhancements
- API quotas and rate limits are subject to change by Google
- Web scraping may be blocked by some websites

Support
- Gemini API Docs: https://ai.google.dev/docs
- Get API Key: https://aistudio.google.com/apikey
- Pricing Info: https://ai.google.dev/pricing

Built with Google Gemini API | Optimized for Free Tier Usage | Intentionally Downgraded for Zero Cost