AlosedAG/market-creation

Gemini Market Research Tool

Lightweight Python prototype for automated market research and competitive landscape analysis using Google's Gemini API (free tier).

⚠️ Prototype Notice: This tool is intentionally downgraded to work within Gemini's free API quota. Features are minimal, rate-limited, and designed to avoid exceeding usage limits.

Overview

This tool helps you:

  • Create Market Taxonomies - Automatically categorize and structure market segments with detailed feature definitions
  • Discover Competitors - Find companies and products using Google Search grounding
  • Analyze Websites - Extract product features, pricing, and capabilities from company websites

All outputs are saved as CSV files for easy analysis in Excel, Google Sheets, or other tools.

Features

  1. Market Landscape Creation

  • Generates a market taxonomy with divisions and sub-divisions
  • Creates 8-10 structured features with inclusion/exclusion keywords
  • Uses Google Search to discover competitor companies and products
  • Exports the taxonomy and competitor lists to CSV

  2. Website Analysis

  • Scrapes product websites using BeautifulSoup
  • Extracts company/product names and descriptions
  • Identifies specific features with boolean flags (✓/✗)
  • Detects pricing tiers and models
  • Truncates content to 8,000 characters to save token quota
  • Exports the analysis to CSV

Installation

Prerequisites

  • Python 3.8+
  • Google Gemini API key (free tier available at https://aistudio.google.com/apikey)

Setup

Clone the repository:

```bash
git clone https://github.com/AlosedAG/gemini-market-research.git
cd gemini-market-research
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Required packages: google-genai, beautifulsoup4, requests, python-dotenv

Set up your Gemini API key:

Option 1: Environment variable

```bash
export GEMINI_API_KEY="your-api-key-here"
```

Option 2: Create a .env file

```bash
echo "GEMINI_API_KEY=your-api-key-here" > .env
```
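Both options end up supplying the same GEMINI_API_KEY value. The sketch below shows one way a script could resolve it: the exported environment variable wins, otherwise a `.env` file is read. `load_api_key` is a hypothetical helper, not part of the repository; the tool itself presumably relies on python-dotenv, and the `.env` parsing here is hand-rolled only to keep the example dependency-free.

```python
import os
from pathlib import Path


def load_api_key(env_file: str = ".env") -> str:
    """Hypothetical helper: GEMINI_API_KEY from the environment (Option 1),
    falling back to a KEY=value line in a .env file (Option 2)."""
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key  # an exported variable takes precedence over .env
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            name, _, value = line.partition("=")
            if name.strip() == "GEMINI_API_KEY":
                # Tolerate optional quotes such as GEMINI_API_KEY="abc"
                return value.strip().strip('"').strip("'")
    raise RuntimeError("GEMINI_API_KEY is not set (see 'Invalid API Key' under Troubleshooting)")
```

If neither source provides a key, the helper fails early with a clear message instead of letting the first API call fail.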

Usage

Quick Start

Run the main script:

```bash
python main.py
```

You'll be prompted to select a model (recommended: gemini-2.5-flash-lite for best quota management).

Option 1: Create Market Landscape

Select option: 1
Enter market topic: CRM software

Process:

  • Generates a market taxonomy with 8-10 features
  • Each feature includes inclusion/exclusion keywords for classification
  • Optionally saves the taxonomy to CSV
  • Uses Google Search to find 10 competitor companies
  • Exports competitors with deep product URLs to CSV

Output Files:

  • crm_software_taxonomy.csv - Feature definitions and keywords
  • crm_software_competitors.csv - Company list with product URLs
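The file names follow a simple pattern: "CRM software" becomes crm_software_taxonomy.csv. The sketch below reproduces that naming and writes one row per feature with its keywords. The column names, the feature dict shape, and the default outputs/ directory (taken from the project structure) are assumptions; `topic_slug` and `save_taxonomy` are hypothetical names, not the repository's functions.

```python
import csv
import re
from pathlib import Path
from typing import Dict, List

# Hypothetical column layout; the CSV actually written by the tool may differ.
FIELDS = ["feature", "definition", "include_keywords", "exclude_keywords"]


def topic_slug(topic: str) -> str:
    """'CRM software' -> 'crm_software' (lowercase, non-alphanumerics -> '_')."""
    return re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")


def save_taxonomy(topic: str, features: List[Dict], out_dir: str = "outputs") -> Path:
    """Write one row per feature to <slug>_taxonomy.csv and return the path."""
    path = Path(out_dir) / f"{topic_slug(topic)}_taxonomy.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for feat in features:
            writer.writerow({
                "feature": feat["name"],
                "definition": feat["definition"],
                # Keyword lists are joined into a single cell for spreadsheet use
                "include_keywords": "; ".join(feat["include"]),
                "exclude_keywords": "; ".join(feat["exclude"]),
            })
    return path
```

Joining keywords with "; " keeps each feature on one spreadsheet row, which is convenient when the file is opened in Excel or Google Sheets.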

Option 2: Analyze Company Website

Select option: 2
Enter company product URL: https://www.salesforce.com/products/sales-cloud/

Process:

  • Scrapes website content (removes scripts, styles, nav, footer)
  • Truncates to 8,000 characters to save tokens
  • Analyzes for 5 predefined features: Mobile App, API access, SSO (Single Sign-On), Analytics Dashboard, Webhooks
  • Exports feature flags and pricing data to CSV

Output File:

  • salesforce_analysis.csv - Complete product analysis
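The output name appears to come from the site's domain (www.salesforce.com becomes salesforce_analysis.csv). A minimal sketch of that naming plus a one-row ✓/✗ table for the five predefined features follows. The "first host label" rule, the column order, and the pricing field are assumptions, and hosts like app.example.com would need a smarter rule; `company_slug` and `save_analysis` are hypothetical names.

```python
import csv
from pathlib import Path
from typing import Set
from urllib.parse import urlparse

# The five predefined features listed above; column names are assumptions.
FEATURES = ["Mobile App", "API access", "SSO", "Analytics Dashboard", "Webhooks"]


def company_slug(url: str) -> str:
    """'https://www.salesforce.com/products/...' -> 'salesforce' (first host label)."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host.split(".")[0]


def save_analysis(url: str, name: str, found: Set[str], pricing: str,
                  out_dir: str = "outputs") -> Path:
    """Write one row: name, URL, a ✓/✗ column per feature, then pricing."""
    row = {"name": name, "url": url}
    row.update({feat: "✓" if feat in found else "✗" for feat in FEATURES})
    row["pricing"] = pricing
    path = Path(out_dir) / f"{company_slug(url)}_analysis.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        writer.writeheader()
        writer.writerow(row)
    return path
```

Writing with encoding="utf-8" matters here because ✓ and ✗ are non-ASCII characters.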

Project Structure

```
gemini-market-research/
│
├── main.py              # Main entry point and CLI interface
├── core/
│   ├── __init__.py      # Package initialization
│   ├── config.py        # API configuration with rate limiting
│   ├── llm_handler.py   # Gemini API wrapper with quota management
│   ├── creator.py       # Market taxonomy and competitor discovery
│   └── updater.py       # Website scraping and feature extraction
│
├── outputs/             # Generated CSV files (auto-created)
├── requirements.txt     # Python dependencies
├── .env                 # API key configuration (you create this)
└── README.md            # This file
```

Key Components

main.py - Interactive CLI with two modes:

  • Mode 1: Taxonomy creation + competitor discovery
  • Mode 2: Website analysis with feature detection
  • CSV export logic for all outputs

config.py - API setup and rate limiting:

  • 6-second delay between API calls (10 RPM max)
  • Model selection without quota-wasting discovery calls
  • Quota error handling and user guidance

llm_handler.py - LLM interaction engine:

  • analyze_market() - Generates a structured taxonomy
  • search_and_analyze() - Uses the Google Search tool for competitor discovery
  • extract_product_data() - Analyzes website content for features
  • Persistent retry logic for 429/503 errors (up to 15 attempts)
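A retry loop of this kind can be sketched as follows. This is an illustration, not the repository's code: it uses capped exponential backoff with jitter (a swapped-in technique; the tool is described as waiting 65 seconds for a quota reset), and `APIStatusError` is a hypothetical stand-in for whatever exception the Gemini client actually raises. Only 429 and 503 are retried, up to 15 attempts in total.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE = {429, 503}  # rate limited / service temporarily unavailable


class APIStatusError(Exception):
    """Hypothetical stand-in for the HTTP error raised by the Gemini client."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retry(fn: Callable[[], T], max_attempts: int = 15,
                    base_delay: float = 2.0, max_delay: float = 65.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on 429/503, back off exponentially (capped) and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except APIStatusError as err:
            if err.status not in RETRYABLE or attempt == max_attempts:
                raise  # not retryable, or attempts exhausted
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, 1))  # jitter spreads out retries
    raise ValueError("max_attempts must be at least 1")
```

Injecting `sleep` keeps the loop testable without real waiting. Other errors (for example, an invalid key) are re-raised immediately because retrying would only burn quota.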

creator.py - Landscape creation:

  • build_taxonomy() - Creates the market structure with features
  • find_competitors() - Searches for companies using Google Search grounding
  • Forces deep product URLs (not homepages)
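"Deep product URL" here means a page below the site root rather than the homepage. The tool most likely enforces this through its prompt; the sketch below shows a complementary post-filter one could apply to returned URLs. `is_deep_product_url` and the `HOME_PATHS` set are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical set of paths that still count as a homepage.
HOME_PATHS = {"", "index.html", "index.htm", "home"}


def is_deep_product_url(url: str) -> bool:
    """True if url is an absolute http(s) URL pointing below the site root."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return parsed.path.strip("/").lower() not in HOME_PATHS


candidates = [
    "https://www.salesforce.com/",
    "https://www.salesforce.com/products/sales-cloud/",
    "https://clickup.com/features",
]
deep = [u for u in candidates if is_deep_product_url(u)]
```

Here `deep` keeps the Sales Cloud and ClickUp feature pages and drops the bare salesforce.com homepage.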

updater.py - Website analysis:

  • scrape_website() - BeautifulSoup-based content extraction
  • update_company() - Analyzes scraped content for specific features
  • Truncates content to 8,000 characters
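The scraping step (drop script, style, nav, and footer, keep the visible text, cut to 8,000 characters) can be sketched as below. The repository uses BeautifulSoup; to stay dependency-free this sketch swaps in the standard library's html.parser, so it illustrates the idea rather than reproducing scrape_website(). With BeautifulSoup the equivalent is calling `decompose()` on each tag in `soup(["script", "style", "nav", "footer"])` and then `soup.get_text(" ", strip=True)`. `extract_text` is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import List

SKIP_TAGS = {"script", "style", "nav", "footer"}
MAX_CHARS = 8_000  # truncation limit described above


class _TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # > 0 while inside a skipped element
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str, max_chars: int = MAX_CHARS) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # collapse all whitespace
    return text[:max_chars]  # truncate to save tokens
```

Collapsing whitespace before truncating means the 8,000-character budget is spent on words rather than indentation and blank lines.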

API Usage & Limitations

Free Tier Constraints

This prototype is intentionally downgraded to work within Gemini's free API tier:

  • Rate Limits: ~10-15 requests per minute (RPM)
  • Daily Quota: ~1,500 requests per day
  • Token Limits: Input truncated to 8,000 characters
  • Delay: 6-second minimum between all API calls
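These figures fit together: a 6-second floor caps throughput at 60 / 6 = 10 requests per minute, and at that pace the approximate 1,500-request daily quota lasts 2.5 hours of continuous use. The quick check below uses only the approximate values listed above; actual free-tier limits vary by model and may change.

```python
MIN_DELAY_S = 6.0    # minimum gap between calls (config.py)
DAILY_QUOTA = 1_500  # approximate free-tier requests per day

max_rpm = 60 / MIN_DELAY_S                           # calls per minute at full speed
hours_to_exhaust = DAILY_QUOTA * MIN_DELAY_S / 3600  # time to use up the daily quota

print(f"max {max_rpm:.0f} RPM; daily quota lasts {hours_to_exhaust:.1f} h of continuous use")
# prints: max 10 RPM; daily quota lasts 2.5 h of continuous use
```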

Design Decisions for Free Usage

To stay within limits, this tool includes:

  • 6-second rate limiting between every API call
  • Single-pass analysis (no refinement loops)
  • Content truncation (8,000 characters max vs. 15,000)
  • Minimal retry logic (waits 65 seconds for quota reset)
  • Direct model selection (skips the list_models() call)
  • Structured JSON outputs (reduces token usage)

Explicitly Removed:

  • Batch processing
  • Parallel requests
  • Deep multi-page website analysis
  • Iterative refinement
  • Caching or optimization
  • Extensive error recovery

Rate Limiting Mechanism

```python
# From config.py
_min_delay_seconds = 6.0  # Safely under 10 RPM

def rate_limit():
    """Enforces 6-second minimum delay between API calls"""
    # Sleeps if less than 6 seconds since last call
```

Every API call is preceded by rate_limit() to prevent quota exhaustion.

Quota Errors

If you see RESOURCE_EXHAUSTED or 429 errors:

  • Wait 65 seconds - The tool auto-retries with this delay
  • Check daily limit - You may have hit 1,500 requests/day
  • Reduce scope - Analyze fewer companies or smaller websites
  • Upgrade - Consider the paid tier for higher limits
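The config.py excerpt for rate_limit() shows only its docstring and a comment. A minimal runnable sketch of the same idea, assuming a module-level timestamp of the previous call, is given below; it is a re-implementation for illustration, not the file's actual body. It uses a monotonic clock so system clock changes cannot shorten the wait.

```python
import time

_min_delay_seconds = 6.0    # same value as the config.py excerpt
_last_call = float("-inf")  # monotonic timestamp of the previous call (assumed state)


def rate_limit(min_delay: float = _min_delay_seconds) -> float:
    """Sleep until min_delay seconds have passed since the last call.

    Returns the delay it requested. Not thread-safe; the tool makes no
    parallel requests, so a lock is omitted here.
    """
    global _last_call
    wait = max(0.0, _last_call + min_delay - time.monotonic())
    if wait > 0:
        time.sleep(wait)
    _last_call = time.monotonic()
    return wait
```

Call rate_limit() immediately before each request. The first call returns at once, and every later call blocks only for whatever remains of the 6-second window.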

Example Workflow

Complete Market Research Flow

```bash
# 1. Research the project management market
$ python main.py

Select model: 1  # gemini-2.5-flash-lite
Select option: 1
Enter market topic: project management software
Save taxonomy? y
Search for competitors? y
```

Output:

  • project_management_software_taxonomy.csv
  • project_management_software_competitors.csv

```bash
# 2. Analyze top 3 competitors
$ python main.py
Select option: 2
Enter URL: https://asana.com/product

$ python main.py
Select option: 2
Enter URL: https://monday.com/products/monday-work-management

$ python main.py
Select option: 2
Enter URL: https://clickup.com/features
```

Output:

  • asana_analysis.csv
  • monday_analysis.csv
  • clickup_analysis.csv
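CSV merging for multi-competitor comparisons is listed later as an enhancement idea, not an existing feature. A minimal sketch of combining the three analysis files into one comparison table is shown here, assuming each file is a small CSV with a header row (the exact columns written by the tool are not documented). `merge_analyses` and `write_comparison` are hypothetical names.

```python
import csv
from pathlib import Path
from typing import Dict, List


def merge_analyses(paths: List[Path]) -> List[Dict[str, str]]:
    """Stack per-company analysis CSVs into one list of rows.

    Columns missing from a file are filled with '' so the table stays rectangular.
    """
    rows: List[Dict[str, str]] = []
    for path in paths:
        with path.open(newline="", encoding="utf-8") as f:
            rows.extend(csv.DictReader(f))
    columns: List[str] = []
    for row in rows:
        for col in row:
            if col not in columns:
                columns.append(col)  # keep first-seen column order
    return [{col: row.get(col, "") for col in columns} for row in rows]


def write_comparison(rows: List[Dict[str, str]], out_path: Path) -> None:
    """Write the merged rows to a single CSV (no-op when there are no rows)."""
    if not rows:
        return
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

For example, `merge_analyses(sorted(Path("outputs").glob("*_analysis.csv")))` would gather the Asana, monday.com, and ClickUp results, assuming they were saved under outputs/.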

Troubleshooting

"Quota exceeded" / "RESOURCE_EXHAUSTED"
Cause: Hit the rate or daily limit
Solution: The tool auto-waits 65 seconds. If the error persists, wait an hour or try again the next day.

"Invalid API Key"
Cause: API key not configured
Solution: Set the GEMINI_API_KEY environment variable or create a .env file

No Competitors Found
Cause: Google Search returned no results, or the market is too niche
Solution: Try a broader market category (e.g., "CRM" instead of "veterinary CRM for small practices")

Website Scraping Failed
Cause: The website blocks scraping or requires JavaScript
Solution: Try an alternative product URL or copy the content manually

Model Not Found
Cause: The selected model name is not available
Solution: Choose gemini-2.5-flash-lite (most reliable for the free tier)

Downgrade Specifications

This tool has been intentionally limited compared to what is technically possible; it is a minimal prototype.

Enhancement Ideas (Within Free Tier)

  • Add local JSON caching to avoid re-analyzing same URLs
  • Implement CSV merging for multi-competitor comparisons
  • Create data visualization from CSV outputs
  • Add more predefined feature templates
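The first idea, local JSON caching, could look roughly like this: results are keyed by URL (ignoring a trailing slash), so re-running an analysis on the same page costs no API call. The cache location, key normalization, and `cached_analysis` name are hypothetical; `analyze` stands in for whatever function performs the real scrape-and-analyze step.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def cached_analysis(url: str, analyze: Callable[[str], Dict[str, Any]],
                    cache_file: Path = Path("outputs/cache.json")) -> Dict[str, Any]:
    """Return the stored result for url if present; otherwise run analyze(url) and store it."""
    cache: Dict[str, Any] = {}
    if cache_file.is_file():
        cache = json.loads(cache_file.read_text(encoding="utf-8"))
    key = url.strip().rstrip("/")  # treat .../product and .../product/ as the same page
    if key in cache:
        return cache[key]  # cache hit: no API call, no quota spent
    result = analyze(url)
    cache[key] = result
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(cache, ensure_ascii=False, indent=2), encoding="utf-8")
    return result
```

A single JSON file keeps the design simple and human-readable. Stale entries would have to be deleted by hand, since this sketch has no expiry.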

Enhancement Ideas (Require Paid Tier)

  • Parallel competitor analysis
  • Deep multi-page website scraping
  • Iterative taxonomy refinement
  • Real-time monitoring and alerts

License

MIT License. See the LICENSE file for details.

Disclaimer

  • This tool uses AI for analysis and may produce inaccurate results
  • Always manually verify extracted data
  • Not intended for production use without significant enhancements
  • API quotas and rate limits are subject to change by Google
  • Web scraping may be blocked by some websites

Support

  • Gemini API Docs: https://ai.google.dev/docs
  • Get API Key: https://aistudio.google.com/apikey
  • Pricing Info: https://ai.google.dev/pricing

Built with Google Gemini API | Optimized for Free Tier Usage | Intentionally Downgraded for Zero Cost
