jhamer8/s1-complexity-index

Complexity Index Analysis

A system for analyzing S-1 IPO filings to compute complexity metrics and correlate them with financial performance.

Architecture

The codebase is split into three modules: two that work independently of each other and one that orchestrates them.

Core Modules

1. extract_risk_factors.py

Purpose: Extract risk factors from S-1 filings and calculate financial metrics.

Class: S1FilingAnalyzer

  • Responsibilities:
    • Extract risk factor sections from SEC filings
    • Calculate post-IPO financial metrics (returns, volatility)
    • No dependencies on other project modules

Key Methods:

  • extract_risk_factors(filing_url): Extract risk factor text
  • calculate_returns(ticker, ipo_date): Calculate financial metrics
  • analyze_filing(...): Analyze a single filing
  • analyze_filings(...): Analyze multiple filings
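
The README does not say how calculate_returns derives its numbers, so the following is only a minimal sketch of one common approach: simple returns over a fixed number of trading days and annualized volatility from daily log returns. The function names, the 252-day annualization factor, and the sample prices are assumptions made for illustration, not the project's code.

```python
import math
from statistics import stdev


def simple_return(prices, horizon):
    """Return P[horizon] / P[0] - 1, or None if the series is too short."""
    if len(prices) <= horizon or prices[0] == 0:
        return None
    return prices[horizon] / prices[0] - 1.0


def annualized_volatility(prices, trading_days=252):
    """Sample std. dev. of daily log returns, scaled by sqrt(trading_days)."""
    if len(prices) < 3:
        return None  # need at least two returns for a sample std. dev.
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return stdev(log_returns) * math.sqrt(trading_days)


# Made-up daily closing prices starting on the IPO date.
closes = [100.0, 102.0, 101.0, 105.0, 110.0]

print(simple_return(closes, 4))       # return over the first 4 trading days
print(simple_return(closes, 21))      # None: fewer than 21 days of data
print(annualized_volatility(closes))
```

Returning None when there is too little price history matches the `float | None` fields in the output structure.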

2. calc_ci_score.py

Purpose: Calculate complexity metrics from text.

Class: CICalculator

  • Responsibilities:
    • Calculate Gunning Fog readability score
    • Calculate financial jargon density using FinBERT vs BERT embeddings
    • Compute Complexity Index (CI = Fog Score + α × Jargon Density)
    • No dependencies on other project modules

Key Methods:

  • calculate_fog_score(): Calculate Gunning Fog score
  • calculate_jargon_density(): Calculate jargon density
  • calculate_ci_score(): Calculate complete CI score
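
To make the CI formula concrete, here is a rough, dependency-free sketch of the Gunning Fog score and of the combination CI = Fog Score + α × Jargon Density. The syllable counter is a crude vowel-group heuristic and the helper names are hypothetical. The project lists textstat among its dependencies, so its own Fog score may differ from this approximation. Jargon density is taken as a given number here because the embedding-based calculation is not described in this README.

```python
import re


def count_syllables(word):
    """Crude syllable estimate: number of consecutive-vowel groups."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def gunning_fog(text):
    """0.4 * (words per sentence + 100 * share of words with 3+ syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))


def complexity_index(fog_score, jargon_density, alpha=10.0):
    """CI = Fog Score + alpha * Jargon Density."""
    return fog_score + alpha * jargon_density


sample = "We may incur substantial liabilities. Our results may fluctuate."
fog = gunning_fog(sample)
print(fog)
print(complexity_index(fog, jargon_density=0.5, alpha=10.0))
```

A larger α gives jargon density more weight relative to the Fog score in the final index.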

3. main.py

Purpose: Orchestrate analysis and perform correlation studies.

Class: FilingAnalysisOrchestrator

  • Responsibilities:
    • Coordinate both S1FilingAnalyzer and CICalculator
    • Combine results from both modules
    • Perform correlation analysis between complexity and financial metrics
    • Generate reports and summaries

Key Methods:

  • analyze_filings(): Full analysis pipeline
  • analyze_correlations(): Correlation analysis
  • get_results_dataframe(): Convert results to pandas DataFrame
  • print_summary(): Print formatted summary
  • save_results(): Save to JSON

Design Principles

  1. Separation of Concerns: Each module has a single, well-defined responsibility
  2. Independence: Core modules (extract_risk_factors.py and calc_ci_score.py) can work independently
  3. Composition: main.py composes the functionality of both modules
  4. Dependency Inversion: Orchestration layer depends only on the public interfaces of the two classes, not on their internal implementations
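
The composition principle can be sketched with stand-in classes. Everything below is hypothetical: StubAnalyzer and StubCalculator only imitate the shape of S1FilingAnalyzer and CICalculator, the method signatures are assumed, and the scores are placeholders. The point is that the orchestrator only calls public methods on whatever collaborators it is given.

```python
class StubAnalyzer:
    """Hypothetical stand-in for S1FilingAnalyzer (no network access)."""

    def analyze_filings(self, filings_data):
        return {
            f["ticker"]: {"ticker": f["ticker"],
                          "risk_factors": "Sample risk text."}
            for f in filings_data
        }


class StubCalculator:
    """Hypothetical stand-in for CICalculator with fixed scores."""

    def __init__(self, text):
        self.text = text

    def calculate_ci_score(self, alpha):
        fog, jargon = 12.0, 0.5  # placeholder values, not real metrics
        return {"fog_score": fog, "jargon_density": jargon,
                "ci_score": fog + alpha * jargon, "alpha": alpha}


class Orchestrator:
    """Composes an analyzer and a calculator class passed in by the caller."""

    def __init__(self, analyzer, calculator_cls, alpha=10.0):
        self.analyzer = analyzer
        self.calculator_cls = calculator_cls
        self.alpha = alpha

    def analyze_filings(self, filings_data):
        results = self.analyzer.analyze_filings(filings_data)
        for record in results.values():
            calculator = self.calculator_cls(record["risk_factors"])
            record.update(calculator.calculate_ci_score(self.alpha))
        return results


orchestrator = Orchestrator(StubAnalyzer(), StubCalculator, alpha=10.0)
combined = orchestrator.analyze_filings([{"ticker": "TSLA"}])
print(combined["TSLA"]["ci_score"])
```

Because the collaborators are injected, either one can be swapped for a stub in tests without touching the orchestration logic.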

Usage

Basic Usage

from extract_risk_factors import S1FilingAnalyzer
from calc_ci_score import CICalculator

# Extract risk factors and financial metrics
analyzer = S1FilingAnalyzer()
filings_data = [{"filing_url": "...", "ticker": "TSLA", "ipo_date": "2010-06-29"}]
results = analyzer.analyze_filings(filings_data)

# Calculate complexity metrics
for company, data in results.items():
    calculator = CICalculator(data['risk_factors'])
    ci_score = calculator.calculate_ci_score()

Full Analysis with Orchestrator

from main import FilingAnalysisOrchestrator

orchestrator = FilingAnalysisOrchestrator(alpha=10.0)
filings_data = [...]  # Your filings
results = orchestrator.analyze_filings(filings_data)
orchestrator.print_summary()
orchestrator.print_correlation_analysis()

Command Line

python main.py

Output Structure

The orchestrator returns a dictionary with the following structure:

{
    "Company Name": {
        "company_name": str,
        "ticker": str,
        "ipo_date": str,
        "filing_url": str,
        "risk_factors": str,
        
        # Financial metrics
        "1_month_return": float | None,
        "6_month_return": float | None,
        "1_year_return": float | None,
        "volatility": float | None,
        
        # Complexity metrics
        "fog_score": float | None,
        "jargon_density": float | None,
        "ci_score": float | None,
        "alpha": float
    }
}
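
Since several fields may be None, it helps to see how such a record behaves when written to and read back from JSON. This is a small sketch with a made-up record ("Example Corp" and all of its values are illustrative). It uses only the standard json module and does not reproduce save_results() itself.

```python
import json

# Made-up record following the structure above; all values are illustrative.
results = {
    "Example Corp": {
        "company_name": "Example Corp",
        "ticker": "EXMP",
        "ipo_date": "2020-01-15",
        "filing_url": "...",
        "risk_factors": "Sample risk factor text.",
        "1_month_return": 0.12,
        "6_month_return": None,   # e.g. price history not available
        "1_year_return": None,
        "volatility": 0.45,
        "fog_score": 18.2,
        "jargon_density": 0.31,
        "ci_score": 21.3,
        "alpha": 10.0,
    }
}

encoded = json.dumps(results, indent=2)  # None is written as JSON null
decoded = json.loads(encoded)            # null is read back as None

# List the metrics that are missing for this company.
missing = [key for key, value in decoded["Example Corp"].items()
           if value is None]
print(missing)
```

Any downstream analysis needs to skip or otherwise handle these missing values rather than treat them as zero.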

Correlation Analysis

The system automatically calculates correlations between:

  • Complexity Index vs Financial Returns
  • Fog Score vs Financial Returns
  • Jargon Density vs Financial Returns
  • All metrics vs Volatility
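
The README does not state which correlation coefficient is used. As an illustration only, the sketch below computes a Pearson correlation between two fields of the result dictionary and skips companies where either value is None. The function names and sample data are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; None if undefined (too few points, zero variance)."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def correlate(results, x_key, y_key):
    """Correlate two metrics across companies, ignoring missing values."""
    pairs = [(r[x_key], r[y_key]) for r in results.values()
             if r.get(x_key) is not None and r.get(y_key) is not None]
    if not pairs:
        return None
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Made-up results: company D has no 1-year return and is skipped.
sample_results = {
    "A": {"ci_score": 10.0, "1_year_return": 0.3},
    "B": {"ci_score": 20.0, "1_year_return": 0.2},
    "C": {"ci_score": 30.0, "1_year_return": 0.1},
    "D": {"ci_score": 40.0, "1_year_return": None},
}
print(correlate(sample_results, "ci_score", "1_year_return"))
```

With only a handful of filings, a coefficient like this is very noisy, so it is best read alongside the sample size.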

Requirements

See requirements.txt for the full list. Key dependencies:

  • sec-api: SEC filing extraction
  • yfinance: Financial data
  • transformers: FinBERT/BERT models
  • torch: Deep learning framework
  • pandas: Data analysis
  • numpy: Numerical computations
  • textstat: Readability metrics

Environment Variables

  • SEC_API_KEY: API key for the sec-api service (required)
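
One way to fail early when the key is missing is sketched below. The helper name get_sec_api_key is hypothetical and how the project actually reads the variable is not shown in this README. Only the variable name SEC_API_KEY comes from the text above.

```python
import os


def get_sec_api_key(env=os.environ):
    """Return the SEC_API_KEY value or raise a clear error if it is unset."""
    key = env.get("SEC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SEC_API_KEY is not set; export it before running main.py"
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(get_sec_api_key({"SEC_API_KEY": "demo-key"}))
```

Accepting the environment as a parameter keeps the check easy to test without modifying os.environ.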

Example Workflow

  1. Extract Data: Use S1FilingAnalyzer to get risk factors and financial metrics
  2. Calculate Complexity: Use CICalculator to compute CI scores
  3. Analyze Correlations: Use FilingAnalysisOrchestrator to combine and analyze
