A system for analyzing S-1 IPO filings to compute complexity metrics and correlate them with financial performance.
The codebase follows a modular architecture with clear separation of concerns:
Purpose: Extract risk factors from S-1 filings and calculate financial metrics.
Class: S1FilingAnalyzer (in extract_risk_factors.py)
- Responsibilities:
  - Extract risk factor sections from SEC filings
  - Calculate post-IPO financial metrics (returns, volatility)
- No dependencies on other project modules
Key Methods:
- extract_risk_factors(filing_url): Extract risk factor text
- calculate_returns(ticker, ipo_date): Calculate financial metrics
- analyze_filing(...): Analyze a single filing
- analyze_filings(...): Analyze multiple filings
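The financial metrics named above (returns over a horizon, and volatility) can be illustrated with a minimal sketch. It assumes a plain list of daily closing prices that starts on the IPO date. The helper names simple_return and annualized_volatility, the sample prices, and the 252-day annualization factor are illustrative assumptions, and this is not necessarily how calculate_returns is implemented.

```python
import math
from typing import Optional, Sequence


def simple_return(prices: Sequence[float], horizon_days: int) -> Optional[float]:
    """Fractional price change after `horizon_days` trading days.

    Returns None when the series is too short, mirroring the
    `float | None` fields in the results structure.
    """
    if horizon_days < 1 or len(prices) <= horizon_days or prices[0] <= 0:
        return None
    return prices[horizon_days] / prices[0] - 1.0


def annualized_volatility(prices: Sequence[float], trading_days: int = 252) -> Optional[float]:
    """Sample standard deviation of daily log returns, scaled to one year."""
    # At least three prices are needed to get two returns for a sample variance.
    if len(prices) < 3 or any(p <= 0 for p in prices):
        return None
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mean = sum(log_returns) / len(log_returns)
    variance = sum((r - mean) ** 2 for r in log_returns) / (len(log_returns) - 1)
    return math.sqrt(variance * trading_days)


# Hypothetical closing prices, for illustration only.
closes = [20.0, 21.0, 19.5, 22.0, 23.0]
print(simple_return(closes, 4))    # change from the first to the fifth close
print(simple_return(closes, 21))   # None: the series is shorter than the horizon
print(annualized_volatility(closes))
```

Returning None for a horizon that has not elapsed yet keeps recent IPOs in the results without inventing a value for them.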
Purpose: Calculate complexity metrics from text.
Class: CICalculator (in calc_ci_score.py)
- Responsibilities:
  - Calculate Gunning Fog readability score
  - Calculate financial jargon density using FinBERT vs BERT embeddings
  - Compute Complexity Index (CI = Fog Score + α × Jargon Density)
- No dependencies on other project modules
Key Methods:
- calculate_fog_score(): Calculate Gunning Fog score
- calculate_jargon_density(): Calculate jargon density
- calculate_ci_score(): Calculate complete CI score
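To make the Complexity Index formula concrete, here is a minimal sketch of the standard Gunning Fog index combined with the CI definition above. The vowel-group syllable counter is a rough stand-in for a readability library such as textstat, so its scores may differ slightly from the project's. The function names and the placeholder jargon density are assumptions, and the embedding-based jargon computation is not reproduced here.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text: str) -> float:
    """Gunning Fog index: 0.4 * (words per sentence + percent complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    # "Complex" words are those with three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    words_per_sentence = len(words) / len(sentences)
    percent_complex = 100.0 * len(complex_words) / len(words)
    return 0.4 * (words_per_sentence + percent_complex)


def complexity_index(fog_score: float, jargon_density: float, alpha: float) -> float:
    """CI = Fog Score + alpha * Jargon Density."""
    return fog_score + alpha * jargon_density


sample = "We may incur substantial liabilities. Our operations depend on regulatory approval."
fog = gunning_fog(sample)
# The jargon density below is a made-up placeholder, not a model output.
print(complexity_index(fog, jargon_density=0.25, alpha=10.0))
```

Because the Fog score is typically in the range of roughly 5 to 20 while a density is a fraction, α controls how much weight the jargon term carries in the final index.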
Purpose: Orchestrate analysis and perform correlation studies.
Class: FilingAnalysisOrchestrator (in main.py)
- Responsibilities:
  - Coordinate both S1FilingAnalyzer and CICalculator
  - Combine results from both modules
  - Perform correlation analysis between complexity and financial metrics
  - Generate reports and summaries
Key Methods:
- analyze_filings(): Full analysis pipeline
- analyze_correlations(): Correlation analysis
- get_results_dataframe(): Convert results to pandas DataFrame
- print_summary(): Print formatted summary
- save_results(): Save to JSON
- Separation of Concerns: Each module has a single, well-defined responsibility
- Independence: Core modules (extract_risk_factors.py and calc_ci_score.py) can work independently
- Composition: main.py composes the functionality of both modules
- Dependency Inversion: Orchestration layer depends on abstractions (classes), not implementations
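The composition principle can be sketched in a few lines. Everything in this example is hypothetical: the Extractor and Scorer protocols, the fake implementations, and the Pipeline class are stand-ins chosen for illustration, and their method signatures do not match the real S1FilingAnalyzer or CICalculator (for instance, CICalculator takes the text in its constructor).

```python
from typing import Dict, List, Optional, Protocol


class Extractor(Protocol):
    def analyze_filings(self, filings: List[dict]) -> Dict[str, dict]: ...


class Scorer(Protocol):
    def score(self, text: str) -> Dict[str, Optional[float]]: ...


class FakeExtractor:
    """Hypothetical stand-in that returns canned data instead of fetching filings."""

    def analyze_filings(self, filings: List[dict]) -> Dict[str, dict]:
        return {
            f["ticker"]: {"ticker": f["ticker"], "risk_factors": "Sample risk text."}
            for f in filings
        }


class FakeScorer:
    """Hypothetical stand-in that scores text by word count only."""

    def score(self, text: str) -> Dict[str, Optional[float]]:
        return {"ci_score": float(len(text.split()))}


class Pipeline:
    """Composes an extractor and a scorer; neither knows about the other."""

    def __init__(self, extractor: Extractor, scorer: Scorer) -> None:
        self.extractor = extractor
        self.scorer = scorer

    def run(self, filings: List[dict]) -> Dict[str, dict]:
        results = self.extractor.analyze_filings(filings)
        for record in results.values():
            # Merge the complexity metrics into the extraction record.
            record.update(self.scorer.score(record["risk_factors"]))
        return results


pipeline = Pipeline(FakeExtractor(), FakeScorer())
print(pipeline.run([{"ticker": "TSLA"}]))
```

Passing the collaborators in through the constructor is one way to keep the two core modules independently usable and to swap in fakes for testing.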
from extract_risk_factors import S1FilingAnalyzer
from calc_ci_score import CICalculator
# Extract risk factors and financial metrics
analyzer = S1FilingAnalyzer()
filings_data = [{"filing_url": "...", "ticker": "TSLA", "ipo_date": "2010-06-29"}]
results = analyzer.analyze_filings(filings_data)
# Calculate complexity metrics
for company, data in results.items():
    calculator = CICalculator(data['risk_factors'])
    ci_score = calculator.calculate_ci_score()

from main import FilingAnalysisOrchestrator
orchestrator = FilingAnalysisOrchestrator(alpha=10.0)
filings_data = [...] # Your filings
results = orchestrator.analyze_filings(filings_data)
orchestrator.print_summary()
orchestrator.print_correlation_analysis()

python main.py

The orchestrator returns a dictionary with the following structure:
{
    "Company Name": {
        "company_name": str,
        "ticker": str,
        "ipo_date": str,
        "filing_url": str,
        "risk_factors": str,
        # Financial metrics
        "1_month_return": float | None,
        "6_month_return": float | None,
        "1_year_return": float | None,
        "volatility": float | None,
        # Complexity metrics
        "fog_score": float | None,
        "jargon_density": float | None,
        "ci_score": float | None,
        "alpha": float
    }
}

The system automatically calculates correlations between:
- Complexity Index vs Financial Returns
- Fog Score vs Financial Returns
- Jargon Density vs Financial Returns
- All metrics vs Volatility
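As an illustration of one such pairing, the sketch below computes a Pearson correlation between two fields of the results dictionary, skipping companies where either value is None. Whether the project uses Pearson or another coefficient is not stated here, so treat the choice as an assumption. The helper names and the sample numbers are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def paired_values(results: Dict[str, dict], x_key: str, y_key: str) -> Tuple[List[float], List[float]]:
    """Collect (x, y) pairs, dropping records where either value is missing."""
    xs: List[float] = []
    ys: List[float] = []
    for record in results.values():
        x, y = record.get(x_key), record.get(y_key)
        if x is not None and y is not None:
            xs.append(float(x))
            ys.append(float(y))
    return xs, ys


def pearson(xs: List[float], ys: List[float]) -> Optional[float]:
    """Pearson correlation, or None when it is undefined."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers that only demonstrate the record shape.
demo = {
    "A": {"ci_score": 14.0, "1_year_return": 0.10},
    "B": {"ci_score": 18.0, "1_year_return": -0.05},
    "C": {"ci_score": 16.0, "1_year_return": None},  # skipped
    "D": {"ci_score": 21.0, "1_year_return": -0.20},
}
print(pearson(*paired_values(demo, "ci_score", "1_year_return")))
```

With only a handful of filings the coefficient is very noisy, so a sample-size check alongside the value is worth keeping in any report.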
See requirements.txt for the full list. Key dependencies:
- sec-api: SEC filing extraction
- yfinance: Financial data
- transformers: FinBERT/BERT models
- torch: Deep learning framework
- pandas: Data analysis
- numpy: Numerical computations
- textstat: Readability metrics
SEC_API_KEY: Your SEC API key (required)
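Since SEC_API_KEY is required, it can help to fail early with a clear message when it is missing. A minimal sketch, assuming the key is supplied as an environment variable; the helper name load_sec_api_key is hypothetical and not part of the project:

```python
import os
from typing import Mapping


def load_sec_api_key(environ: Mapping[str, str] = os.environ) -> str:
    """Return SEC_API_KEY from the environment, raising if it is absent or blank."""
    key = environ.get("SEC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SEC_API_KEY is not set; export it before running the analysis."
        )
    return key


# Demonstration with an explicit mapping so the example runs without a real key.
print(load_sec_api_key({"SEC_API_KEY": "demo-key"}))
```

Accepting the mapping as a parameter keeps the check easy to test without touching the real process environment.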
- Extract Data: Use S1FilingAnalyzer to get risk factors and financial metrics
- Calculate Complexity: Use CICalculator to compute CI scores
- Analyze Correlations: Use FilingAnalysisOrchestrator to combine and analyze