An advanced NLP pipeline using BERT, GPT, and custom NER models for market sentiment analysis, automated report generation, and financial document processing.
    bert-gpt-analyzer/
    ├── data/
    │   ├── raw/
    │   └── processed/
    ├── models/
    │   ├── sentiment_analysis/
    │   ├── report_generation/
    │   └── ner/
    ├── src/
    │   ├── data_processing/
    │   │   └── preprocess.py
    │   ├── sentiment_analysis/
    │   │   └── bert_sentiment.py
    │   ├── report_generation/
    │   │   └── gpt_report.py
    │   ├── ner/
    │   │   └── financial_ner.py
    │   └── utils/
    ├── notebooks/
    ├── tests/
    ├── requirements.txt
    └── README.md
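Git does not track empty directories, so some of the data and model folders in this layout may be absent from a fresh clone. As a minimal sketch, assuming that is the case and that the script runs from the repository root, the following creates any that are missing. `EXPECTED_DIRS` and `ensure_dirs` are illustrative names, not part of the project.

```python
from pathlib import Path

# Directories copied from the project layout; paths are relative to the repo root.
EXPECTED_DIRS = [
    "data/raw",
    "data/processed",
    "models/sentiment_analysis",
    "models/report_generation",
    "models/ner",
]


def ensure_dirs(root="."):
    """Create any missing data/model directories under root; return those created."""
    created = []
    for rel in EXPECTED_DIRS:
        path = Path(root) / rel
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(rel)
    return created
```

Calling `ensure_dirs()` a second time returns an empty list, so it is safe to run before each pipeline step that writes to `data/processed/` or `models/`.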
- Clone the repository:

      git clone https://github.com/soroush-thr/bert-gpt-analyzer.git
      cd bert-gpt-analyzer

- Create a virtual environment (optional but recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

- Install the required packages:

      pip install -r requirements.txt

Note: If you encounter issues with PyTorch installation, visit https://pytorch.org/get-started/locally/ and follow the instructions for your specific system configuration.
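To confirm that the installation step succeeded, one option is to compare requirements.txt against the packages visible to the current interpreter. The sketch below uses only the standard library. `missing_requirements` is a hypothetical helper, not part of this repository, and it only checks that each distribution is present, not that its version satisfies any pin.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(requirements_path="requirements.txt"):
    """Return the requirement names in the file that are not installed."""
    missing = []
    for line in Path(requirements_path).read_text(encoding="utf-8").splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        # Skip blanks, pip options such as "-r other.txt", and direct URLs.
        if not line or line.startswith("-") or "://" in line:
            continue
        # Keep the leading distribution name, before extras or version specifiers.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Run from the repository root with the virtual environment active, `missing_requirements()` returns an empty list when every listed package is installed.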
Place your raw financial data in the data/raw/ directory.
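Before preprocessing, it can help to confirm what actually sits in data/raw/. The following is a minimal sketch, assuming the raw files are CSVs with a header row (the expected column names are not specified here). `summarize_csv` and `summarize_raw_dir` are illustrative helpers, not part of the project.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first row is assumed to be the header
        row_count = sum(1 for _ in reader)  # remaining rows are counted as data
    return header, row_count


def summarize_raw_dir(raw_dir="data/raw"):
    """Map each CSV file name in raw_dir to its (columns, row_count)."""
    return {p.name: summarize_csv(p) for p in sorted(Path(raw_dir).glob("*.csv"))}
```

Printing `summarize_raw_dir()` from the repository root lists each raw file with its columns and row count, which makes an empty or misplaced file easy to spot.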
Use the preprocess.py script to clean and tokenize your data:

    from src.data_processing.preprocess import prepare_data

    processed_data = prepare_data('data/raw/your_data.csv')
    processed_data.to_csv('data/processed/processed_data.csv', index=False)

Use the BERT-based sentiment analysis model:

    import pandas as pd

    from src.sentiment_analysis.bert_sentiment import BERTSentimentAnalyzer

    # Load processed data
    data = pd.read_csv('data/processed/processed_data.csv')

    # Initialize and use the sentiment analyzer
    analyzer = BERTSentimentAnalyzer()
    data_with_sentiment = analyzer.analyze_sentiment(data)

    # Save results
    data_with_sentiment.to_csv('data/processed/data_with_sentiment.csv', index=False)

Generate reports based on sentiment analysis results:

    import pandas as pd

    from src.report_generation.gpt_report import GPTReportGenerator

    # Load data with sentiment
    data = pd.read_csv('data/processed/data_with_sentiment.csv')

    # Initialize and use the report generator
    generator = GPTReportGenerator()
    report = generator.summarize_sentiment(data)
    print(report)

Train and use a custom NER model for financial document processing:

    from src.ner.financial_ner import create_training_data, train_ner_model, FinancialNER

    # Prepare your labeled data. Each entity is (start_char, end_char, label);
    # judging by the spans in this example, end_char is exclusive.
    labeled_data = [
        ("Apple Inc. reported revenue of $100 billion in Q4 2023",
         [(0, 10, "ORG"), (31, 43, "MONEY"), (47, 54, "DATE")]),
        # Add more labeled examples...
    ]

    # Create training data
    train_data = create_training_data(labeled_data)

    # Train the model
    train_ner_model(train_data, 'models/ner/financial_ner_model')

    # Use the trained model
    ner_model = FinancialNER('models/ner/financial_ner_model')
    entities = ner_model.extract_entities("Microsoft's stock price reached $300 on January 15, 2024")
    print(entities)

Contributions to this project are welcome. Please follow these steps:

- Fork the repository
- Create a new branch (`git checkout -b feature/AmazingFeature`)
- Make your changes
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request

Distributed under the MIT License. See LICENSE for more information.
Soroush Taheri - soroush.thr@gmail.com
Project Link: https://github.com/soroush-thr/bert-gpt-analyzer