composer_cpm.py is a machine learning pipeline for classifying consumer complaints into specific categories (e.g., credit reporting, debt collection, consumer loans, mortgages).
The script:
- Loads a consumer complaint dataset (CSV)
- Maps product categories to predefined labels
- Cleans and processes complaint text
- Extracts features using TF-IDF (with unigrams and bigrams)
- Trains and compares two models:
  - Logistic Regression
  - Multinomial Naive Bayes
- Evaluates models using accuracy, macro F1-score, and a confusion matrix
- Saves the best model and an evaluation report
- Provides a prediction function for new complaints
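
The training and comparison steps listed above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the tiny corpus is made up, the hyperparameters are assumptions, scoring is done on the training data only to keep the example short, and picking the winner by macro F1 is an assumed selection rule.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up toy corpus for illustration; labels follow the 0-3 category scheme.
texts = [
    "error on my credit report", "credit bureau lists wrong account",
    "collector keeps calling about a debt", "debt collection agency threatened me",
    "personal loan interest rate changed", "auto loan payment was misapplied",
    "mortgage escrow charged incorrectly", "mortgage servicer lost my payment",
]
labels = [0, 0, 1, 1, 2, 2, 3, 3]

# The two model families named in the feature list (settings are assumptions).
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "nb": MultinomialNB(),
}

results = {}
for name, clf in candidates.items():
    pipe = Pipeline([
        # TF-IDF over unigrams and bigrams
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), lowercase=True)),
        ("clf", clf),
    ])
    pipe.fit(texts, labels)
    preds = pipe.predict(texts)  # training-set predictions, for brevity only
    results[name] = {
        "accuracy": accuracy_score(labels, preds),
        "f1_macro": f1_score(labels, preds, average="macro"),
        "pipeline": pipe,
    }

# Assumed rule: keep the model with the highest macro F1.
best_name = max(results, key=lambda n: results[n]["f1_macro"])
print(best_name, results[best_name]["f1_macro"])
```

On a real dataset the scores should come from a held-out split (for example via `sklearn.model_selection.train_test_split`) rather than from the training data.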
Install the dependencies before running the script:

    pip install numpy pandas scikit-learn joblib matplotlib seaborn

- You must provide a CSV file of consumer complaints.
- Update the script variable CSV_PATH with the path to your dataset:

    CSV_PATH = "C:/Users/Asus/Desktop/complaints.csv"
The script automatically detects a suitable text column. It prefers:
- Consumer complaint narrative
- consumer_complaint_narrative
- complaint_what_happened
- Complaint

If none of these is found, it falls back to Issue or Product.
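
The column detection described above could look roughly like this. It is a sketch under assumptions: the function name `detect_text_column` is hypothetical, and the priority order (preferred names in the order listed, then Issue before Product) is inferred from the description rather than taken from the script.

```python
PREFERRED_TEXT_COLUMNS = [
    "Consumer complaint narrative",
    "consumer_complaint_narrative",
    "complaint_what_happened",
    "Complaint",
]
FALLBACK_TEXT_COLUMNS = ["Issue", "Product"]


def detect_text_column(columns):
    """Return the first preferred column present, else the first fallback."""
    available = set(columns)
    for name in PREFERRED_TEXT_COLUMNS + FALLBACK_TEXT_COLUMNS:
        if name in available:
            return name
    raise ValueError("No usable text column found")


print(detect_text_column(["Date received", "Product", "Issue", "Complaint"]))  # Complaint
print(detect_text_column(["Product", "Issue"]))  # Issue
```

With pandas, the same function can be called as `detect_text_column(df.columns)`.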
The Product column is used to map complaints to one of 4 categories:
- 0 → Credit reporting, repair, or other
- 1 → Debt collection
- 2 → Consumer Loan
- 3 → Mortgage
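
One way to picture the mapping from Product values to the four label ids is a small keyword lookup. The names `LABEL_NAMES`, `KEYWORD_RULES`, and `map_product` are hypothetical, and substring matching is an assumption; the script's own LABEL_MAP may match product names differently.

```python
# Label ids and names as listed above.
LABEL_NAMES = {
    0: "Credit reporting, repair, or other",
    1: "Debt collection",
    2: "Consumer Loan",
    3: "Mortgage",
}

# Hypothetical keyword rules, checked in order against the lowercased Product value.
KEYWORD_RULES = [
    ("credit reporting", 0),
    ("debt collection", 1),
    ("consumer loan", 2),
    ("mortgage", 3),
]


def map_product(product):
    """Return a label id for a Product value, or None if no rule matches."""
    text = str(product).lower()
    for keyword, label in KEYWORD_RULES:
        if keyword in text:
            return label
    return None


print(map_product("Mortgage"))       # 3
print(map_product("Student loan"))   # None
```

Rows that map to None would presumably be dropped before training, since they belong to none of the four categories.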
Run the script directly:

    python composer_cpm.py

It will:
- Load and clean the data
- Train Logistic Regression and Naive Bayes models
- Compare their performance
- Save:
  - best_consumer_complaint_model.joblib → the best model
  - evaluation_report.json → evaluation metrics
- Model performance (printed to console):
  - Accuracy
  - F1-macro score
  - Classification report
  - Confusion matrix
- Saved files:
  - best_consumer_complaint_model.joblib (trained model)
  - evaluation_report.json (evaluation metrics)
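
To make the reported metrics concrete, the sketch below computes accuracy, macro F1, and a confusion matrix by hand and serializes them as JSON. The metrics are written out in plain Python only to keep the example dependency-free (scikit-learn's `sklearn.metrics` provides equivalents), the predictions are made up, and the field names are assumptions, not the actual layout of evaluation_report.json.

```python
import json


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def macro_f1(matrix):
    """Unweighted mean of per-class F1 scores (0.0 for an empty class)."""
    scores = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fp = sum(row[k] for row in matrix) - tp
        fn = sum(matrix[k]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels and predictions for the four categories.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 1]

cm = confusion_matrix(y_true, y_pred, n_classes=4)
report = {
    "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
    "f1_macro": macro_f1(cm),
    "confusion_matrix": cm,
}
print(json.dumps(report, indent=2))
```

Macro F1 weights every category equally, so a model that ignores a rare category is penalized even when its overall accuracy looks high.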
At the end of the run, the script tests a few sample complaints:
- Credit reporting, repair, or other -> I pulled my credit report and found an error ...
- Debt collection -> I received a call from a collector about a debt ...
- Consumer Loan -> I applied for a personal loan but the lender says ...
- Mortgage -> The mortgage servicer has been charging incorrect escrow ...
You can load the model and classify new texts using the provided function:
    from composer_cpm import predict_texts

    examples = [
        "My credit report shows incorrect information.",
        "A debt collector is harassing me."
    ]
    predictions = predict_texts(examples)
    print(predictions)
    # ['Credit reporting, repair, or other', 'Debt collection']

Possible extensions:
- Expand LABEL_MAP to include more categories.
- Experiment with different models (e.g., SVM, Random Forest).
- Deploy the model as an API (Flask/FastAPI).