Harden tier bucketing and PDF fallback text by Moshbbab · Pull Request #2 · Moshbbab/cli

Moshbbab · 2026-02-02T13:44:27Z

Make district-tier bucketing robust when quantile edges are non-unique or distributions are small to avoid qcut failures during feature engineering.
Ensure PDF report generation does not fail on systems without Arabic TrueType fonts by providing safe Latin-1 fallbacks while preserving Arabic output when fonts are available.
Provide a reliable end-to-end CSV → model → PDF flow for automated testing and demos.

Hardened district tiering by computing quantile bins with pd.qcut(..., duplicates='drop') and added an assign_tiers helper that adapts the number of labels to available bins and falls back to ranking when necessary; changes made in hemmah_pro_ivs_2025.py (location feature engineering).
Reworked PDF rendering to detect whether Arabic fonts (Amiri) were added and to use a safe_latin1 fallback plus a render helper so text written with FPDF will not raise UnicodeEncodeError when Arabic fonts are missing.
Kept bilingual (Arabic / English) fallback strings in report sections so generated reports remain readable when Arabic fonts are unavailable.
Minor robustness and logging additions to surface progress during data cleaning, model training and report generation.

Installed runtime dependencies and executed an end-to-end automated script that: generated a synthetic sample_real_estate.csv, loaded it with HemmahDataEngine, ran ivs_quality_check(), executed clean_and_engineer(), prepared modeling data, trained multiple models with HemmahMLEngine.train_multiple_models(), performed a predict() on a sample input, and generated a PDF via HemmahReportGenerator.generate_pdf(); this automated test completed successfully.
The final automated run produced these key outputs: total_records: 200, unique_pct: 100.0, predicted_price_per_sqm: 576.2103499146549, confidence_interval: {"lower": 489.77879742745665, "upper": 662.641902401853}, model_used: "Random Forest", r2_score: 0.43099558871266896, and generated test_hemmah_report.pdf.
During iteration there were failing runs (qcut duplicate-edge errors and a Latin-1 encoding error) which were fixed by the changes above; the final automated test passed without errors.

Harden tier bucketing and PDF fallback text

cf61810

Moshbbab added the codex label Feb 2, 2026 — with ChatGPT Codex Connector

Provide feedback