Hotel Revenue ML Platform

🚀 Live Demo: Streamlit Dashboard | ⚙️ API Documentation: FastAPI

Forecasting and decision support for hotel revenue management: cancellation risk, occupancy/ADR forecasting, dynamic pricing, and overbooking, served as a FastAPI backend with a Streamlit front end.

Why I built this

Revenue management is a surprisingly rich ML problem and a good one to learn the unglamorous parts of the craft on. Cancellation prediction is a textbook leakage trap: the public dataset hands you features that quietly encode the outcome, so a naive model scores beautifully and is worthless. Getting an honest number means knowing which columns you could actually have at booking time. On top of that, the prediction is only useful if the probability is calibrated and the decision threshold reflects real costs (a walked guest vs. an empty room), which is where most "0.9 AUC!" demos fall apart.
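To make the "threshold reflects real costs" point concrete, here is a minimal sketch of a cost-based cut-off. It is not the repo's code: the function names and the two cost figures are hypothetical, and it assumes the model's probabilities are calibrated. Acting on a predicted cancellation (for example, reselling the room) risks a walked guest if the guest shows up; not acting risks an empty room if they cancel.

```python
def expected_cost(p_cancel: float, act: bool, cost_walk: float, cost_empty: float) -> float:
    """Expected cost of one decision, given a calibrated cancellation probability."""
    if act:
        # We resold the room; we only pay if the guest actually shows up.
        return (1.0 - p_cancel) * cost_walk
    # We held the room; we only pay if the guest cancels.
    return p_cancel * cost_empty


def cost_optimal_threshold(cost_walk: float, cost_empty: float) -> float:
    """Probability above which acting has the lower expected cost.

    Solves (1 - p) * cost_walk = p * cost_empty for p.
    """
    if cost_walk <= 0 or cost_empty <= 0:
        raise ValueError("costs must be positive")
    return cost_walk / (cost_walk + cost_empty)


# Hypothetical costs: walking a guest is three times as bad as an empty room.
threshold = cost_optimal_threshold(cost_walk=300.0, cost_empty=100.0)
print(threshold)  # 0.75
```

With these made-up costs the cut-off is 0.75 rather than the default 0.5, which is why an uncalibrated score makes the threshold meaningless.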

So the interesting work here isn't the model zoo; it's the discipline around it: removing leakage and defending the lower AUC, calibrating, forecasting demand without leaking the future, and not over-engineering the bits that don't need it (the "LP overbooking optimiser" turned out to be a one-line argmax). The decisions and the mistakes are written up in docs/decisions.md and docs/retrospective.md; those are the most honest part of this repo.
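For a sense of why overbooking can collapse to a one-liner, here is a sketch under simplifying assumptions that are mine, not necessarily the repo's: every booking shows up independently with the same probability, and the only costs are a fixed penalty per walked guest and per empty room. All names and numbers are hypothetical. Once the expected cost of each candidate booking limit is computed, picking the best one is a single `min` (an argmin over cost, equivalently an argmax over negative cost).

```python
from math import comb


def expected_overbooking_cost(n_sold: int, capacity: int, p_show: float,
                              cost_walk: float, cost_empty: float) -> float:
    """Expected cost when n_sold bookings are accepted, shows ~ Binomial(n_sold, p_show)."""
    total = 0.0
    for shows in range(n_sold + 1):
        prob = comb(n_sold, shows) * p_show**shows * (1 - p_show) ** (n_sold - shows)
        walked = max(shows - capacity, 0)   # guests we cannot room
        empty = max(capacity - shows, 0)    # rooms left unsold
        total += prob * (cost_walk * walked + cost_empty * empty)
    return total


def best_booking_limit(capacity: int, p_show: float, cost_walk: float,
                       cost_empty: float, max_extra: int = 20) -> int:
    """Booking limit with the lowest expected cost among capacity..capacity+max_extra."""
    candidates = range(capacity, capacity + max_extra + 1)
    # The "optimiser": a one-line argmin over a small discrete grid.
    return min(candidates, key=lambda n: expected_overbooking_cost(
        n, capacity, p_show, cost_walk, cost_empty))


if __name__ == "__main__":
    print(best_booking_limit(capacity=10, p_show=0.8, cost_walk=300.0, cost_empty=100.0))
```

The search space is a handful of integers, so enumerating it is exact and there is nothing for a linear-programming solver to add.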

⚠️ Mixed real + synthetic data: read this before trusting a metric. The headline cancellation model trains on a blend of two sources: the real Antonio, Almeida & Nunes (2019) Hotel Booking Demand dataset (119,390 rows, ~67%, fetched by src/load_real_data.py) and 60,000 synthetic bookings from src/generate_data.py. So the reported cancellation AUC largely reflects real bookings rather than a simulation artifact, and the leakage remediation below is on real columns.

The forecasting series (daily KPIs, external regressors), the guest recommender's interaction matrix (rule-based), and the demo reviews are fully synthetic, so those metrics measure how well the models fit a simulation rather than real-world performance. It's a portfolio/reference pipeline; to run the synthetic parts on real operations, swap the generated CSVs for PMS/POS exports of the same schema.

Key Findings & Results

Cancellation Prediction: Achieved an honest Holdout AUC of 0.814 (Calibrated Brier Score: 0.163) using XGBoost on the real-majority blend (~67% Antonio et al., ~33% synthetic), evaluated on a single chronological hold-out (the most recent ~17% of rows).
Target Leakage Remediation: The base Antonio, Almeida & Nunes dataset contains target leakage (booking_changes, days_in_waiting_list, and reservation_status): these columns are only known after booking time, and reservation_status encodes the outcome directly. They were explicitly dropped so that the reported performance is a realistic bound on what production could achieve.
Occupancy Forecasting: Evaluated head-to-head on the synthetic daily series, Prophet achieved an occupancy MAPE of ~15%, while an N-BEATS deep-learning baseline achieved ~15.9%.
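The two guards behind the cancellation number (dropping leaky columns, then holding out the most recent rows) can be sketched in plain Python. This is an illustration, not the repo's pipeline: the three column names and the ~17% fraction come from the text above, while the helper names and the `booking_date` key are hypothetical.

```python
# Columns named above as leaking the outcome; they are not known at booking time.
LEAKY_COLUMNS = {"booking_changes", "days_in_waiting_list", "reservation_status"}


def drop_leaky(row: dict) -> dict:
    """Return a copy of the row without the leaky columns."""
    return {key: value for key, value in row.items() if key not in LEAKY_COLUMNS}


def chronological_split(rows: list, date_key: str, holdout_fraction: float = 0.17):
    """Sort by date and hold out the most recent fraction of rows.

    No shuffling: every hold-out row is at least as recent as every training row.
    """
    ordered = sorted(rows, key=lambda row: row[date_key])
    n_holdout = round(len(ordered) * holdout_fraction)
    cut = len(ordered) - n_holdout
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    # Toy rows; `booking_date` is a hypothetical sortable field (ISO date strings work).
    rows = [
        {"booking_date": "2017-03-01", "lead_time": 12, "reservation_status": "Canceled"},
        {"booking_date": "2017-01-15", "lead_time": 40, "reservation_status": "Check-Out"},
    ]
    train, holdout = chronological_split([drop_leaky(r) for r in rows], "booking_date", 0.5)
    print(len(train), len(holdout))
```

A random split would scatter future bookings into the training set, which is the forecasting equivalent of the column leakage described above.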

For detailed evaluation metrics, confusion matrices, and calibration diagrams, see RESULTS.md and the Model Card.
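As a reminder of what the two headline numbers measure, here are the textbook definitions of the Brier score and MAPE in plain Python. These are generic formulas, not the repo's evaluation code; skipping zero actuals in MAPE is one common convention and an assumption here.

```python
def brier_score(y_true, p_pred) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def mape(actual, forecast) -> float:
    """Mean absolute percentage error as a fraction, skipping points where actual == 0."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


# A model that always predicts 0.5 scores 0.25, a useful yardstick for a Brier score.
print(brier_score([1, 0], [0.5, 0.5]))  # 0.25
```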

Quick Start

Clone the repository:

```
git clone <repository-url>
cd hotel_enterprise
```

Set up the virtual environment and install dependencies:

```
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e .          # installs the backend/src packages (no sys.path hacks)
```

Run the Backend (FastAPI):

```
uvicorn backend.main:app --reload --port 8000
```

Run the Frontend (Streamlit):
```
streamlit run frontend/app.py
```

Architecture Diagram

```mermaid
graph TD
    subgraph Client_Layer
        A["Frontend: Streamlit"]
    end

    subgraph Service_Layer
        A -->|REST API| B["Backend: FastAPI"]
        B --> C["Sentiment: HuggingFace → Claude → TextBlob"]
        B --> D["Forecasting · Pricing · Cancellation · XAI"]
    end

    subgraph Data_Storage
        B --> E[("MLflow: experiment tracking + registry")]
        B --> F[("Artifacts: models/ + data/ (parquet/CSV)")]
    end

    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#bbf,stroke:#333,stroke-width:2px
    style E fill:#dfd,stroke:#333,stroke-width:2px
```

Frontend: Streamlit dashboard providing interactive visualizations.
Backend: FastAPI microservice managing routing, forecasting, cancellation risk, dynamic pricing, and overbooking optimization.
MLflow Tracking: Tracks experiments, model parameters, and training metrics automatically.

Configuration

All optional — the defaults run an open local demo. Set these to harden a deploy:

| Variable | Effect |
| --- | --- |
| `API_KEY` | Require an `X-API-Key` header on all `/api/v1/*` routes (`/health` and `/docs` stay public). |
| `CORS_ORIGINS` | Comma-separated allowed origins; defaults to `*`. |
| `ARTIFACTS_BUNDLE_URL` / `ARTIFACTS_BASE_URL` | Fetch models/data from an external store instead of the image; see `ARTIFACTS.md`. |
| `MLFLOW_CANCELLATION_MODEL` | Serve the cancellation model from the MLflow registry instead of the local file. |
| `API_BASE` / `PUBLIC_API_URL` (frontend) | Where the dashboard calls / links to the API. |
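A rough sketch of how two of these settings might be consumed, using only the standard library. How the backend actually parses them is not shown here, so both helpers are hypothetical; only the variable names, the `*` default, the `X-API-Key` header, and the public `/health` route come from the table above.

```python
import os
from urllib.request import Request


def parse_cors_origins(raw):
    """Turn a comma-separated CORS_ORIGINS value into a list, defaulting to ["*"]."""
    if raw is None or not raw.strip():
        return ["*"]
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


def build_request(base_url: str, path: str, api_key=None) -> Request:
    """Build (but do not send) a request, attaching X-API-Key when a key is configured."""
    request = Request(base_url.rstrip("/") + path)
    if api_key:
        request.add_header("X-API-Key", api_key)
    return request


if __name__ == "__main__":
    origins = parse_cors_origins(os.environ.get("CORS_ORIGINS"))
    # /health stays public, so the key is optional here; /api/v1/* routes would need it.
    request = build_request("http://localhost:8000", "/health", os.environ.get("API_KEY"))
    print(origins, request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a live backend.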

Validation & CI

The continuous integration suite enforces metric thresholds on the temporal holdout set (a floor for AUC, a ceiling for MAPE):

```
holdout_auc > 0.80
holdout_mape < 0.16
```

The per-push CI runs lint + unit tests against the committed artifacts (fast); full data generation and model training run on a separate manual job. See .github/workflows/ci.yml.
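A threshold gate like the one described in this section can be sketched as a small check over a metrics mapping. The two metric names and bounds are the ones listed above; the flat JSON layout and the function name are assumptions, and the real tests may be structured differently.

```python
import json

# (comparison, bound) per metric, mirroring the thresholds listed above.
THRESHOLDS = {
    "holdout_auc": (">", 0.80),
    "holdout_mape": ("<", 0.16),
}


def check_metrics(metrics: dict) -> list:
    """Return a list of human-readable failures; an empty list means the gate passes."""
    failures = []
    for name, (op, bound) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif op == ">" and not value > bound:
            failures.append(f"{name}: {value} is not > {bound}")
        elif op == "<" and not value < bound:
            failures.append(f"{name}: {value} is not < {bound}")
    return failures


if __name__ == "__main__":
    # Assumed layout: a flat JSON object with the metric names as keys.
    metrics = json.loads('{"holdout_auc": 0.814, "holdout_mape": 0.15}')
    problems = check_metrics(metrics)
    print("OK" if not problems else "\n".join(problems))
```

Treating a missing metric as a failure keeps the gate from passing silently when an artifact was never regenerated.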

Project docs

docs/decisions.md — why things are built the way they are (ADR-style log).
docs/retrospective.md — the bugs that taught me something (leakage, a dead threshold, OOM 502s).
docs/roadmap.md — what's next and what's deliberately out of scope.
ARTIFACTS.md — how models/data are resolved at serve time and how to move them out of git/the image.
REPRODUCIBILITY.md — seeds, repro steps, and known sources of non-determinism.
MODEL_CARD.md · RESULTS.md — model details and evaluation metrics.
