A Node.js/TypeScript service that scrapes product prices and promotions from the Colruyt API, stores them in a PostgreSQL database, and exposes a REST API for querying the data. A daily cron job automatically fetches new data and detects price changes.
- Features
- Prerequisites
- Installation
- Configuration
- Running the Service
- API Reference
- Testing
- Docker
- Project Structure
- Contributing
- Scrapes product data and promotions from the Colruyt API
- Stores historical prices and promotions in a PostgreSQL database
- Detects and records price changes (increases, decreases, and promotions)
- REST API endpoints for products and promotions
- Interactive Swagger UI at `/api/docs`
- Daily cron job (runs at 08:00) to keep data up to date
- Configurable start modes: API server, scrape+compare, or compare-only-at-startup
- Node.js v20+
- pnpm v9+ (`npm install -g pnpm`)
- PostgreSQL database
- Clone the repository: `git clone https://github.com/Merilairon/colruyt-scraper.git`, then `cd colruyt-scraper`
- Install dependencies: `pnpm install`
- Set up environment variables. Copy the example file and fill in your values: `cp .env.example .env`

See the Configuration section for details on each variable.
All configuration is done via environment variables. Create a .env file in the project root (use .env.example as a template):
| Variable | Description |
|---|---|
| `PROXY_ENDPOINT` | Comma-separated proxy server URLs (e.g. `http://user:pass@host1:port,http://user:pass@host2:port`) |
| `ENABLE_PROXY` | Set to `true` to route requests through the configured proxy/proxies |
| `HOST_URL` | Base URL of the Colruyt website |
| `API_HOST_URL` | Base URL for the Colruyt API host |
| `API_URL` | Endpoint for general product/price API calls |
| `PROMOTION_URL` | Endpoint for fetching promotion data |
| `PRODUCT_URL` | Endpoint for fetching individual product details |
| `PG_HOST` | PostgreSQL connection string (e.g. `postgres://user:pass@localhost:5432/dbname`) |
| `PLACE_ID` | Colruyt store place ID used when querying the API |
| `AMOUNT_OF_DAYS_KEPT` | Number of days of historical price data to retain in the database |
| `START_MODE` | Startup mode: `SCRAPE` (scrape + compare), `COMPARE` (compare only), or empty (API server only) |
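
As a rough illustration of how the variables in the table might be read, the sketch below parses a few of them from an environment object. The `ScraperConfig` shape, the `parseConfig` function, and the validation rules (for example, treating `AMOUNT_OF_DAYS_KEPT` and `PG_HOST` as required) are assumptions made for this example and are not taken from the project's source.

```typescript
// Hypothetical config shape; the field names are illustrative only.
const START_MODES = ["SCRAPE", "COMPARE", ""] as const;
type StartMode = (typeof START_MODES)[number];

interface ScraperConfig {
  proxies: string[];
  proxyEnabled: boolean;
  pgConnectionString: string;
  placeId: string;
  daysKept: number;
  startMode: StartMode;
}

// Type guard so an arbitrary string can be narrowed to a known start mode.
function isStartMode(value: string): value is StartMode {
  return (START_MODES as readonly string[]).includes(value);
}

function parseConfig(env: Record<string, string | undefined>): ScraperConfig {
  // PROXY_ENDPOINT is comma-separated; drop whitespace and empty entries.
  const proxies = (env.PROXY_ENDPOINT ?? "")
    .split(",")
    .map((url) => url.trim())
    .filter((url) => url.length > 0);

  // Assumption: a missing or non-positive retention window is an error.
  const daysKept = Number.parseInt(env.AMOUNT_OF_DAYS_KEPT ?? "", 10);
  if (!Number.isInteger(daysKept) || daysKept <= 0) {
    throw new Error("AMOUNT_OF_DAYS_KEPT must be a positive integer");
  }

  const pgConnectionString = env.PG_HOST;
  if (!pgConnectionString) {
    throw new Error("PG_HOST is required");
  }

  const startMode = env.START_MODE ?? "";
  if (!isStartMode(startMode)) {
    throw new Error(`Unknown START_MODE: ${startMode}`);
  }

  return {
    proxies,
    proxyEnabled: env.ENABLE_PROXY === "true",
    pgConnectionString,
    placeId: env.PLACE_ID ?? "",
    daysKept,
    startMode,
  };
}
```

In a Node.js process this could be called as `parseConfig(process.env)` once the `.env` file has been loaded.
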
`pnpm dev`

Nodemon watches `src/` and `prisma/` for changes, then runs the build and start scripts automatically.
- `pnpm build`: compile TypeScript to `dist/`
- `pnpm start`: run the compiled output

The API server starts on port 3000 by default (override with the `PORT` environment variable).
Control the service behaviour via the START_MODE variable in .env:
| `START_MODE` | Behaviour |
|---|---|
| (empty) | Start the API server; the daily cron remains enabled and runs `scrapeAndCompare()` at 08:00 |
| `SCRAPE` | Start the API server, immediately run the scraper + price comparer, and keep the daily cron enabled |
| `COMPARE` | Start the API server, immediately run the price comparer only, and keep the daily cron enabled |
Note: the scheduled daily cron job runs scrapeAndCompare() in all modes, including START_MODE=COMPARE.
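
The start modes can be summarised as a small decision function. This is a sketch only: `planStartup`, the `StartupAction` names, the order of the actions, and the choice to treat unrecognised values like an empty `START_MODE` are all assumptions for illustration, not the service's actual startup code. The cron expression is the standard five-field form for 08:00 every day.

```typescript
// Hypothetical action names describing what happens at startup.
type StartupAction =
  | "startApiServer"
  | "scrape"
  | "compare"
  | "scheduleDailyCron";

// Five-field cron expression: minute 0, hour 8, every day.
const DAILY_CRON = "0 8 * * *";

function planStartup(startMode: string | undefined): StartupAction[] {
  // The API server is started in every mode.
  const actions: StartupAction[] = ["startApiServer"];

  switch (startMode) {
    case "SCRAPE":
      // Scrape first, then compare the fresh prices.
      actions.push("scrape", "compare");
      break;
    case "COMPARE":
      // Compare data that is already stored, without scraping.
      actions.push("compare");
      break;
    default:
      // Assumption: empty or unrecognised values run nothing immediately.
      break;
  }

  // The daily cron stays enabled regardless of the start mode.
  actions.push("scheduleDailyCron");
  return actions;
}
```

For example, `planStartup("COMPARE")` yields the server start, an immediate compare, and the cron registration, which matches the note that the scheduled job still runs in that mode.
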
Interactive documentation (Swagger UI) is available at:
http://localhost:3000/api/docs
| Method | Path | Description |
|---|---|---|
| `GET` | `/api/products` | List all scraped products |
| `GET` | `/api/products/changes` | List detected product price changes |
| `GET` | `/api/products/:productId` | Get a single scraped product by ID |
| `GET` | `/api/promotions` | List all active promotions |
| `GET` | `/api/promotions/:promotionId` | Get a single promotion by ID |
Paths not handled by the API return 404 Not Found.
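
A minimal client sketch for the endpoints listed above, assuming the service runs locally on the default port and using the global `fetch` available in Node.js v20. The helper names `endpoint` and `getJson` are hypothetical, and because the response bodies are not described here, the result is typed as `unknown` rather than a concrete shape.

```typescript
// Assumption: the service is reachable on the default local port.
const BASE_URL = "http://localhost:3000";

// Build the URL for a list endpoint, or for a single item when an ID is given.
function endpoint(resource: "products" | "promotions", id?: string): string {
  const path =
    id === undefined
      ? `/api/${resource}`
      : `/api/${resource}/${encodeURIComponent(id)}`;
  return new URL(path, BASE_URL).toString();
}

// URL of the price-change listing.
const CHANGES_URL = new URL("/api/products/changes", BASE_URL).toString();

// Fetch a URL and parse the body as JSON; non-2xx responses
// (including 404 for unknown paths) are raised as errors.
async function getJson(url: string): Promise<unknown> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status} ${response.statusText}`);
  }
  return response.json();
}
```

With the server running, `await getJson(endpoint("products"))` would return the parsed product list, and `await getJson(CHANGES_URL)` the detected price changes.
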
Run the full test suite with:
`pnpm test`

Tests are written with Jest and Supertest and live in `src/__tests__/`. Jest is configured to look for files matching `**/__tests__/**/*.test.ts`.
To run a single test file:
`pnpm test -- src/__tests__/comparer.test.ts`

A Dockerfile is included for containerised deployments.
`docker build -t colruyt-scraper .`

`docker run -p 3000:3000 --env-file .env colruyt-scraper`

The container exposes port 3000. Pass your `.env` file with `--env-file` or use individual `-e` flags.
colruyt-scraper/
├── src/
│ ├── __tests__/ # Jest test files
│ ├── comparers/ # Price-change comparison logic
│ ├── docs/ # OpenAPI/Swagger YAML definitions
│ ├── models/ # Sequelize models (Product, Price, Promotion, …)
│ ├── routes/ # Express route handlers
│ ├── scrapers/ # Colruyt API scraping logic
│ ├── utils/ # Shared utility functions
│ ├── comparer.ts # Comparer entry point
│ ├── database.ts # Sequelize connection setup
│ ├── scraper.ts # Scraper entry point
│ └── server.ts # Express app & server entry point
├── .env.example # Environment variable template
├── Dockerfile # Container build instructions
├── nodemon.json # Nodemon watch configuration
├── package.json
├── tsconfig.json
└── README.md
- Fork the repository and create a feature branch (`git checkout -b feature/my-feature`).
- Make your changes and add tests where applicable.
- Ensure all tests pass (`pnpm test`).
- Open a pull request describing your changes.