Automating structured product intelligence for circular retail
This project is a high-performance Proof of Concept (POC) supporting Decathlon’s Circular PIM (Product Information Management) initiative.
Its goal is to automatically transform unstructured cycling product listings from global marketplaces into validated, PIM-ready data, enabling repair, rental, and second-hand services at scale.
The system acts as a core intelligence layer for sustainable, circular business models.
- Web scraping using Requests and BeautifulSoup
- Text normalization to prepare raw listings for LLM processing
- Local inference with Llama 3.2 via Ollama (GDPR-safe, offline)
- Zero-shot extraction of product attributes:
  - Brand
  - Model
  - Condition
  - Price
  - Material
- Strict JSON schema enforcement for deterministic, enterprise-ready output
- Pydantic models enforcing types and business rules (e.g. circular pricing logic)
- Automated Pytest suite to guarantee data quality and pipeline stability
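
The normalization and zero-shot prompting steps in the list above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: `normalize_listing`, `build_prompt`, and the `ATTRIBUTES` list are hypothetical names, the prompt wording is an assumption, and the input is assumed to be text already pulled out of the page (for example with BeautifulSoup's `get_text()`).

```python
import html
import re
import unicodedata

# Attribute names taken from the feature list; the lowercase keys are an assumption.
ATTRIBUTES = ["brand", "model", "condition", "price", "material"]


def normalize_listing(raw_text: str) -> str:
    """Clean scraped listing text before it is sent to the model."""
    text = html.unescape(raw_text)                # decode entities such as "&amp;"
    text = unicodedata.normalize("NFKC", text)    # fold non-breaking spaces and similar forms
    text = re.sub(r"[\u200b\u200c\u200d\ufeff]", "", text)  # drop zero-width characters
    text = re.sub(r"\s+", " ", text)              # collapse runs of whitespace and newlines
    return text.strip()


def build_prompt(listing_text: str) -> str:
    """Build a zero-shot extraction prompt (no worked examples are included)."""
    keys = ", ".join(ATTRIBUTES)
    return (
        "Extract the following attributes from the cycling product listing "
        f"and answer with a single JSON object with the keys: {keys}. "
        "Use null for any attribute the listing does not state.\n\n"
        f"Listing: {listing_text}"
    )
```

For a made-up listing, `build_prompt(normalize_listing(scraped_text))` would produce the string handed to the model. Keeping normalization separate from prompt construction makes each step easy to cover with its own test.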
- Language: Python 3.11+
- GenAI: Llama 3.2, Ollama, Prompt Engineering, Agentic Workflows
- Web Scraping: Requests, BeautifulSoup
- Data & Validation: Pydantic, SQL, Data Modeling
- Quality & DevOps: Git, Pytest, GitHub Actions
- Python 3.11+
- Ollama installed locally
git clone https://github.com/codes-by-sethu/bike-pim-scraper.git
cd bike-pim-scraper
pip install -r requirements.txt
ollama pull llama3.2
python main.py

This project runs entirely on local LLM infrastructure, minimizing:
- Data leakage risk
- Cloud dependency
- Energy consumption from large-scale API calls
It is built with the belief that technology should accelerate circularity while protecting our global playing field.
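
To make the local-only setup concrete, here is a hedged sketch of how a listing prompt might be sent to Ollama without leaving the machine. It targets Ollama's local REST endpoint (`/api/generate` on the default port 11434) using only the standard library; the project itself may use a different client, and `build_request`, `parse_response`, and `extract_attributes` are hypothetical names. A temperature of 0 is an assumption that makes output more repeatable, though it does not by itself guarantee identical results across runs.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "llama3.2") -> dict:
    """Assemble the request body for a single, non-streamed JSON completion."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                # return one response object instead of chunks
        "format": "json",               # ask Ollama to constrain the output to valid JSON
        "options": {"temperature": 0},  # assumed setting for more repeatable output
    }


def parse_response(body: bytes) -> dict:
    """Unwrap Ollama's envelope and decode the model's JSON answer."""
    envelope = json.loads(body)
    return json.loads(envelope["response"])


def extract_attributes(prompt: str, timeout: float = 120.0) -> dict:
    """Send the prompt to the local Ollama server and return the parsed attributes."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return parse_response(reply.read())
```

Calling `extract_attributes(...)` requires a running Ollama server with `llama3.2` pulled. No request is sent to any host other than `localhost`.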
- ✔ Modular, object-oriented architecture
- ✔ Deterministic JSON output for PIM ingestion
- ✔ Fully local, GDPR-compliant inference
- ✔ Production-ready validation layer
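
As an illustration of what such a validation layer might look like, the sketch below defines a Pydantic model for the extracted attributes and rejects output that breaks it. Everything here is an assumption for demonstration: the `BikeListing` model, the `validate_listing` helper, the condition vocabulary, and the positive-price rule are hypothetical, and the project's own circular pricing logic is not reproduced.

```python
import json
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class BikeListing(BaseModel):
    """Hypothetical PIM record; field names follow the extracted attributes."""

    brand: str = Field(min_length=1)
    model: str = Field(min_length=1)
    # Assumed condition vocabulary; the real PIM values may differ.
    condition: Literal["new", "like_new", "good", "fair", "for_parts"]
    price: float = Field(gt=0)       # assumed rule: a listing price must be positive
    material: Optional[str] = None   # optional, since listings often omit it


def validate_listing(raw_json: str) -> Optional[BikeListing]:
    """Return a validated record, or None if the model output is unusable."""
    try:
        return BikeListing(**json.loads(raw_json))
    except (ValidationError, json.JSONDecodeError, TypeError):
        return None


def test_rejects_negative_price():
    """Pytest-style check: a negative price must not pass validation."""
    bad = '{"brand": "Acme", "model": "Roadster 500", "condition": "good", "price": -5}'
    assert validate_listing(bad) is None
```

Returning `None` instead of raising lets a pipeline skip or log bad records and continue. A stricter variant could re-raise so that failures surface in CI.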
Author: Sethulakshmi K B
Focus: Circular Retail · Agentic AI · Product Intelligence