material-category-mapping-ai

Multilingual material category mapping system for procurement data. Designed to replace a fully manual classification process that didn't scale.

Background

~100K multilingual material master records were being classified by hand. Korean, English, Vietnamese, Chinese, and Japanese names were all mixed together, and teams followed no consistent classification standard. As a result, data quality was too low to support any meaningful analysis.

Approach

Focus was on building something operationally sustainable, not just a one-off model.

Step 1 — Preprocessing

Multilingual material name cleaning and feature extraction. Language detection → normalization → structured format for training.
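The preprocessing flow can be illustrated with a minimal sketch. It is not the project's actual code: the function names (`detect_language`, `normalize_name`, `preprocess`), the script-range heuristic, and the output fields are all assumptions made for illustration. A production pipeline would more likely use a trained language-identification model.

```python
import re
import unicodedata

# Hypothetical script ranges for a rough language guess (an assumption,
# not the project's method).
_HANGUL = re.compile(r"[\uac00-\ud7a3\u1100-\u11ff\u3130-\u318f]")
_KANA = re.compile(r"[\u3040-\u30ff]")
_HAN = re.compile(r"[\u4e00-\u9fff]")
# Vietnamese-specific letters; some (â, ê, ô) also occur in other languages.
_VIETNAMESE = re.compile(r"[ăâđêôơưĂÂĐÊÔƠƯ\u1ea0-\u1ef9]")


def detect_language(text: str) -> str:
    """Guess a language code from the scripts present in the text."""
    if _HANGUL.search(text):
        return "ko"
    if _KANA.search(text):
        return "ja"
    if _HAN.search(text):
        return "zh"  # kanji-only Japanese names would be misfiled here
    if _VIETNAMESE.search(text):
        return "vi"
    return "en"  # fallback for plain Latin text


def normalize_name(text: str) -> str:
    """NFKC-normalize, lowercase, and collapse separators and whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[_/\\|,;]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(raw_name: str) -> dict:
    """Return a structured record: raw text, cleaned name, language, tokens."""
    name = normalize_name(raw_name)
    return {
        "raw": raw_name,
        "name": name,
        "lang": detect_language(name),
        "tokens": name.split(),
    }
```

Calling `preprocess("  STAINLESS_Steel  Pipe ")` yields a record whose `name` is `"stainless steel pipe"` and whose `lang` is `"en"`. NFKC normalization also folds full-width characters, which are common in Japanese and Chinese material names, into their ASCII equivalents.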

Step 2 — Classification model

Multilingual embedding-based text similarity classification
Triplet Loss + Hard Negative Mining training architecture
Rule-based classification + AI recommendation hybrid architecture
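The three pieces listed above can be sketched together in dependency-free Python. This is an illustration under stated assumptions, not the repository's implementation, which uses PyTorch and SentenceTransformers. The function names, the batch-hard mining variant, the keyword-style rules, the per-category prototype vectors, and the `margin` and `threshold` values are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def batch_hard_triplet_loss(embeddings, labels, margin=0.5):
    """Mean triplet loss with batch-hard mining.

    For each anchor, the farthest same-label sample is the positive and the
    nearest different-label sample is the (hard) negative.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [euclidean(anchor, e)
               for j, (e, l) in enumerate(zip(embeddings, labels))
               if l == label and j != i]
        neg = [euclidean(anchor, e)
               for e, l in zip(embeddings, labels) if l != label]
        if not pos or not neg:
            continue  # no valid triplet for this anchor in the batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


def classify(name, embedding, rules, prototypes, threshold=0.8):
    """Hybrid decision: keyword rules first, then embedding similarity.

    `rules` maps a keyword to a category; `prototypes` maps a category to a
    reference embedding. Low-similarity results are flagged for review.
    """
    for keyword, category in rules.items():
        if keyword in name:
            return {"category": category, "source": "rule", "score": None}
    best = max(prototypes, key=lambda c: cosine(embedding, prototypes[c]))
    score = cosine(embedding, prototypes[best])
    source = "ai" if score >= threshold else "needs_review"
    return {"category": best, "source": source, "score": score}
```

The loss is zero once every anchor sits at least `margin` closer to its farthest positive than to its nearest negative, which is why mining the hardest negatives keeps the training signal alive. At inference time, a rule hit bypasses the model entirely, and anything below the similarity threshold is routed to a reviewer rather than accepted automatically.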

Step 3 — Human-in-the-loop

Designed so that accuracy compounds as users give feedback. This is not a static model; it is built to keep improving through operation.
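One way such a feedback loop could be wired is sketched below. The `FeedbackLoop` class and its methods are hypothetical; the README does not describe how feedback is stored or how it re-enters training. The sketch assumes reviewers either accept a prediction or correct it, and that confirmed pairs are reused both as lookups and as future training examples.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLoop:
    """Collects reviewer decisions and turns them into labelled examples."""

    confirmed: dict = field(default_factory=dict)  # name -> reviewed category
    accepted: int = 0   # predictions the reviewer left unchanged
    corrected: int = 0  # predictions the reviewer overrode

    def record(self, name: str, predicted: str, final: str) -> None:
        """Store one review; the latest decision for a name wins."""
        if predicted == final:
            self.accepted += 1
        else:
            self.corrected += 1
        self.confirmed[name] = final

    def lookup(self, name: str):
        """Return a previously confirmed category, or None if unseen."""
        return self.confirmed.get(name)

    def acceptance_rate(self) -> float:
        """Share of reviewed predictions that needed no correction."""
        total = self.accepted + self.corrected
        return self.accepted / total if total else 0.0

    def training_examples(self):
        """(name, category) pairs to fold into the next retraining run."""
        return sorted(self.confirmed.items())
```

Tracking the acceptance rate over time gives a simple operational signal of whether the model is actually improving as feedback accumulates.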

Category structure

Level 1 (top category)
  └─ Level 2
       └─ Level 3
            └─ Level 4 (leaf category)

AI maps each material to the appropriate level within the hierarchy, maintaining structural consistency across the category tree.
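Structural consistency can be checked mechanically if each category is represented as a path from Level 1 downward. The sketch below assumes that representation; the helper names and the example category names are hypothetical and do not come from the project's schema.

```python
def build_tree(paths):
    """Build a nested dict from category paths such as (L1, L2, L3, L4)."""
    tree = {}
    for path in paths:
        node = tree
        for name in path:
            node = node.setdefault(name, {})
    return tree


def is_valid_path(tree, path):
    """True if every level in `path` is a child of the level before it."""
    node = tree
    for name in path:
        if name not in node:
            return False
        node = node[name]
    return True


def levels(path):
    """Label each element of a path with its level number."""
    return {f"level_{i}": name for i, name in enumerate(path, start=1)}


# Hypothetical schema with two leaf categories.
schema = build_tree([
    ("MRO", "Fasteners", "Bolts", "Hex Bolts"),
    ("MRO", "Fasteners", "Nuts", "Hex Nuts"),
])
```

With this schema, `is_valid_path(schema, ("MRO", "Fasteners", "Bolts"))` is true, while `is_valid_path(schema, ("MRO", "Bolts"))` is false because the prediction skips a level. Rejecting such paths before they are saved is one way to keep assignments consistent across the tree.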

Results

Dataset: ~100K multilingual material records
Classification accuracy: 80%+ within standardized category schema
Manual classification process replaced with automated pipeline
Human-in-the-loop design enables continuous accuracy improvement
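The README does not state how the accuracy figure was measured. For a four-level schema, one plausible way to report it is prefix accuracy per level, sketched here as an assumption. The function name and the path representation are hypothetical.

```python
def accuracy_by_level(predicted, actual, depth=4):
    """Share of records whose first k levels match, for k = 1..depth.

    Assumes each path is a sequence with `depth` elements, ordered from
    Level 1 down to the leaf.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total = len(actual)
    result = {}
    for k in range(1, depth + 1):
        hits = sum(
            1 for p, a in zip(predicted, actual)
            if tuple(p[:k]) == tuple(a[:k])
        )
        result[k] = hits / total if total else 0.0
    return result
```

Reporting per level shows where errors concentrate: a model may be nearly always right at Level 1 while most mistakes occur at the leaf level.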

Stack

Python, PyTorch, SentenceTransformers, Multilingual Embeddings, Triplet Loss, Hard Negative Mining, pandas, scikit-learn

Structure

material-category-mapping-ai/
├── data/               # Sample data structure (no real data)
├── models/             # Model definitions
├── utils/              # Preprocessing utilities
├── train.py            # Training
├── inference.py        # Inference
├── config.py           # Configuration
└── requirements.txt
