refactor: add weighted skill coverage scoring to recommendation engine #218

Open
Pranav-IIITM wants to merge 4 commits into komalharshita:main from Pranav-IIITM:refactor/weighted-skill-coverage-scoring

@Pranav-IIITM
Contributor

Summary

This PR is submitted as part of GSSoC 2026.

The recommendation engine previously awarded a flat 3 points per matching skill, meaning a user who knew 1 of 2 required skills scored the same as one who knew both. This PR introduces a coverage ratio so partial skill matches rank proportionally lower than full matches, producing more accurate project rankings.
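
A minimal sketch of the coverage-weighted idea described above. The function name is hypothetical, `WEIGHT_SKILL = 3` is taken from the "flat 3 points" mentioned in this summary, and the base score is assumed to be `WEIGHT_SKILL` per matched skill; the actual code in utils/recommender.py may differ.

```python
WEIGHT_SKILL = 3  # assumed value, from the "flat 3 points" in the summary


def coverage_weighted_skill_score(user_skills, project_skills):
    """Scale the skill score by the fraction of project skills the user covers.

    Hypothetical helper for illustration only.
    """
    project = {s.strip().lower() for s in project_skills}
    if not project:
        # Zero-division guard: a project with no listed skills scores 0.
        return 0.0
    user = {s.strip().lower() for s in user_skills}
    matched = len(user & project)
    coverage = matched / len(project)  # matched / total project skills
    return WEIGHT_SKILL * matched * coverage


print(coverage_weighted_skill_score(["python"], ["python", "flask"]))           # 1.5
print(coverage_weighted_skill_score(["python", "flask"], ["python", "flask"]))  # 6.0
print(coverage_weighted_skill_score(["python"], []))                            # 0.0
```

Under these assumptions, knowing 1 of 2 skills yields 1.5 while knowing both yields 6.0, so the partial match ranks lower.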

Related Issue

Closes #98

Type of Change

  • Refactor — restructures code without changing behaviour
  • Test — adds or updates tests

What Was Changed

| File | Change made |
| --- | --- |
| utils/recommender.py | Multiplied skill score by coverage ratio (matched / total project skills); added zero-division guard for projects with no skills |
| tests/test_basic.py | Added four coverage-weighted unit tests; cleaned up duplicate imports |
| docs/architecture.md | Updated scoring description and recommendation flow documentation to reflect the new weighted coverage formula |

How to Test This PR

  1. Check out this branch:

    git checkout refactor/weighted-skill-coverage-scoring

  2. Install dependencies:

    pip install -r requirements.txt

  3. Run the tests:

    python tests/test_basic.py

Expected test output:

31 passed, 0 failed out of 31 tests

Test Results

31 passed, 0 failed out of 31 tests

Self-Review Checklist

  • I have read CONTRIBUTING.md and followed all guidelines
  • My branch name follows the convention: refactor/
  • I have run python tests/test_basic.py and all tests pass
  • I have run flake8 . locally and there are no errors
  • I have not introduced any print() or console.log() debug statements
  • Every new function I wrote has a docstring
  • I have not modified files outside the scope of the linked issue
  • If I changed the UI, I tested it at 375px (mobile) and 1280px (desktop)
  • If I added a project to the dataset, it has all required JSON fields

Notes for Reviewer

This refactor preserves the existing recommendation workflow while improving ranking accuracy for partial skill matches. Full matches continue to receive the highest weighting, while incomplete matches are now scaled proportionally using project skill coverage.

Fixes komalharshita#98

- utils/recommender.py: score now multiplied by coverage ratio (matched / total project skills) so partial skill matches rank lower than full matches
- tests/test_basic.py: added four coverage-weighted unit tests, cleaned up duplicate imports
- docs/architecture.md: updated scoring description to reflect new formula
@vercel

vercel Bot commented May 17, 2026

@Pranav-IIITM is attempting to deploy a commit to the komalsony234-1530's projects Team on Vercel.

A member of the Team first needs to authorize it.

@Pranav-IIITM
Contributor Author

Hi @komalharshita, could you please review this PR when you get a chance?

@Pranav-IIITM
Contributor Author

@komalharshita Please review the PR

komalharshita previously approved these changes May 24, 2026
Owner

@komalharshita komalharshita left a comment

Excellent contribution overall. This PR meaningfully improves recommendation quality by introducing coverage-weighted skill scoring instead of relying on flat skill-match counts.

The updated scoring model better differentiates between:

  • partial skill overlap
  • strong/full project alignment

which should produce more relevant recommendation ordering in practice.

What I especially appreciate:

  • strong targeted test coverage
  • edge-case handling (ZeroDivisionError)
  • architecture documentation updates
  • clean implementation scope without unrelated modifications

This is a well-structured backend improvement rather than a superficial scoring tweak.

A few future considerations (non-blocking):

  • the scoring system now returns floats instead of integers, so downstream formatting/assumptions should be verified
  • the current formula may heavily penalize projects with larger skill lists, which may need tuning over time
  • higher-level recommendation-ordering tests could be useful in the future
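
To make these considerations concrete, here is a rough sketch of what an ordering-level check might look like. The project names and the `skill_score` helper are hypothetical, and the formula (3 points per matched skill, multiplied by matched / total) is an assumption based on the PR description rather than the merged code.

```python
WEIGHT_SKILL = 3  # assumed value, from the PR description


def skill_score(matched, total):
    """Hypothetical coverage-weighted score: per-skill points times coverage."""
    if total == 0:
        return 0.0
    return WEIGHT_SKILL * matched * (matched / total)


# Hypothetical projects: (name, skills matched by the user, total project skills)
projects = [
    ("large-partial", 2, 8),
    ("small-full", 2, 2),
    ("medium-partial", 2, 4),
]

# Rank by score, highest first, as a recommendation list would.
ranked = sorted(projects, key=lambda p: skill_score(p[1], p[2]), reverse=True)
ranking = [name for name, _, _ in ranked]

# Scores are floats, so format them explicitly before display.
display = {name: f"{skill_score(m, t):.2f}" for name, m, t in projects}

print(ranking)  # ['small-full', 'medium-partial', 'large-partial']
print(display)  # {'large-partial': '1.50', 'small-full': '6.00', 'medium-partial': '3.00'}
```

With the same two matched skills, the score falls from 6.00 to 1.50 as the project's skill list grows from 2 to 8, which is the penalty on larger skill lists noted above.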

Overall, this is a strong and thoughtful improvement to the recommendation engine.

Approved for merge.

@komalharshita
Owner

@Pranav-IIITM do ensure all checks pass

- Add WEIGHT_SKILL, WEIGHT_LEVEL, WEIGHT_INTEREST, WEIGHT_TIME constants
  to recommender.py to fix ImportError across all Python versions
- Implement coverage-weighted skill scoring with zero-division guard
- Update test assertions to use pytest.approx() for float-safe comparisons
- Fix coverage-isolation tests to prevent accidental criteria matching
- Fix test_health_check to use get_client() instead of missing fixture

Fixes CI failures on Python 3.9, 3.11, 3.12
All 34 tests passing

Related: komalharshita#218
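
The float-safe comparison point can be illustrated with a small sketch. It uses the standard library's `math.isclose` so it runs without pytest; `pytest.approx()` plays a similar role inside the test suite. The `skill_score` helper and its formula are assumptions for illustration, not the code in recommender.py.

```python
import math

WEIGHT_SKILL = 3  # assumed value, from the PR description


def skill_score(matched, total):
    """Hypothetical coverage-weighted score: per-skill points times coverage."""
    if total == 0:
        return 0.0
    return WEIGHT_SKILL * matched * (matched / total)


# Exact equality on floats can fail even for simple arithmetic.
exact_match = (0.1 + 0.2 == 0.3)            # False
close_match = math.isclose(0.1 + 0.2, 0.3)  # True

# Comparing a coverage-weighted score with a tolerance instead of ==.
score = skill_score(2, 3)  # mathematically 3 * 2 * (2/3) = 4
score_ok = math.isclose(score, 4.0, rel_tol=1e-9)
```

Since coverage ratios such as 2/3 are not exactly representable in binary floating point, comparing with a tolerance keeps the tests from depending on rounding details.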
@Pranav-IIITM
Contributor Author

Hi @komalharshita! All CI checks are now passing:

  • CI / Test (Python 3.9) — Successful
  • CI / Test (Python 3.11) — Successful
  • CI / Test (Python 3.12) — Successful
  • CI / Lint (flake8) — Successful
  • DevPath CI / build — Successful

The only remaining item is the Vercel check, which is blocked on
team authorization — that's on the deployment side, not the code.

Could you re-approve so we can get this merged?

Development

Successfully merging this pull request may close these issues.

Refactor recommendation engine to support multi-skill partial matching with weighted confidence scores

2 participants