
added RFC on how to create a living knowledge base of owasp things #734

Open

northdpole wants to merge 1 commit into main from owasp-graph

Conversation

@northdpole (Collaborator)

No description provided.

@PRAteek-singHWY (Contributor) commented Feb 1, 2026

@northdpole
Thanks a lot for sharing this, sir. It is extremely helpful and very well structured.

I've gone through the RFC and it gives a clear architectural and experimental framework to build the proposal around. I'll spend some time digesting it in detail and start aligning my work proposal with this design and the pre-code experiments outlined here.

@PRAteek-singHWY (Contributor)

@northdpole

Thanks for putting this together, sir. The experimental framework is really clear.

I’m particularly interested in Module C (The Librarian) and want to start with the suggested pre-code experiments before proposing any concrete design or implementation.

The negation problem stands out — I’ve worked on gap analysis features before (#716) and have seen how basic similarity metrics can struggle with logical inversions in requirements (e.g., “Use X” vs “Do NOT use X”).
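
To make that concrete, here is a minimal sketch of the failure mode. The all-MiniLM-L6-v2 checkpoint and the TLS wording are my own illustrative choices, not taken from the RFC:

```python
# Minimal repro of the failure mode: a bi-encoder embeds a requirement and its
# logical inversion almost identically, so cosine similarity stays high.
# Model choice and example wording are mine, not from the RFC.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
req = "Verify that the application uses TLS for all connections."
neg = "Verify that the application does NOT use TLS for all connections."

emb = model.encode([req, neg])
print(util.cos_sim(emb[0], emb[1]).item())  # typically very high despite opposite meaning
```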

Plan:
I’ll start with the ASVS re-classification experiment (a sketch of the comparison harness follows the list):

  • Extract 50 ASVS requirements and strip metadata
  • Baseline: vector search with cosine similarity
  • Comparison: cross-encoder re-ranking (ms-marco-MiniLM-L-6-v2)
  • Target: >20% accuracy improvement on negative requirements
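
A sketch of how I'd wire the harness, assuming sentence-transformers for both models; posing the task as pick-the-chapter-label is my assumption about how the re-classification will be framed:

```python
# Compare the cosine baseline against cross-encoder re-ranking on one
# requirement. chapter_labels would come from the 50-requirement ASVS extract.
from sentence_transformers import CrossEncoder, SentenceTransformer, util

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")  # baseline retriever
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def classify(requirement: str, chapter_labels: list[str]) -> tuple[str, str]:
    """Return (baseline_prediction, reranked_prediction) for one requirement."""
    # Baseline: nearest chapter label by cosine similarity.
    scores = util.cos_sim(bi_encoder.encode(requirement),
                          bi_encoder.encode(chapter_labels))[0]
    baseline = chapter_labels[int(scores.argmax())]

    # Comparison: the cross-encoder reads each (requirement, label) pair
    # jointly, which gives it a chance to catch negations the baseline misses.
    ce_scores = cross_encoder.predict([(requirement, c) for c in chapter_labels])
    reranked = chapter_labels[int(ce_scores.argmax())]
    return baseline, reranked
```

Scoring both columns against the true chapters over the 50 stripped requirements then yields the >20% comparison directly.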

If the experiment is successful, I’m also interested in exploring hybrid search (vector + BM25), especially for cases like CVE identifiers where pure vector search often underperforms.
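
If it comes to that, the blend can start very simple. The rank_bm25 package and the 50/50 weighting are my assumptions here, not settled choices:

```python
# Hybrid retrieval sketch: min-max-normalized BM25 blended with cosine scores.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

docs = [
    "CVE-2021-44228 Log4Shell remote code execution in log4j",
    "Use parameterized queries to prevent SQL injection",
]
bm25 = BM25Okapi([d.lower().split() for d in docs])  # lexical index
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embs = encoder.encode(docs)

def hybrid_scores(query: str, alpha: float = 0.5):
    """alpha weights the lexical side, where exact tokens like CVE IDs win."""
    lexical = bm25.get_scores(query.lower().split())
    semantic = util.cos_sim(encoder.encode(query), doc_embs)[0].numpy()
    norm = lambda s: (s - s.min()) / (s.max() - s.min() + 1e-9)
    return alpha * norm(lexical) + (1 - alpha) * norm(semantic)

print(hybrid_scores("CVE-2021-44228"))  # the CVE document should rank first
```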

I'll take this up step by step.

I’ll share experiment results and observations before proposing any implementation.

I’m using AI tools (similar to Cursor/Windsurf) and have read Section 3.

Thank you.

@manshusainishab

Hi @northdpole,

Thanks for putting together this RFC — the structure, pre-code experiments, and CI-first mindset make this exactly the kind of system I enjoy working on.

I’d like to formally express my interest in owning Module B: Noise / Relevance Filter as my primary contribution, and I’m also happy to assist with adjacent modules where needed.

Why Module B

The framing of Module B as a cheap, high-signal gate before expensive downstream processing resonates strongly with me. Getting this layer right feels critical to the quality, cost, and trustworthiness of the entire pipeline, especially given the planned regression dataset and CI enforcement.

Proposed Plan of Action (Aligned with the RFC)
I plan to follow the RFC strictly and start with experiments before any production code:

  1. Human Benchmark (Pre-Code Experiment)
    Manually collect a sample of real diffs and label each as:
    • Security Knowledge
    • Noise (formatting, admin, linting, meta updates)
    This dataset will be versioned and reusable as an early “golden slice.”

  2. Prompt Iteration & Evaluation
    Start with a simple binary JSON-output prompt:
    “Is this content introducing or modifying security-relevant knowledge?”
    Evaluate against the human benchmark and iterate until accuracy consistently exceeds 97%, with special attention to known failure modes (e.g., Code of Conduct updates, formatting-only diffs). A sketch of the evaluation loop follows this list.

  3. Regex + LLM Cost Control
    Design the regex filter to aggressively eliminate obvious noise first (lockfiles, CSS, tests, config); a sketch of such a gate also follows the list.
    Ensure the LLM is only invoked on borderline or content-heavy diffs.
    Document false positives / negatives clearly for future contributors.

  4. CI & Dataset Readiness
    Structure outputs so they can plug cleanly into the planned golden_dataset.json (an illustrative record shape is sketched below).
    Ensure behavior is deterministic and testable for CI regression checks.
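
For item 2, the evaluation loop I have in mind looks roughly like this. call_llm is a hypothetical stand-in for whatever model client the project standardizes on, and the benchmark field names are assumed:

```python
# Prompt-evaluation loop against the hand-labeled benchmark (item 2 above).
import json

# Braces in the JSON example are doubled so str.format leaves them intact.
PROMPT_TEMPLATE = (
    "Is this content introducing or modifying security-relevant knowledge? "
    'Reply with JSON only: {{"security_relevant": true or false}}\n\n'
    "DIFF:\n{diff}"
)

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in; wire up the project's model client here."""
    raise NotImplementedError

def accuracy(benchmark: list[dict]) -> float:
    """benchmark items look like {'diff': str, 'label': bool} (assumed shape)."""
    correct = 0
    for item in benchmark:
        reply = json.loads(call_llm(PROMPT_TEMPLATE.format(diff=item["diff"])))
        correct += reply["security_relevant"] == item["label"]
    return correct / len(benchmark)  # iterate on the prompt until > 0.97
```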
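
For item 3, a sketch of the cheap gate. The pattern list is purely illustrative and should grow out of the documented false positives/negatives:

```python
# Regex pre-filter: reject obvious noise paths before any LLM call (item 3 above).
import re

NOISE_PATH = re.compile(
    r"package-lock\.json$|yarn\.lock$|Pipfile\.lock$"  # lockfiles
    r"|\.(css|scss|svg|png)$"                          # styling and assets
    r"|(^|/)tests?/|\.test\.|_test\."                  # test code
    r"|\.(ini|toml|ya?ml)$"                            # config files
)

def needs_llm(changed_paths: list[str]) -> bool:
    """Call the LLM only when at least one changed file is not obvious noise."""
    return any(not NOISE_PATH.search(path) for path in changed_paths)

# Example: a lockfile-only PR never reaches the LLM.
assert needs_llm(["package-lock.json"]) is False
assert needs_llm(["cheatsheets/SQL_Injection.md", "package-lock.json"]) is True
```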
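
And for item 4, the kind of record I'd emit. The real schema belongs to the RFC's golden_dataset.json, so every field name below is an assumption:

```python
# Illustrative golden_dataset.json record (item 4 above); field names assumed.
import json

record = {
    "source": "owasp/asvs",           # repository the diff came from
    "diff_id": "<commit-or-pr-ref>",  # placeholder identifier
    "label": "noise",                 # "security_knowledge" or "noise"
    "reason": "formatting-only change",
    "labeled_by": "human",            # provenance for CI regression checks
}
print(json.dumps(record, sort_keys=True, indent=2))  # sorted keys keep output deterministic
```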

Cross-Module Contributions

While Module B would be my ownership area, I can also help with:
  • Module A: defining shared interfaces and assumptions between diff harvesting and filtering.
  • CI / Evaluation: contributing test cases and failure examples derived from Module B experiments.

I’ve read and understood Section 3 (Agent-Ready CI & AI-generated PR constraints) and I’m comfortable working within those boundaries.

Looking forward to collaborating — this project feels like a rare opportunity to build something both technically rigorous and genuinely useful.

Best,
Manshu
