Summary
The current search experience on the RFD site offers literal string matches in titles (on the homepage), title and content search (available from the top right), and label-based filtering. All three show their limits as the body of work grows (60+ public RFDs, many more internally). A lightweight, hyperlinked glossary would give users an entry point to discover related documents, understand domain-specific terminology, and navigate the knowledge base more efficiently.
Motivation
| Problem | Impact |
|---|---|
| Search and filtering – searches titles and RFD contents, but does not surface documents with opinionated context. | Users spend more time locating information relevant to their immediate questions. |
| Label filtering – useful for small sets but becomes unwieldy with thousands of RFDs. | Discoverability degrades as the volume of content increases. |
| Implicit term relationships – many RFDs reference each other without an explicit map. | Contextual understanding of abstractions is low; newcomers face a steep learning curve. |
| Verification & veracity – existing references are not centrally curated, leading to stale or duplicated definitions. | Incorrect, outdated, or overloaded definitions can propagate and cause confusion. |
A curated glossary could help to address these gaps by:
- Providing a single source of truth for definitions.
- Enabling hyper‑textual navigation (terms link to definitions and to every RFD that mentions them).
- Offering default filters (e.g., exclude abandoned RFDs, show only discussions with full context).
- Being lightweight and low‑cost to generate and maintain, especially when powered by LLM pipelines, which don't increase the existing site footprint.
Goals
- Ease of Use – Users can locate a term and jump directly to all relevant RFDs within two clicks.
- Improved Discovery – Increase the rate at which specific topics are found without expanding the UI complexity.
- Scalable Maintenance – Automate glossary population and updates as new RFDs are merged.
- Foundation for Future Features – Enable downstream enhancements such as ontology building, customer‑facing knowledge bases, and richer knowledge‑graph queries.
Proposed Solution
1. Glossary Data Model
Store each term as a discrete JSON‑LD file (or equivalent) following the schema below:
{
  "id": int,
  "term": "string", // term name
  "definition": "string", // what the term means
  "mentions": [{ "number": int, "text": "string" }], // RFDs where this term comes up
  "relatedTerms": [{ "id": int, "term": "string" }], // other glossary terms related to this one, or used in its definition
  "references": [{ "url": "string", "text": "string" }], // external links (outside the RFD site) that provide additional context
  "public": bool // whether the term appears in public RFDs, and thus in the public glossary
}
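The schema above could be mirrored as TypeScript types with a small runtime check, so that generated files are validated before they are committed. This is a minimal sketch: the names `GlossaryTerm`, `validateTerm`, and `isGlossaryTerm` are hypothetical and not part of the RFD site today.

```typescript
// Hypothetical TypeScript mirror of the glossary schema; field names follow the schema.
interface Mention { number: number; text: string }
interface RelatedTerm { id: number; term: string }
interface Reference { url: string; text: string }

interface GlossaryTerm {
  id: number;
  term: string;
  definition: string;
  mentions: Mention[];
  relatedTerms: RelatedTerm[];
  references: Reference[];
  public: boolean;
}

// Returns a list of problems; an empty list means the value matches the schema.
function validateTerm(value: unknown): string[] {
  if (typeof value !== "object" || value === null) return ["not an object"];
  const v = value as Record<string, unknown>;
  const errors: string[] = [];

  if (typeof v.id !== "number" || !Number.isInteger(v.id)) errors.push("id must be an integer");
  if (typeof v.term !== "string" || v.term.length === 0) errors.push("term must be a non-empty string");
  if (typeof v.definition !== "string") errors.push("definition must be a string");
  if (typeof v.public !== "boolean") errors.push("public must be a boolean");

  // Each array field is checked element by element against its expected shape.
  const checkArray = (key: string, ok: (item: Record<string, unknown>) => boolean): void => {
    const arr = v[key];
    if (!Array.isArray(arr)) {
      errors.push(`${key} must be an array`);
      return;
    }
    arr.forEach((item, i) => {
      if (typeof item !== "object" || item === null || !ok(item as Record<string, unknown>)) {
        errors.push(`${key}[${i}] has the wrong shape`);
      }
    });
  };
  checkArray("mentions", (m) => typeof m.number === "number" && typeof m.text === "string");
  checkArray("relatedTerms", (r) => typeof r.id === "number" && typeof r.term === "string");
  checkArray("references", (r) => typeof r.url === "string" && typeof r.text === "string");
  return errors;
}

// Type guard built on the validator, for use after JSON.parse.
function isGlossaryTerm(value: unknown): value is GlossaryTerm {
  return validateTerm(value).length === 0;
}
```

A check like this could run in CI over every file in the glossary directory, failing the build when `validateTerm` returns a non-empty list.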
These term files could be placed in a new glossary.d/ directory, segmented by first letter (A/term.json, B/term.json, …). This could look something like this:
hermano /rfd-site glossary !? v16.20.1 17:04 ls -l ./glossary.d/ | head -n 5
drwxr-xr-x@ 11 hermano staff 352 Dec 29 15:45 A
drwxr-xr-x@ 9 hermano staff 288 Dec 29 15:45 B
drwxr-xr-x@ 9 hermano staff 288 Dec 29 15:45 C
drwxr-xr-x@ 10 hermano staff 320 Dec 29 15:45 D
hermano /rfd-site glossary !? v16.20.1 17:04 ls -l ./glossary.d/C
total 56
-rw-r--r--@ 1 hermano staff 434 Dec 29 16:16 cerberus.json
-rw-r--r--@ 1 hermano staff 568 Dec 29 16:16 cosmo.json
-rw-r--r--@ 1 hermano staff 316 Dec 29 16:43 cpld.json
-rw-r--r--@ 1 hermano staff 137 Dec 29 16:16 cpu.json
-rw-r--r--@ 1 hermano staff 139 Dec 29 16:41 cru.json
-rw-r--r--@ 1 hermano staff 340 Dec 29 16:59 crucible.json
-rw-r--r--@ 1 hermano staff 669 Dec 29 16:43 cubby.json
Keeping one file per term reduces the chance of merge conflicts as terms are updated with newly published RFDs. It could also cheaply enable later contextual enhancements such as term highlighting (see the Roadmap).
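To make the layout concrete, the mapping from a term to its file could be a small pure function. This is a sketch under assumptions: the slug rule (lowercase, non-alphanumeric runs collapsed to `-`) and the `0-9` bucket for terms that start with a digit are hypothetical conventions, chosen only to be consistent with the listing above.

```typescript
// Assumption: file names are the lowercased term with runs of non-alphanumeric
// characters replaced by "-", with leading and trailing "-" trimmed.
function termSlug(term: string): string {
  return term
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Builds the path of a term file under the glossary root directory.
function termPath(term: string, root: string = "glossary.d"): string {
  const slug = termSlug(term);
  if (slug.length === 0) {
    throw new Error(`cannot derive a file name from "${term}"`);
  }
  const first = slug.charAt(0);
  // Hypothetical convention: terms starting with a digit share a "0-9" bucket.
  const bucket = /[a-z]/.test(first) ? first.toUpperCase() : "0-9";
  return `${root}/${bucket}/${slug}.json`;
}
```

With these rules, `termPath("Crucible")` yields `glossary.d/C/crucible.json`, matching the example listing.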
2. Content Generation
- Initial population – Use the existing engineering glossary (internal) as seed data.
- Incremental updates – Implement a GitHub Action that runs on every PR affecting an RFD. The action:
  - Invokes an LLM (e.g., Claude Sonnet 4.5) with the diff to extract new/updated terms.
  - Updates the corresponding JSON files in glossary.d/.
  - Commits the changes back to the repository.
The RFD API (github.com/oxidecomputer/rfd-api) can be leveraged to retrieve full document text for context when generating definitions, and @plaidfinch has made a working example of this in an internal repository.
In line with our values as expressed in RFD 576 - Using LLMs at Oxide, we should take great care to lean on the strength of LLMs at soaking up written context rather than on their relatively weak writing abilities. To this end, we should rely on human approval of definitions to guarantee veracity. In other words, final definitions should be human-written.
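The update step could be kept deliberately conservative: extracted terms add mentions, while an existing definition is never overwritten by generated text. The sketch below assumes a hypothetical `ExtractedTerm` shape for whatever the extraction step emits; the LLM call itself is out of scope here, and `needsHumanDefinition` is an illustrative flag rather than a schema field.

```typescript
interface Mention { number: number; text: string }
interface TermEntry { id: number; term: string; definition: string; mentions: Mention[] }

// Hypothetical shape of what an extraction step might emit for one RFD diff.
interface ExtractedTerm { term: string; draftDefinition: string; mention: Mention }

interface MergeResult {
  entry: TermEntry;
  isNew: boolean;
  // True when the definition is only a machine draft and still needs a human author.
  needsHumanDefinition: boolean;
}

function mergeExtraction(existing: TermEntry[], extracted: ExtractedTerm[]): MergeResult[] {
  const byName = new Map<string, TermEntry>();
  for (const e of existing) byName.set(e.term.toLowerCase(), e);

  let nextId = existing.reduce((max, e) => Math.max(max, e.id), 0) + 1;
  const results = new Map<string, MergeResult>();

  for (const x of extracted) {
    const key = x.term.toLowerCase();
    let result = results.get(key);
    if (!result) {
      const current = byName.get(key);
      result = current
        ? // Existing term: copy it and keep the human-written definition untouched.
          { entry: { ...current, mentions: current.mentions.slice() }, isNew: false, needsHumanDefinition: false }
        : // New term: store the draft, but flag it for a human-written definition.
          {
            entry: { id: nextId++, term: x.term, definition: x.draftDefinition, mentions: [] },
            isNew: true,
            needsHumanDefinition: true,
          };
      results.set(key, result);
    }
    // Record at most one mention per RFD number.
    if (!result.entry.mentions.some((m) => m.number === x.mention.number)) {
      result.entry.mentions.push(x.mention);
    }
  }
  return Array.from(results.values());
}
```

Only touched terms are returned, so an Action built on this would rewrite only the files that actually changed, and a reviewer could filter on `needsHumanDefinition` to find drafts awaiting a human-written definition.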
3. Appearance
- Add a top-level Glossary page (e.g., in glossary.tsx) to the RFD site that lists terms alphabetically.
- Each entry links to its JSON definition and to a list of RFDs where the term is mentioned.
This could look something like this:
- Textual highlighting for glossary terms within the body of RFD documents:
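The alphabetical listing could be driven by a small grouping helper that the page component renders section by section. This is a sketch only: `TermSummary`, `LetterSection`, and `groupByInitial` are hypothetical names, and the `#` section for terms that do not start with a letter is an assumption.

```typescript
interface TermSummary { term: string; mentionCount: number }
interface LetterSection { letter: string; terms: TermSummary[] }

// Plain code-unit comparison keeps the ordering deterministic across environments.
const compareStrings = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

// Groups terms by uppercased first letter and sorts sections and terms alphabetically.
function groupByInitial(terms: TermSummary[]): LetterSection[] {
  const sections = new Map<string, TermSummary[]>();
  for (const t of terms) {
    const first = t.term.trim().charAt(0).toUpperCase();
    // Assumption: anything not starting with A-Z is listed under "#".
    const letter = /[A-Z]/.test(first) ? first : "#";
    const bucket = sections.get(letter) ?? [];
    bucket.push(t);
    sections.set(letter, bucket);
  }
  return Array.from(sections.entries())
    .sort(([a], [b]) => compareStrings(a, b))
    .map(([letter, items]) => ({
      letter,
      // Case-insensitive ordering within a section.
      terms: items.sort((x, y) => compareStrings(x.term.toLowerCase(), y.term.toLowerCase())),
    }));
}
```

The page could then render one heading per `LetterSection`, with each term linking to its definition and its list of mentions.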
Risks & Challenges
- Completeness – Automatic extraction may miss nuanced terms or generate duplicate entries. Keeping a human in the loop to review PR-generated changes could mitigate this.
- Performance – Loading large term lists on every page could affect load time. This could be mitigated with lazy loading per initial letter and caching of generated JSON files in glossary.d.
- Maintenance Overhead – The glossary must stay in sync with evolving terminology. The aforementioned GitHub Action on all RFD PRs could help here, along with a periodic audit of terms and definitions.
- Security / Privacy – Some terms may reference internal components not intended for public consumption. Using the public flag in the schema to establish public and private versions of the glossary should address this.
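The public/private split could be applied as a build-time filter over the loaded terms. The sketch below is an assumption about how that might work, not an existing site feature: it keeps only terms flagged public and also drops relatedTerms links that point at private terms, so that private term names do not leak through cross-references.

```typescript
interface RelatedTerm { id: number; term: string }
interface Term {
  id: number;
  term: string;
  definition: string;
  relatedTerms: RelatedTerm[];
  public: boolean;
}

// Returns the subset of the glossary that is safe to publish.
function toPublicGlossary(terms: Term[]): Term[] {
  const publicIds = new Set<number>(terms.filter((t) => t.public).map((t) => t.id));
  return terms
    .filter((t) => t.public)
    .map((t) => ({
      ...t,
      // Remove cross-references to terms that are not themselves public.
      relatedTerms: t.relatedTerms.filter((r) => publicIds.has(r.id)),
    }));
}
```

This only handles term-level visibility. Definitions and mention text written with internal context would still need human review before a term is marked public.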
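Acceptance Criteria
- The Glossary page is served at /glossary and lists terms alphabetically with links to definitions and mentions.
- The glossary.d/ directory exists with at least the initial seed terms (minimum 20 entries).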
Limitations
This does not a knowledge management framework make. While this could be an incremental step in helping staff and friends of Oxide draw more deeply on the information resources that have been built up over time, this feature would strictly extend the existing features of the RFD static site.
An eventual target might be a more robust data layer for knowledge and customer context, but this should make for a lightweight, low-cost model for relevant and effective information sharing. If this goes well, it could be extended into building other ontologies for system issues, customer engagements, etc., which at that point would probably warrant its own RFD!
Roadmap
| Phase | Description |
|---|---|
| Phase 1 – MVP | Static glossary page, JSON-file storage, GitHub Action for updates, optional tooltip integration. |
| Phase 2 – Enrichment | Add relatedTerms, external references, and richer metadata; introduce a searchable endpoint (e.g., /api/glossary). |
| Phase 3 – Term Highlighting | Within individual RFD pages, employ a lightweight client-side set of functions that (a) loads a minified glossary.json (or a.json, b.json, … on demand), and (b) replaces occurrences of known terms with hover-over tooltips linking to the glossary entry. |
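The term replacement in Phase 3 could be prototyped as a pure function that splits text into plain and term segments, leaving the tooltip rendering to the page component. This is a sketch under assumptions: `GlossaryIndexEntry`, `Segment`, and `segmentText` are hypothetical names, the `href` values are illustrative, and matching is case-insensitive on whole words only.

```typescript
interface GlossaryIndexEntry { term: string; href: string }
type Segment =
  | { kind: "text"; text: string }
  | { kind: "term"; text: string; href: string };

// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Splits text into plain runs and glossary-term runs, preserving the original characters.
function segmentText(text: string, entries: GlossaryIndexEntry[]): Segment[] {
  // Longest terms first, so a multi-word term wins over a shorter term it contains.
  const sorted = entries
    .filter((e) => e.term.length > 0)
    .slice()
    .sort((a, b) => b.term.length - a.term.length);
  if (sorted.length === 0) return text.length > 0 ? [{ kind: "text", text }] : [];

  const hrefs = new Map<string, string>();
  for (const e of sorted) hrefs.set(e.term.toLowerCase(), e.href);
  const pattern = new RegExp(`\\b(?:${sorted.map((e) => escapeRegExp(e.term)).join("|")})\\b`, "gi");

  const segments: Segment[] = [];
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const href = hrefs.get(match[0].toLowerCase());
    if (href === undefined) continue; // no exact entry for this casing; leave as plain text
    if (match.index > last) segments.push({ kind: "text", text: text.slice(last, match.index) });
    segments.push({ kind: "term", text: match[0], href });
    last = match.index + match[0].length;
  }
  if (last < text.length) segments.push({ kind: "text", text: text.slice(last) });
  return segments;
}
```

Because the `\b` word boundary is used, terms that begin or end with punctuation would not match under this sketch. A real implementation would also need to skip code blocks and existing links.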
References