RFD Glossary Feature #185

## Summary  

The current search experience on the RFD site is limited to literal string matches in titles (on the homepage), title and content search (available from the top right), and label-based filtering. These become limiting as the body of work grows (60+ public RFDs, many more internally). A lightweight, hyperlinked glossary would give users an entry point to discover related documents, understand domain-specific terminology, and navigate the knowledge base more efficiently.

---

## Motivation  

| Problem | Impact |
|---------|--------|
| **Search and Filtering** – Searches titles and RFD contents, but does not present documents with curated, opinionated context. | Users spend more time locating information relevant to their immediate questions. |
| **Label filtering** – useful for small sets but becomes unwieldy with thousands of RFDs. | Discoverability degrades as the volume of content increases. |
| **Implicit term relationships** – many RFDs reference each other without an explicit map. | Contextual understanding of abstractions is low; newcomers face a steep learning curve. |
| **Verification & veracity** – existing references are not centrally curated, leading to stale or duplicated definitions. | Incorrect, outdated, or overloaded definitions can propagate and cause confusion. |

A curated glossary could help to address these gaps by:  

* Providing a **single source of truth** for definitions.  
* Enabling **hyper‑textual navigation** (terms link to definitions and to every RFD that mentions them).  
* Offering **default filters** (e.g., exclude abandoned RFDs, show only discussions with full context).  
* Being **lightweight and low‑cost** to generate and maintain, especially when powered by LLM pipelines, which don't increase the existing site footprint.  

---

## Goals  

1. **Ease of Use** – Users can locate a term and jump directly to all relevant RFDs within two clicks.  
2. **Improved Discovery** – Increase the rate at which specific topics are found without expanding the UI complexity.  
3. **Scalable Maintenance** – Automate glossary population and updates as new RFDs are merged.  
4. **Foundation for Future Features** – Enable downstream enhancements such as ontology building, customer‑facing knowledge bases, and richer knowledge‑graph queries.  

---

## Proposed Solution

### 1. Glossary Data Model  
Store each term as a discrete [JSON‑LD](https://json-ld.org/primer/latest/) file (or equivalent) following the schema below:

```jsonc
{
  "id": 0, // integer
  "term": "string", // term name
  "definition": "string", // what the term means
  "mentions": [{ "number": 0, "text": "string" }], // RFDs where this term comes up (number is an integer)
  "relatedTerms": [{ "id": 0, "term": "string" }], // other terms in the glossary related to this one, or included in the definition
  "references": [{ "url": "string", "text": "string" }], // external links, distinct from the RFD site, that provide additional context
  "public": false // boolean: whether the term appears in public RFDs, and thus in a public glossary
}
```
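
As a rough illustration of how the site could consume these files, the schema might be mirrored as a TypeScript type with a runtime guard. The field names come from the schema above; the type and function names (`GlossaryTerm`, `isGlossaryTerm`, and so on) are hypothetical and do not exist in the RFD site today.

```typescript
// Hypothetical TypeScript mirror of the schema above. Field names follow the
// schema; all type and function names are illustrative assumptions.
interface Mention { number: number; text: string }
interface RelatedTerm { id: number; term: string }
interface Reference { url: string; text: string }

interface GlossaryTerm {
  id: number;
  term: string;
  definition: string;
  mentions: Mention[];
  relatedTerms: RelatedTerm[];
  references: Reference[];
  public: boolean;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// True when v is an array whose elements are all objects passing `check`.
function isArrayOf(v: unknown, check: (item: Record<string, unknown>) => boolean): boolean {
  return Array.isArray(v) && v.every((item) => isRecord(item) && check(item));
}

// Runtime guard for data parsed from a term file (for example via JSON.parse).
function isGlossaryTerm(v: unknown): v is GlossaryTerm {
  if (!isRecord(v)) return false;
  return (
    Number.isInteger(v.id) &&
    typeof v.term === "string" &&
    typeof v.definition === "string" &&
    typeof v.public === "boolean" &&
    isArrayOf(v.mentions, (m) => Number.isInteger(m.number) && typeof m.text === "string") &&
    isArrayOf(v.relatedTerms, (r) => Number.isInteger(r.id) && typeof r.term === "string") &&
    isArrayOf(v.references, (r) => typeof r.url === "string" && typeof r.text === "string")
  );
}
```

A guard like this could run in CI so that a malformed term file fails the PR rather than the page render.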

These term files could be placed in a new `glossary.d/` directory, segmented by first letter (`A/term.json`, `B/term.json`, …). This could look something like this:

```bash
$ ls -l ./glossary.d/ | head -n 5
drwxr-xr-x@ 11 hermano staff 352 Dec 29 15:45 A
drwxr-xr-x@ 9 hermano staff 288 Dec 29 15:45 B
drwxr-xr-x@ 9 hermano staff 288 Dec 29 15:45 C
drwxr-xr-x@ 10 hermano staff 320 Dec 29 15:45 D

$ ls -l ./glossary.d/C
total 56
-rw-r--r--@ 1 hermano staff 434 Dec 29 16:16 cerberus.json
-rw-r--r--@ 1 hermano staff 568 Dec 29 16:16 cosmo.json
-rw-r--r--@ 1 hermano staff 316 Dec 29 16:43 cpld.json
-rw-r--r--@ 1 hermano staff 137 Dec 29 16:16 cpu.json
-rw-r--r--@ 1 hermano staff 139 Dec 29 16:41 cru.json
-rw-r--r--@ 1 hermano staff 340 Dec 29 16:59 crucible.json
-rw-r--r--@ 1 hermano staff 669 Dec 29 16:43 cubby.json
```

Keeping one file per term reduces the chance of merge conflicts as terms are updated with newly published RFDs. It could also make later contextual enhancements, such as term highlighting (covered under Appearance and the Roadmap), cheap and efficient to build.
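
The layout above implies a mapping from a term to its file path. A minimal sketch of that mapping follows; the function name `termFilePath`, the slug rule, and the `0-9` bucket for terms that start with a digit are all assumptions, since the proposal only shows single-word, letter-initial examples.

```typescript
// Hypothetical helper: maps a term to its file path under glossary.d/,
// bucketed by the upper-cased first letter as in the listing above.
function termFilePath(term: string, root = "glossary.d"): string {
  // Assumed slug rule: lowercase, runs of non-alphanumerics collapsed to "-".
  const slug = term
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  if (slug === "") {
    throw new Error(`cannot derive a file name from term: ${JSON.stringify(term)}`);
  }
  const first = slug.charAt(0);
  // Assumed fallback bucket for terms that start with a digit.
  const bucket = /[a-z]/.test(first) ? first.toUpperCase() : "0-9";
  return `${root}/${bucket}/${slug}.json`;
}

// termFilePath("Cubby") returns "glossary.d/C/cubby.json"
```

Deriving the path from the term alone means the GitHub Action and the site can both locate a term file without a separate index.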

### 2. Content Generation  

* **Initial population** – Use the existing engineering glossary (internal) as seed data.
* **Incremental updates** – Implement a GitHub Action that runs on every PR affecting an RFD. The action:  
  * Invokes an LLM (e.g., Claude Sonnet 4.5) with the diff to extract new/updated terms.  
  * Updates the corresponding JSON files in `glossary.d/`.  
  * Commits the changes back to the repository.  
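
The update step in the list above could be sketched as a pure function that the action calls after the LLM returns a list of term names. This is only an illustration: `applyExtraction` and its types are hypothetical, the `id`, `relatedTerms`, `references`, and `public` fields are left out for brevity, and the LLM call itself is not shown.

```typescript
interface Mention { number: number; text: string }
interface TermEntry { term: string; definition: string; mentions: Mention[] }

// Returns an updated copy of the glossary after processing one RFD.
// Existing definitions are never overwritten, and new terms get an empty
// definition so that a human writes the final text during review.
function applyExtraction(
  glossary: Map<string, TermEntry>,
  rfd: Mention,
  extractedTerms: string[],
): Map<string, TermEntry> {
  const next = new Map(glossary);
  for (const raw of extractedTerms) {
    const name = raw.trim();
    const key = name.toLowerCase(); // assumed: terms are keyed case-insensitively
    if (key === "") continue;
    const current = next.get(key) ?? { term: name, definition: "", mentions: [] };
    const alreadyListed = current.mentions.some((m) => m.number === rfd.number);
    next.set(key, {
      ...current,
      mentions: alreadyListed
        ? current.mentions
        : [...current.mentions, rfd].sort((a, b) => a.number - b.number),
    });
  }
  return next;
}
```

Because a mention is only added once per RFD number, re-running the action on the same PR would leave the files unchanged, which keeps the generated commits small.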

The RFD API (`github.com/oxidecomputer/rfd-api`) can be leveraged to retrieve full document text for context when generating definitions, and @plaidfinch has made a working example of this in an internal repository. 

In line with our values as expressed in [RFD 576 - Using LLMs at Oxide](https://rfd.shared.oxide.computer/rfd/576), we would want to take great care to lean on the strength of LLMs at soaking up written context rather than on their relatively weak writing abilities. To this end, we should rely on human approval of definitions to guarantee veracity. In other words, **final definitions should be human-written**.

### 3. Appearance

* Add a top-level **Glossary** page (e.g., in `glossary.tsx`) to the RFD site that lists terms alphabetically.
* Each entry links to its JSON definition and to a list of RFDs where the term is mentioned.  

This could look something like this:

<img width="1960" height="1414" alt="Image" src="https://github.com/user-attachments/assets/d95628ca-c12d-4f36-963d-23eb72311c55" />
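
The alphabetical listing described above could be backed by a small grouping helper that the page component calls before rendering. The sketch below is an assumption about how that might work; `groupByInitial` and the `#` bucket for terms that do not start with A-Z are hypothetical.

```typescript
interface TermSummary { term: string }

// Groups terms under their upper-cased first letter, then sorts the letters
// and the terms within each letter. "#" is an assumed bucket for terms that
// do not start with A-Z.
function groupByInitial<T extends TermSummary>(terms: T[]): Array<[string, T[]]> {
  const groups = new Map<string, T[]>();
  for (const t of terms) {
    const first = t.term.trim().charAt(0).toUpperCase();
    const key = /^[A-Z]$/.test(first) ? first : "#";
    const bucket = groups.get(key) ?? [];
    bucket.push(t);
    groups.set(key, bucket);
  }
  // Case-insensitive comparison so that "cpu" and "Cubby" sort together.
  const compare = (a: string, b: string) => a.localeCompare(b, "en", { sensitivity: "base" });
  return Array.from(groups.entries())
    .sort(([a], [b]) => compare(a, b))
    .map(([letter, items]): [string, T[]] => [
      letter,
      [...items].sort((x, y) => compare(x.term, y.term)),
    ]);
}
```

Each returned pair could then render as one letter heading followed by its list of term links.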

* Textual highlighting for glossary terms within the body of RFD documents:

<img width="723" height="393" alt="Image" src="https://github.com/user-attachments/assets/3ec5de9d-7c32-43db-9144-88b494708c95" />
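
One way the highlighting above might locate terms is a single case-insensitive, whole-word scan over a block of plain text. The following is a sketch under stated assumptions, not the site's implementation: `findTermMatches` is a hypothetical name, "word" is taken to mean ASCII letters, digits, and underscore, and turning the matches into tooltip elements is left out.

```typescript
interface TermMatch { term: string; start: number; end: number }

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Finds whole-word, case-insensitive occurrences of glossary terms in text.
// Longer terms are tried first so that a multi-word term wins over a shorter
// term it contains.
function findTermMatches(text: string, terms: string[]): TermMatch[] {
  const cleaned = terms.map((t) => t.trim()).filter((t) => t !== "");
  if (cleaned.length === 0) return [];
  // Maps the lower-cased form back to the glossary's own spelling.
  const canonical = new Map<string, string>();
  for (const t of cleaned) canonical.set(t.toLowerCase(), t);
  const alternation = [...cleaned]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  const pattern = new RegExp(`(?<![A-Za-z0-9_])(?:${alternation})(?![A-Za-z0-9_])`, "gi");
  const matches: TermMatch[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    const term = canonical.get(m[0].toLowerCase()) ?? m[0];
    matches.push({ term, start: m.index, end: m.index + m[0].length });
  }
  return matches;
}
```

In a browser this would most likely be applied to individual text nodes rather than to raw HTML, so that code blocks and existing links can be skipped.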

---

## Risks & Challenges  

1. **Completeness** – Automatic extraction may miss nuanced terms or generate duplicate entries. Keeping a human in the loop to review PR-generated changes could help with this.
2. **Performance** – Loading large term lists on every page could affect load time. This could be mitigated with lazy loading per initial letter and caching of generated JSON files in `glossary.d`.
3. **Maintenance Overhead** – The glossary must stay in sync with evolving terminology. The aforementioned GitHub Action for all RFD PRs could help, along with a potential periodic audit of terms and definitions.
4. **Security / Privacy** – Some terms may reference internal components not intended for public consumption. The use of a `public` flag in the schema to establish public and private versions of the glossary should address this.
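
The `public` flag mentioned in the last item could drive a filter that builds the public view of the glossary. The sketch below uses a hypothetical `publicGlossary` function. It also drops `relatedTerms` links that point at private terms, on the assumption that a private term's name should not leak through a public entry; `mentions` might need similar treatment, which is not modeled here.

```typescript
interface RelatedTerm { id: number; term: string }
interface Term { id: number; term: string; public: boolean; relatedTerms: RelatedTerm[] }

// Builds the public view: keeps only terms flagged public and removes any
// relatedTerms entries that refer to a term that is not public.
function publicGlossary(terms: Term[]): Term[] {
  const publicIds = new Set(terms.filter((t) => t.public).map((t) => t.id));
  return terms
    .filter((t) => t.public)
    .map((t) => ({
      ...t,
      relatedTerms: t.relatedTerms.filter((r) => publicIds.has(r.id)),
    }));
}
```

Running the filter at build time, rather than in the browser, would keep private term files out of the public site bundle entirely.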

---

## Acceptance Criteria  

- [ ] A public **Glossary** page is reachable at `/glossary` and lists terms alphabetically with links to definitions and mentions.  
- [ ] A `glossary.d/` directory exists with at least the initial seed terms (minimum 20 entries).  
- [ ] A GitHub Action triggers on every RFD PR, extracts new terms, and updates the JSON files.  
- [ ] Hover‑over tooltips for known terms appear on RFD pages when the optional JS is enabled.   

---

## Limitations

This does not a knowledge management framework make. While it could be an incremental step in helping staff and friends of Oxide draw more deeply on the information resources that have been built up over time, this feature would strictly extend the existing features of the RFD static site.

An eventual target might be a more robust data layer for knowledge and customer context, but this should make for a lightweight, low-cost model for relevant and effective information sharing. If this goes well, it could be extended to build other ontologies for system issues, customer engagements, and so on, an effort that at that point could probably use its own RFD!

---

## Roadmap  

| Phase | Description |
|-------|-------------|
| **Phase 1 – MVP** | Static glossary page, JSON‑file storage, GitHub Action for updates, optional tooltip integration. |
| **Phase 2 – Enrichment** | Add `relatedTerms`, external references, and richer metadata; introduce a searchable endpoint (e.g., `/api/glossary`). |
| **Phase 3 – Term Highlighting** | Within individual RFD pages, employ a lightweight set of client-side functions that (a) load a minified `glossary.json` (or `a.json`, `b.json`, … on demand) and (b) replace occurrences of known terms with hover-over tooltips linking to the glossary entry. |

---

## References  

* **RFD API** – <https://github.com/oxidecomputer/rfd-api>  
* **JSON‑LD Specification** – <https://www.w3.org/TR/json-ld/>  
* **GitHub Actions Documentation** – <https://docs.github.com/en/actions>  
* **Claude Skill for GitHub Integration** – <https://code.claude.com/docs/en/github-actions>  

--- 