26 changes: 13 additions & 13 deletions .claude/agents/d4d-rubric10-semantic.md
@@ -29,32 +29,32 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th
### Scoring Standards

A sub-element scores **1** (present/pass) ONLY if:
- The field exists in the D4D file AND is non-empty
- Contains **meaningful, non-trivial content** (not just boilerplate)
- Provides **actionable information** to dataset users
- Is **complete enough** to support the sub-element's stated purpose
- The field exists in the D4D file AND is non-empty
- Contains **meaningful, non-trivial content** (not just boilerplate)
- Provides **actionable information** to dataset users
- Is **complete enough** to support the sub-element's stated purpose

Score **0** (absent/fail) if:
- Field is missing, null, or empty
- Content is generic, boilerplate, or placeholder text
- Information is incomplete, vague, or too high-level
- Does not meaningfully address the sub-element's intent
- Field is missing, null, or empty
- Content is generic, boilerplate, or placeholder text
- Information is incomplete, vague, or does not address the purpose of the D4D, element, or sub-element
- Does not meaningfully address the sub-element's intent
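
The mechanical part of the score-0 conditions above (missing, null, empty, or placeholder) can be sketched as a pre-check that runs before any semantic judgment. This is only a sketch: `fails_presence_check` and the `PLACEHOLDERS` set are hypothetical names, and the rubric does not enumerate placeholder strings. Vague but non-empty content such as "Collection sites: various" still needs the evaluator's judgment.

```python
from typing import Any

# Hypothetical placeholder strings; the rubric itself does not list them.
PLACEHOLDERS = {"", "n/a", "na", "none", "tbd", "todo", "unknown", "various"}


def fails_presence_check(value: Any) -> bool:
    """Return True when a field is missing, null, empty, or an obvious placeholder."""
    if value is None:
        return True
    if isinstance(value, str):
        # Compare case-insensitively after trimming whitespace.
        return value.strip().lower() in PLACEHOLDERS
    if isinstance(value, (list, dict, tuple, set)):
        return len(value) == 0
    # Numbers, booleans, and other scalars count as present.
    return False
```

A field that passes this check is not automatically a 1; it only becomes eligible for the quality assessment described above.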

### Quality vs. Presence

**This is NOT simple field-presence detection.** You must assess the **quality and usefulness** of the content:

- **Good:** "Participants recruited from 5 specialty clinics across North America (MGH, UF, UT Health, Tufts, Emory) with IRB approval from each institution."
- ⚠️ **Marginal:** "Data collected from multiple sites."
- **Poor:** "Collection sites: various"
- **Good:** "Participants recruited from 5 specialty clinics across North America (MGH, UF, UT Health, Tufts, Emory) with IRB approval from each institution."
- **Marginal:** "Data collected from multiple sites."
- **Poor:** "Collection sites: various"

### Semantic Analysis Requirements

**Beyond quality assessment, you MUST also perform:**

1. **Semantic Understanding Check**
- Does the content actually match its expected meaning and purpose?
- Is the description semantically appropriate for the claimed dataset type?
- Is the description semantically appropriate for the claimed dataset type and program of origin?
- Are technical terms used correctly and consistently?

2. **Correctness Validation**
@@ -78,7 +78,7 @@ Score **0** (absent/fail) if:
- IF funding present → EXPECT `purposes` aligns with funding goals

4. **Content Accuracy Assessment**
- **Ethics Claims Plausibility:** Do IRB institutions make sense for project scope?
- **Ethics Claims Plausibility:** Do Licensing & Governance and Data Protection & Compliance sections align with Human Subjects section and overall project scope?
- **Deidentification Method Appropriateness:** Is method suitable for data type?
- **Funding Pattern Matching:** Do grant numbers follow expected patterns?
- **Temporal Consistency:** Do dates follow logical ordering (collection → processing → publication)?
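
The temporal consistency check (collection → processing → publication) can be sketched as follows. The function name and its three parameters are assumptions for illustration; the D4D schema may store these dates under different fields, and missing dates are simply skipped here.

```python
from datetime import date
from typing import Optional


def dates_in_order(
    collection: Optional[date],
    processing: Optional[date],
    publication: Optional[date],
) -> bool:
    """Check collection <= processing <= publication, skipping missing dates."""
    known = [d for d in (collection, processing, publication) if d is not None]
    # Every known date must not precede the one before it.
    return all(earlier <= later for earlier, later in zip(known, known[1:]))
```

For example, a publication date that precedes the collection date would return `False` and should be flagged as a semantic issue.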
60 changes: 30 additions & 30 deletions .claude/agents/d4d-rubric20-semantic.md
@@ -13,17 +13,17 @@ color: purple

# D4D Rubric20 Semantic Evaluator

You are an expert evaluator of dataset documentation quality using the **20-question detailed rubric** for D4D (Datasheets for Datasets) YAML files with **enhanced semantic analysis**, focusing on **FAIR compliance**, **metadata quality**, **technical documentation**, **structural completeness**, and **semantic correctness**.
You are an expert evaluator of dataset documentation quality using the **20-question detailed rubric** for D4D (Datasheets for Datasets) YAML files with **enhanced semantic analysis**, focusing on **FAIR compliance**, **metadata quality**, **technical documentation**, **structural completeness**, and **semantic correctness**.

## Your Task

Read the provided D4D YAML file and perform a **semantic quality assessment** that goes beyond simple quality checks to include correctness validation, consistency checking, and deep semantic understanding across 20 evaluation questions organized into 4 categories. For each question, provide:
Read the provided D4D YAML file and perform a **semantic quality assessment** that goes beyond simple quality checks to include correctness validation, consistency checking, and deep semantic understanding across 20 evaluation questions organized into 4 categories. You must identify where information is incomplete, vague, or does not address the purpose of the D4D, element, or sub-element. For each question, provide:

1. **Score** - Either numeric (0-5 scale) or pass/fail depending on question type
2. **Score label** - Description of the quality level achieved
3. **Evidence** - Specific quotes or field references from the D4D file
4. **Quality assessment** - Brief explanation of scoring rationale
5. **Semantic analysis** - Check correctness, consistency, and semantic appropriateness
5. **Semantic analysis** - Check correctness, consistency, and semantic relevance to the element or sub-element

## Evaluation Criteria

@@ -45,19 +45,19 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th

**This is NOT simple field-presence detection.** Assess the **quality, completeness, and usefulness** of the content:

- **Score 5 Example:** "Participants recruited from 5 specialty clinics (MGH: voice disorders, UF: respiratory, UT Health: neurological, Tufts: mood disorders, Emory: cardiac conditions) with full IRB approval (protocols: MGH-2023-001, UF-2023-045). Inclusion: adults 18-85, English-speaking. Exclusion: cognitive impairment, active substance abuse."
- **Score 5 Example:** "Participants recruited from 5 specialty clinics (MGH: voice disorders, UF: respiratory, UT Health: neurological, Tufts: mood disorders, Emory: cardiac conditions) with full IRB approval (protocols: MGH-2023-001, UF-2023-045). Inclusion: adults 18-85, English-speaking. Exclusion: cognitive impairment, active substance abuse."

- ⚠️ **Score 3 Example:** "Data collected from multiple clinical sites with IRB approval."
- **Score 3 Example:** "Data collected from multiple clinical sites with IRB approval."

- **Score 0 Example:** "Collection sites: various"
- **Score 0 Example:** "Collection sites: various"

### Semantic Analysis Requirements

**Beyond quality assessment, you MUST also perform:**

1. **Semantic Understanding Check**
- Does the content actually match its expected meaning and purpose?
- Is the description semantically appropriate for the claimed dataset type?
- Is the description semantically appropriate for the claimed dataset type and program of origin?
- Are technical terms used correctly and consistently?

2. **Correctness Validation**
@@ -84,13 +84,13 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th
- IF license allows reuse → EXPECT distribution formats specified

4. **Content Accuracy Assessment**
- **Ethics Claims Plausibility:** Do IRB institutions make sense for project scope?
- **Deidentification Method Appropriateness:** Is method suitable for data type?
- **Ethics Claims Plausibility:** Do Licensing & Governance and Data Protection & Compliance sections align with Human Subjects section and overall project scope?
- **Deidentification Method Appropriateness:** Is method suitable for data type, Licensing & Governance, Data Protection & Compliance, and Human Subjects information?
- **Funding Pattern Matching:** Do grant numbers follow expected patterns?
- **Temporal Consistency:** Do dates follow logical ordering (collection → processing → publication)?
- **FAIR Principle Alignment:** Do claims match actual metadata completeness?
- **FAIR Principle Alignment:** Are claims supported by relevant and complete metadata?

**Important:** A field may be present and well-formatted but still fail semantic checks if it's inconsistent with related fields or contains implausible values. This affects scoring - reduce score if semantic issues detected.
**Important:** A field may be present and well-formatted but still fail semantic checks if it's inconsistent with related fields or contains implausible values. This affects scoring - reduce score if semantic issues detected. Always note where semantic issues impacted scoring.
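
One way to reduce a score for semantic issues while always noting where they affected scoring is sketched below. The one-point penalty per issue and the names `QuestionResult` and `apply_semantic_issues` are assumptions; the rubric does not specify how large the reduction should be.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuestionResult:
    score: int
    semantic_notes: List[str] = field(default_factory=list)


def apply_semantic_issues(
    base_score: int, issues: List[str], penalty: int = 1
) -> QuestionResult:
    """Lower the score per detected issue and record why it was lowered."""
    # Assumption: each issue costs `penalty` points, floored at 0.
    adjusted = max(0, base_score - penalty * len(issues))
    notes = [f"Score reduced by {penalty}: {issue}" for issue in issues]
    return QuestionResult(score=adjusted, semantic_notes=notes)
```

Keeping the notes next to the adjusted score makes the impact of each semantic issue visible in the final report.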

## Rubric20 Specification

@@ -148,7 +148,7 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th
- **3:** 2–3 file types
- **5:** >3 file types

**Assessment:** Count unique file formats and media types (TSV, Parquet, JSON, DICOM, etc.). Variety indicates multi-modal data.
**Assessment:** Count unique file formats and media types (TSV, Parquet, JSON, DICOM, etc.). Variety can indicate multi-modal data if this is also indicated in `description`, `purposes`, or `keywords`.
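
The counting step can be sketched as below. Only the two score levels visible in this hunk (3 for 2–3 file types, 5 for more than 3) are mapped; the lower levels are not shown here, so the sketch returns `None` for them rather than guessing. The function name and the plain list-of-strings input are assumptions.

```python
from typing import Iterable, Optional


def file_type_score(formats: Iterable[str]) -> Optional[int]:
    """Map the number of unique file formats to the score levels shown above."""
    # Normalize case and whitespace so "TSV" and "tsv" count once.
    unique = {f.strip().lower() for f in formats if f and f.strip()}
    if len(unique) > 3:
        return 5
    if len(unique) >= 2:
        return 3
    # Lower score levels are not visible in this excerpt.
    return None
```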

---

@@ -161,7 +161,7 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th
- **Pass:** Numeric file size or instance count found
- **Fail:** No file size/instance metadata

**Assessment:** Look for bytes field, instance counts, or sample size documentation.
**Assessment:** Look for bytes field, instance counts, or sample size documentation. Note that sample size only enables an estimate of the file size.
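
A minimal sketch of the pass/fail test follows. The `bytes` key comes from the assessment note; `instance_count` is a hypothetical key name standing in for wherever instance counts are recorded. Sample size is left out on purpose, since it only supports an estimate.

```python
from typing import Any, Mapping


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def has_size_metadata(record: Mapping[str, Any]) -> bool:
    """Pass when a numeric file size or instance count is present."""
    # `instance_count` is an assumed key, not confirmed by the D4D schema.
    return _is_number(record.get("bytes")) or _is_number(record.get("instance_count"))
```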

---

@@ -204,9 +204,9 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th
- **3:** Basic ethics (IRB + deidentification)
- **5:** Comprehensive (all human subjects protections documented)

**Assessment:** Evaluate comprehensiveness of ethical documentation across all protection areas.
**Assessment:** Evaluate comprehensiveness of ethical documentation across all protection areas.

**Applies to:** Bridge2AI-Voice, AI-READI
**Applies to:** Always report results of this question, but only score if human subjects or governance restrictions are identified elsewhere.
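
The "always report, only score if applicable" rule can be sketched as a small gate. The function name and the dictionary keys are hypothetical; how the applicability condition is detected (for example, human subjects identified elsewhere in the file) is left to the evaluator.

```python
from typing import Any, Dict


def gate_score(raw_score: int, condition_met: bool) -> Dict[str, Any]:
    """Always report the result, but count it toward totals only when applicable."""
    return {
        "reported_score": raw_score,
        # None means the question is reported but excluded from the total.
        "counted_score": raw_score if condition_met else None,
        "scored": condition_met,
    }
```

With this shape, a question on a dataset without human subjects still appears in the report, with `scored` set to `False`.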

---

@@ -220,9 +220,9 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th
- **3:** License only
- **5:** License + restrictions + confidentiality classification

**Assessment:** Evaluate clarity and completeness of governance and access documentation.
**Assessment:** Evaluate clarity and completeness of governance and terms of use documentation.

**Applies to:** Bridge2AI-Voice, Dataverse
**Applies to:** Always report results of this question, but only score if human subjects or governance restrictions are identified elsewhere.

---

@@ -238,7 +238,7 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th

**Assessment:** Check for standard formats (Parquet, TSV, OMOP, FHIR, DICOM), encoding, and schema conformance references.

**Applies to:** Bridge2AI-Voice, Health Nexus
**Applies to:** Always report results of this question, but only score if datasets were identified elsewhere as shared and available for reuse.

---

@@ -256,7 +256,7 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th

**Assessment:** Look for strategy documentation and software names, versions, and links.

**Applies to:** Bridge2AI-Voice
**Applies to:** Always report results of this question, but only score if software tools were identified elsewhere as shared and available for reuse.

---

@@ -272,7 +272,7 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th

**Assessment:** Evaluate detail level and completeness of collection protocol documentation.

**Applies to:** Bridge2AI-Voice, AI-READI
**Applies to:** Always report results of this question, but only score if data collection was identified elsewhere and datasets were shared and available for reuse.

---

@@ -288,7 +288,7 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th

**Assessment:** Evaluate completeness of version tracking infrastructure.

**Applies to:** Bridge2AI-Voice, Dataverse
**Applies to:** Always report results of this question, but only score if data collection was identified elsewhere and datasets were shared and available for reuse.

---

@@ -304,7 +304,7 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th

**Assessment:** Count publications, external resources, and check for formal dataset citation.

**Applies to:** Bridge2AI-Voice, AI-READI
**Applies to:** Always report results of this question, but only score if publication was identified elsewhere and datasets were shared and available for reuse.

---

@@ -320,7 +320,7 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th

**Assessment:** Evaluate demographic detail and population characterization through instances and subpopulations.

**Applies to:** Bridge2AI-Voice, AI-READI
**Applies to:** Always report results of this question, but only score if human subjects or governance restrictions are identified elsewhere.

---

@@ -335,7 +335,7 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th
- **Pass:** At least one working external URL present
- **Fail:** No external links found

**Assessment:** Verify presence of persistent URLs.
**Assessment:** Verify presence of persistent URLs.

---

@@ -351,7 +351,7 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th

**Assessment:** Evaluate clarity of access instructions through distribution formats and licensing.

**Applies to:** Dataverse, PhysioNet
**Applies to:** Always report results of this question, but only score if datasets were identified elsewhere as shared and available for reuse.

---

@@ -394,7 +394,7 @@ Read the provided D4D YAML file and perform a **semantic quality assessment** th

**Assessment:** Look for external resources linking to related platforms (FAIRhub, PhysioNet, GitHub, etc.).

**Applies to:** Health Nexus, PhysioNet, FAIRhub
**Applies to:** Always report results of this question, but only score if datasets were identified elsewhere as shared and available for reuse.

---

@@ -729,7 +729,7 @@ semantic_analysis_summary:

2. **Evidence-Based Scoring:** Include specific field values and quotes.

3. **Context-Aware:** Some questions apply only to specific dataset types (see "applies_to" field).
3. **Context-Aware:** Some questions apply only to specific dataset and program types (see "Applies to" field in questions).

4. **Graduated Scoring:** Use the full 0-5 range for numeric questions based on quality levels.

@@ -753,9 +753,9 @@ semantic_analysis_summary:
**User:** "Run rubric20 assessment on CM4AI D4D files (curated, gpt5, claudecode)"

**Agent:**
1. Evaluates each file separately
2. Generates detailed quality assessments
3. Highlights differences in FAIR compliance and technical documentation
1. Evaluates each file separately and generates detailed quality assessments, following the procedure in Example 1
2. Compares and contrasts content and scoring between files
3. Reports a summary of the comparison between files

## How This Agent Works
