Name cleanup #148

Merged
realmarcin merged 10 commits into main from name_cleanup on Apr 27, 2026

@realmarcin
Collaborator

No description provided.

realmarcin and others added 9 commits April 25, 2026 18:23
These four classes were added to the D4D schema after the original
semantic exchange layer was authored, leaving them without RO-Crate
mappings. This commit closes that gap.

Semantic SSSOM (src/data_sheets_schema/alignment/d4d_rocrate_sssom_mapping.tsv):
  +12 rows (95 → 107)
  - DatasetCollection → schema:Dataset (exactMatch, RO-Crate root)
  - DatasetCollection → dcat:Catalog (closeMatch, semantic-catalog view)
  - File → schema:MediaObject (exactMatch)
  - File → schema:DigitalDocument (closeMatch)
  - FileCollection → schema:Dataset (exactMatch, nested in hasPart)
  - FileCollection → dcat:Distribution (closeMatch)
  - 6 key-slot rows: DatasetCollection.resources/FileCollection.resources →
    schema:hasPart, File.file_type → d4d:fileType, FileCollection.{collection_type,
    file_count, total_bytes} → d4d:collectionType / d4d:fileCount / dcat:byteSize

Structural SSSOM (data/mappings/d4d_rocrate_structural_mapping.sssom.tsv):
  +6 rows (149 → 155) — slot-level rows mirroring the semantic-file slots

SKOS alignment (src/data_sheets_schema/alignment/d4d_rocrate_skos_alignment.ttl):
  - Added dcat: prefix declaration
  - Added 6 class-level + 6 slot-level skos triples mirroring the SSSOM rows
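A minimal sketch of how the six class-level triples listed above could be serialized as Turtle, using only the standard library. The `d4d:` namespace IRI is a placeholder, the `schema:` IRI is an assumption about which schema.org form the TTL file uses, and the helper name `to_turtle` is hypothetical; the real file may be produced differently.

```python
# Prefix table. The d4d IRI is a placeholder, not the project's real namespace.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "schema": "http://schema.org/",          # assumption: http form
    "dcat": "http://www.w3.org/ns/dcat#",
    "d4d": "https://example.org/d4d/",       # placeholder IRI
}

# Class-level mappings taken from the commit message.
CLASS_MAPPINGS = [
    ("d4d:DatasetCollection", "skos:exactMatch", "schema:Dataset"),
    ("d4d:DatasetCollection", "skos:closeMatch", "dcat:Catalog"),
    ("d4d:File", "skos:exactMatch", "schema:MediaObject"),
    ("d4d:File", "skos:closeMatch", "schema:DigitalDocument"),
    ("d4d:FileCollection", "skos:exactMatch", "schema:Dataset"),
    ("d4d:FileCollection", "skos:closeMatch", "dcat:Distribution"),
]


def to_turtle(prefixes, triples):
    """Render prefix declarations followed by one triple per line."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines.append("")
    lines.extend(f"{s} {p} {o} ." for s, p, o in triples)
    return "\n".join(lines) + "\n"


turtle_text = to_turtle(PREFIXES, CLASS_MAPPINGS)
print(turtle_text)
```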

Per the user's note that DatasetCollection may be the RO-Crate root
(@type=["Dataset", "https://w3id.org/EVI#ROCrate"], @id="./"),
DatasetCollection is given a dual mapping: exactMatch → schema:Dataset
(root semantics) and closeMatch → dcat:Catalog (semantic-catalog view).
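The dual mapping can be pictured as two SSSOM rows sharing one subject. The sketch below writes only a small subset of standard SSSOM columns (the real semantic file has 19), and the `mapping_justification` value and `comment` text are illustrative assumptions rather than the values in the repository.

```python
import csv
import io

# Subset of SSSOM columns for illustration; the canonical file has more.
COLUMNS = ["subject_id", "predicate_id", "object_id",
           "mapping_justification", "comment"]

dual_mapping_rows = [
    {"subject_id": "d4d:DatasetCollection",
     "predicate_id": "skos:exactMatch",
     "object_id": "schema:Dataset",
     "mapping_justification": "semapv:ManualMappingCuration",  # assumed value
     "comment": "RO-Crate root"},
    {"subject_id": "d4d:DatasetCollection",
     "predicate_id": "skos:closeMatch",
     "object_id": "dcat:Catalog",
     "mapping_justification": "semapv:ManualMappingCuration",  # assumed value
     "comment": "semantic-catalog view"},
]


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def targets_for(subject_id, rows):
    """Return {predicate: object} for every row with the given subject."""
    return {r["predicate_id"]: r["object_id"]
            for r in rows if r["subject_id"] == subject_id}


print(to_tsv(dual_mapping_rows))
print(targets_for("d4d:DatasetCollection", dual_mapping_rows))
```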

Out of scope for this PR (existing TODOs remain):
  - src/fairscape_integration/d4d_to_fairscape.py:292-295 — converter
    code does not yet traverse FileCollection.resources to emit RO-Crate
    File entities. The mapping layer is now ready; converter update is
    a separate follow-up.
  - The generated comprehensive/uri SSSOM variants weren't regenerated;
    the canonical files (semantic + structural) are the source of truth.

Validation:
  - SSSOMIntegration parses both files (semantic via custom reader,
    structural via sssom-py per the existing column-naming setup)
  - All 190 tests in tests/test_alignment + tests/test_fairscape_integration pass
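As a rough, standard-library-only illustration of the kind of check such validation performs (this is not the `SSSOMIntegration` API, whose signatures are not shown in this PR), the sketch below reads an SSSOM-style TSV, skips `#` metadata lines, and compares the row count with a `# Total mappings: N` header line. The sample text and function names are hypothetical.

```python
import csv

# Tiny stand-in for an SSSOM TSV; real files have many more columns and rows.
SAMPLE = (
    "# Total mappings: 2\n"
    "# Date: 2026-04-26\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "d4d:File\tskos:exactMatch\tschema:MediaObject\n"
    "d4d:File\tskos:closeMatch\tschema:DigitalDocument\n"
)


def read_rows(text):
    """Parse data rows, ignoring blank lines and '#' metadata lines."""
    data_lines = [ln for ln in text.splitlines()
                  if ln.strip() and not ln.startswith("#")]
    return list(csv.DictReader(data_lines, delimiter="\t"))


def declared_total(text):
    """Return the integer from a '# Total mappings: N' line, or None."""
    for ln in text.splitlines():
        if ln.startswith("# Total mappings:"):
            return int(ln.split(":", 1)[1])
    return None


rows = read_rows(SAMPLE)
header_matches = declared_total(SAMPLE) == len(rows)
print(len(rows), header_matches)
```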

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A reusable Claude Code slash command that captures the workflow used in
this PR — adding D4D ↔ RO-Crate / FAIRSCAPE mappings for new schema
classes. The skill:

- Describes the 19-column semantic SSSOM and 17-column structural SSSOM
  layouts and points at the canonical files
- Provides a decision rubric for choosing primary/secondary RO-Crate
  targets based on class_uri / exact_mappings / tree_root annotations
- Includes row templates and a Python helper-script skeleton
- Documents standard RO-Crate target conventions (root Dataset,
  schema:MediaObject, dcat:Catalog, schema:hasPart, etc.)
- Specifies the mandatory validation step via SSSOMIntegration + pytest
- Codifies branch / commit / PR conventions
- Calls out known follow-ups to keep out of scope (converter TODOs,
  generator regen, schema YAML touch-ups)

Cross-references PR #147 as the canonical worked example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generated from the D4D ↔ RO-Crate semantic SSSOM by parsing rocrate_json_path
patterns to extract entity types and their properties. Shows:
- Dataset (root) with properties grouped by namespace (schema.org, DCAT,
  FAIRSCAPE EVI, Croissant RAI, D4D-specific)
- Sub-entities: MediaObject, Person, Organization, Grant, CreativeWork,
  DefinedTerm
- Reference edges (author/creator/contributor → Person, funder → Grant,
  publisher → Organization, citation → CreativeWork, about → DefinedTerm,
  hasPart → MediaObject)
- ROCrate as root marker connected via dashed @type edge

Generator: src/alignment/ (helper script captured in /tmp during this PR);
rendered with graphviz dot -Gdpi=180.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-class side-by-side comparison of slot counts in the d4d-core
semantic exchange layer (left, orange) versus mapped/standard
RO-Crate properties on the corresponding target type (right, green).

Right-side counts combine SSSOM-discovered properties with the
schema.org / RO-Crate 1.1 baseline for sub-entity types
(Person, Organization, Grant, MediaObject, Distribution).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… site coverage

- src/data_sheets_schema/alignment/ → src/data_sheets_schema/semantic_exchange/
  (canonical SKOS TTL + semantic SSSOM artifacts)
- data/mappings/ → data/semantic_exchange/
  (sssom-py-compatible structural mapping + analysis docs)
- src/alignment/ → src/semantic_exchange/  (generator scripts)
- tests/test_alignment/ → tests/test_semantic_exchange/

Updated all path references in Makefile, generator scripts, schema YAMLs,
fairscape_integration, notes, and tests. All 190 tests pass.
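The four directory renames amount to a prefix-rewrite table. A small sketch of how a stale path reference could be repointed is given below; the function name `repoint` is hypothetical, and the actual update in this PR may well have been done by hand or with search-and-replace.

```python
# (old prefix, new prefix) pairs from the commit message.
RENAMES = [
    ("src/data_sheets_schema/alignment/",
     "src/data_sheets_schema/semantic_exchange/"),
    ("data/mappings/", "data/semantic_exchange/"),
    ("src/alignment/", "src/semantic_exchange/"),
    ("tests/test_alignment/", "tests/test_semantic_exchange/"),
]


def repoint(path):
    """Rewrite a repository-relative path if it starts with a renamed prefix."""
    for old, new in RENAMES:
        if path.startswith(old):
            return new + path[len(old):]
    return path  # untouched: not under a renamed directory


for example in (
    "src/data_sheets_schema/alignment/d4d_rocrate_skos_alignment.ttl",
    "data/mappings/README.md",
    "Makefile",
):
    print(example, "->", repoint(example))
```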

Visibility improvements:
- README.md: new "D4D-Core Schema" + "Semantic Exchange Layer" sections
  with per-artifact path tables
- docs/home.md: top-level pointers to D4D-Core and Semantic Exchange
- docs/d4d_core.md: new hand-curated landing page for the core schema
  (artifacts, build/validate targets, curated example datasheets, class
  crosswalk, rationale)
- docs/semantic_exchange.md: new hand-curated landing page for the
  exchange layer (canonical artifacts, generator scripts, validation,
  /d4d-add-mapping workflow, namespaces, coverage stats)
- mkdocs.yml: added "D4D-Core" and "Semantic Exchange" to nav

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the chart covered only 8 hand-listed structural classes.
Now it shows every d4d-core class, sorted by slot count, in a
two-column layout with a poster-friendly aspect ratio (~1.84).

Right-side counts:
- Structural targets (Dataset/Distribution/Person/Org/Grant/etc.):
  full property surface (SSSOM-discovered + schema.org baseline)
- Property/wrapper classes: derived by looking up which slots have
  the class as range, then checking the SKOS TTL for mapped targets

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- SSSOM subject_id values for the 6 new key-slot rows now use the
  underscore form (d4d:Class_slot) to match the SKOS TTL subjects
  and what generate_sssom_mapping.py emits, so downstream lookups
  via SSSOMIntegration.get_mappings_by_subject() resolve correctly.
- SSSOM header refreshed: '# Total mappings: 107' (was 95) and
  '# Date: 2026-04-26'.
- SKOS TTL header bumped to Version 1.1 / Date 2026-04-26 and the
  alignment-statistics block updated to reflect the current 112
  triples (69 exact / 25 close / 10 related / 7 narrow / 1 broad)
  and the per-namespace counts (schema.org 57, rai 29, d4d 10,
  evi 7, dcat 3, rdf 1).
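To illustrate the underscore convention for slot-level `subject_id` values, here is a small sketch. It assumes the rows previously used a dotted `Class.slot` spelling like the one in the earlier commit message; the commit does not state the prior form, and both helper names are hypothetical.

```python
def slot_subject_id(class_name, slot_name, prefix="d4d"):
    """Build a slot-level subject CURIE in the underscore form."""
    return f"{prefix}:{class_name}_{slot_name}"


def normalize_subject_id(subject_id):
    """Convert an assumed dotted form 'd4d:Class.slot' to 'd4d:Class_slot'.

    Only the first dot after the prefix is replaced; values without a
    prefix separator are returned unchanged.
    """
    if ":" not in subject_id:
        return subject_id
    prefix, _, local = subject_id.partition(":")
    return f"{prefix}:{local.replace('.', '_', 1)}"


print(slot_subject_id("FileCollection", "file_count"))
print(normalize_subject_id("d4d:File.file_type"))
```

Keeping one spelling matters because a lookup keyed on the subject string is an exact match: a dotted key would not be found when the caller asks for the underscore form.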

Tests: 190 passed, 2 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The skill doc still pointed contributors at the pre-rename paths
(src/data_sheets_schema/alignment/, data/mappings/, src/alignment/,
tests/test_alignment/) so its grep, git-add, and validation snippets
no longer matched the canonical files. Repointed every reference to
the renamed directories.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The data/semantic_exchange/ directory had grown to seven SSSOM TSV
copies, several of which were stale snapshots or byte-identical
duplicates of the canonical files in
src/data_sheets_schema/semantic_exchange/. Two of them formed a v1/v2
pair that was impossible to interpret without comparing dates by
hand (v1 was the newer 2026-04-09 / 284-attr file; v2 was a stale
2026-03-23 / 268-attr file).
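One way to tell byte-identical duplicates from stale snapshots is to compare content hashes against the canonical directory. The sketch below is an assumption about how such a check could be scripted, not the procedure used in this PR; `classify_copies` and its labels are hypothetical, and it only compares files that share a name.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def classify_copies(canonical_dir, copies_dir, pattern="*.tsv"):
    """Label each file in copies_dir relative to its same-named canonical file."""
    canonical = {p.name: sha256_of(p)
                 for p in Path(canonical_dir).glob(pattern)}
    report = {}
    for copy in sorted(Path(copies_dir).glob(pattern)):
        if copy.name not in canonical:
            report[copy.name] = "no canonical counterpart"
        elif sha256_of(copy) == canonical[copy.name]:
            report[copy.name] = "duplicate"
        else:
            report[copy.name] = "differs"
    return report
```

A call such as `classify_copies("src/data_sheets_schema/semantic_exchange", "data/semantic_exchange")` would return a dictionary keyed by file name. Renamed copies, like the v1 file matching a differently named canonical file, would need a hash-to-name index instead.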

Deleted from data/semantic_exchange/:
- d4d_rocrate_sssom_mapping.tsv          (stale 102-row snapshot)
- d4d_rocrate_sssom_mapping_subset.tsv   (duplicate of src/)
- d4d_rocrate_sssom_comprehensive.tsv    (duplicate of src/)
- d4d_rocrate_sssom_uri_mapping.tsv      (duplicate of src/)
- d4d_rocrate_sssom_uri_comprehensive_v1.tsv  (duplicate of src/'s
                                                canonical
                                                d4d_rocrate_sssom_uri_comprehensive.tsv)
- d4d_rocrate_sssom_uri_comprehensive_v2.tsv  (stale older snapshot)
- d4d_rocrate_sssom_uri_interface.tsv    (orphan; not referenced
                                          anywhere in code or Make)

Kept in data/semantic_exchange/ (canonical here):
- d4d_rocrate_structural_mapping.sssom.tsv
- d4d_rocrate_structural_mapping_summary.md
- STRUCTURAL_MAPPING_ANALYSIS.md
- uri_mapping_recommendations.md
- README.md (rewritten to point at src/.../semantic_exchange/ for
             everything except the structural mapping)

Updated tests/test_semantic_exchange/test_sssom_validation.py to
look up comprehensive / uri / uri_comprehensive in the canonical
src/ tree instead of the deleted data/ copies. Tests: 190 passed,
2 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 26, 2026 19:38
Conflict resolution: kept the name_cleanup intent — canonical SSSOMs
in src/data_sheets_schema/semantic_exchange/ only; data/semantic_exchange/
keeps just the structural mapping (sssom-py compatible) and analysis docs.

- Confirmed deletion of duplicate / stale TSVs from data/semantic_exchange/
  that main had recreated as part of its rename of data/mappings/ →
  data/semantic_exchange/
- Kept HEAD's lean README.md (points at canonical src/.../semantic_exchange/
  for everything except the structural mapping) over main's older
  "D4D Mapping Files" version that referenced v1/v2/uri_interface
- Resolved test_sssom_validation.py to use src_dir for comprehensive,
  uri, and uri_comprehensive lookups

src/.../semantic_exchange/*.tsv files are byte-identical on both sides;
accepted ours. Tests: 190 passed, 2 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR consolidates the D4D ↔ RO-Crate/FAIRSCAPE “semantic exchange” artifacts under semantic_exchange/ (replacing older alignment/ + data/mappings/ locations), updates generator defaults and downstream consumers, and adds user-facing documentation for D4D-Core + the exchange layer.

Changes:

  • Move/standardize paths for SKOS + SSSOM artifacts to src/data_sheets_schema/semantic_exchange/ and data/semantic_exchange/, updating tests, CLI defaults, docs, and Make targets accordingly.
  • Add new documentation pages (d4d_core.md, semantic_exchange.md) and update MkDocs navigation + repo README.
  • Introduce helper tooling/docs for maintaining mappings (e.g., add_slot_uris.py, /d4d-add-mapping command doc).

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| `tests/test_semantic_exchange/test_sssom_validation.py` | Update tests to new semantic_exchange artifact locations and include both data/ + src/ TSV sources. |
| `tests/test_semantic_exchange/__init__.py` | Add package docstring for semantic exchange tests. |
| `tests/test_fairscape_integration/test_sssom_reader.py` | Update structural mapping path to data/semantic_exchange/.... |
| `tests/test_fairscape_integration/test_sssom_integration.py` | Update structural mapping path to data/semantic_exchange/.... |
| `src/semantic_exchange/implement_uri_mappings.py` | Update docstring paths/usages to semantic_exchange locations. |
| `src/semantic_exchange/generate_structural_mapping.py` | Change default output directory to data/semantic_exchange. |
| `src/semantic_exchange/generate_sssom_uri_mapping.py` | Update default SKOS/output paths to semantic_exchange directory. |
| `src/semantic_exchange/generate_sssom_mapping.py` | Update default SKOS/output paths to semantic_exchange directory. |
| `src/semantic_exchange/generate_comprehensive_sssom_uri.py` | Update default SKOS/output paths to semantic_exchange directory. |
| `src/semantic_exchange/generate_comprehensive_sssom.py` | Update default SKOS/output paths to semantic_exchange directory. |
| `src/semantic_exchange/add_slot_uris.py` | Add a new helper script to apply slot_uri recommendations to schema YAMLs. |
| `src/semantic_exchange/add_module_column.py` | Update mappings output directory to data/semantic_exchange. |
| `src/fairscape_integration/fairscape_to_d4d.py` | Update default semantic SSSOM mapping path to semantic_exchange directory. |
| `src/fairscape_integration/README_STANDARD_TOOLING.md` | Update examples to semantic_exchange paths (one path still incorrect; see comments). |
| `src/data_sheets_schema/semantic_exchange/d4d_rocrate_sssom_uri_mapping.tsv` | Add URI-level SSSOM mapping file under canonical semantic_exchange path. |
| `src/data_sheets_schema/semantic_exchange/d4d_rocrate_sssom_mapping_subset.tsv` | Add subset semantic SSSOM TSV under canonical semantic_exchange path. |
| `src/data_sheets_schema/semantic_exchange/d4d_rocrate_skos_alignment.ttl` | Update alignment TTL (prefixes, version/date, added class/slot triples, updated stats). |
| `src/data_sheets_schema/schema/data_sheets_schema_core_all.yaml` | Update see_also reference to new SKOS TTL location. |
| `src/data_sheets_schema/schema/data_sheets_schema_core.yaml` | Update see_also reference to new SKOS TTL location. |
| `src/data_sheets_schema/schema/D4D_Core.yaml` | Update see_also reference to new SKOS TTL location. |
| `src/data_sheets_schema/alignment/d4d_rocrate_sssom_uri_mapping.tsv` | Remove old URI SSSOM mapping from deprecated alignment directory. |
| `src/data_sheets_schema/alignment/d4d_rocrate_sssom_mapping_subset.tsv` | Remove old subset SSSOM mapping from deprecated alignment directory. |
| `src/data_sheets_schema/alignment/d4d_rocrate_sssom_mapping.tsv` | Remove old semantic SSSOM mapping from deprecated alignment directory. |
| `notes/SEMANTIC_EXCHANGE_IMPLEMENTATION.md` | Update references to the new SKOS TTL path. |
| `mkdocs.yml` | Add nav entries for D4D-Core and Semantic Exchange docs pages. |
| `docs/semantic_exchange.md` | Add user-facing documentation for artifacts, generators, validation, and workflow. |
| `docs/home.md` | Add entry points/links for D4D-Core and Semantic Exchange docs. |
| `docs/d4d_core.md` | Add user-facing documentation for the D4D-Core schema subset. |
| `data/semantic_exchange/uri_mapping_recommendations.md` | Add URI mapping recommendation document under new directory. |
| `data/semantic_exchange/d4d_rocrate_structural_mapping_summary.md` | Add structural mapping summary under new directory. |
| `data/semantic_exchange/d4d_rocrate_structural_mapping.sssom.tsv` | Add/update structural mapping with additional rows for newly mapped classes/slots. |
| `data/semantic_exchange/STRUCTURAL_MAPPING_ANALYSIS.md` | Update structural mapping analysis to new script path + output locations. |
| `data/semantic_exchange/README.md` | Add README describing what belongs in data/semantic_exchange/. |
| `data/poster_assets/figures/fig7_rocrate_profile.dot` | Add DOT diagram source for RO-Crate profile figure. |
| `data/mappings/d4d_rocrate_sssom_uri_interface.tsv` | Remove old interface URI mapping file from deprecated directory. |
| `data/mappings/README.md` | Remove old mappings README from deprecated directory. |
| `README.md` | Document D4D-Core and Semantic Exchange as first-class entry points and update repo structure overview. |
| `Makefile` | Repoint SSSOM/SKOS generator variables and outputs to semantic_exchange paths. |
| `.claude/commands/d4d-add-mapping.md` | Add a new “add mapping” command doc describing the workflow to extend exchange layer mappings. |
| `.claude/commands/README.md` | Register /d4d-add-mapping in the commands index. |

realmarcin merged commit ecc8012 into main on Apr 27, 2026
3 checks passed
realmarcin added a commit that referenced this pull request Apr 29, 2026
Conflict resolution:
- Canonical SSSOM/SKOS files at src/data_sheets_schema/semantic_exchange/:
  kept ours (114-row mapping, plus 7 d4d-core class additions on top of
  the 107-row baseline that PR #147 already shipped, plus expanded SKOS
  TTL).
- Mapping TSVs duplicated under data/semantic_exchange/: deleted.
  PR #148 (Name cleanup) already moved them to the canonical
  src/data_sheets_schema/semantic_exchange/ location.
- Poster figures added by main (fig7_rocrate_profile.{dot,png},
  fig8_exchange_butterfly.png): removed per project rule that poster
  artifacts don't get committed here.
- README + test_sssom_validation.py: took main's version (correctly
  reflects the post-#148 structural/canonical split).
- docs/html_output/concatenated/curated/*.html re-rendered from current
  renderer + curated YAMLs (generated, not hand-merged).
- data/semantic_exchange/d4d_rocrate_structural_mapping.sssom.tsv:
  kept ours (superset of main).

Tests: tests.test_semantic_exchange.test_sssom_validation passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2 participants