Static dereferenceable linked-data artifacts for the Semantic Arts gist ontology.
This repository generates and hosts one small RDF fragment per named term in the gist: namespace, so that authoritative IRIs such as https://w3id.org/semanticarts/ns/ontology/gist/Account can resolve to useful machine-readable descriptions, or to human-readable descriptions when published alongside the generated content. The content in docs/ can be hosted on any static HTTP/S server. It can be made available at gist's authoritative IRIs through w3id.org redirects to that server, with basic content negotiation as proposed below.
No W3C Recommendation defines what triples a server should return when an ontology term IRI is dereferenced. There are no widely-adopted common practices.
The relevant W3C documents are:
- Best Practice Recipes for Publishing RDF Vocabularies (2008, Working Group Note): covers HTTP-level mechanics (hash vs slash namespaces, 303 redirects, content negotiation, Apache configuration). It does not specify the RDF payload for per-term responses.
- Cool URIs for the Semantic Web (2008, Interest Group Note): covers 303 and hash IRI patterns and the httpRange-14 resolution. Also silent on payload composition.
- CBD - Concise Bounded Description (2005, Member Submission): defines CBD and SCBD. Submissions are not endorsed by W3C and have no normative standing.
RDF 1.1 Concepts (a Recommendation) is explicit that the RDF specs do not address dereferencing behavior:
Perhaps the most important characteristic of IRIs in web architecture is that they can be dereferenced, and hence serve as starting points for interactions with a remote server. This specification is not concerned with such interactions. It does not define an interaction model.
The Concise Bounded Description (CBD) is the subgraph of all triples whose subject is the given term, plus recursively the CBDs of any blank nodes reached as objects. This provides a self-contained description of a node's outgoing properties.
The Symmetric CBD (SCBD) extends the CBD by also including all triples where the term appears as the object, pulling in incoming edges too. The result is a fully symmetric neighborhood of the node in both directions.
Because gist makes extensive use of axioms, a lot of the semantics of a given term are defined via incoming edges by axioms that are grouped not with the given term but with other, closely related terms. We use SCBD to provide the consumer with all axioms that directly pertain to the meaning of the referenced term.
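The two definitions above can be illustrated with a minimal sketch. It is not the repository's extraction code: it models triples as plain string tuples instead of rdflib terms, assumes blank nodes are written "_:label", uses hypothetical example.org IRIs, and follows the simplified wording of this section (reification handling from the Member Submission is left out).

```python
from typing import Iterable

Triple = tuple[str, str, str]
EX = "https://example.org/"  # hypothetical namespace, for illustration only


def is_bnode(node: str) -> bool:
    # Assumption for this sketch: blank nodes are written "_:label".
    return node.startswith("_:")


def cbd(graph: Iterable[Triple], term: str) -> set[Triple]:
    """Outgoing triples of `term`, expanded recursively through blank-node objects."""
    triples = set(graph)
    result: set[Triple] = set()
    pending, seen = [term], {term}
    while pending:
        subject = pending.pop()
        for triple in triples:
            if triple[0] != subject:
                continue
            result.add(triple)
            obj = triple[2]
            if is_bnode(obj) and obj not in seen:
                seen.add(obj)
                pending.append(obj)
    return result


def scbd(graph: Iterable[Triple], term: str) -> set[Triple]:
    """CBD of `term` plus every triple in which `term` is the object."""
    triples = set(graph)
    incoming = {triple for triple in triples if triple[2] == term}
    return cbd(triples, term) | incoming


example_graph = [
    (EX + "Account", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "owl:onProperty", EX + "hasParty"),
    (EX + "hasParty", "rdfs:label", '"has party"'),
    (EX + "Ledger", "rdfs:seeAlso", EX + "Account"),
]

outgoing_only = cbd(example_graph, EX + "Account")
both_directions = scbd(example_graph, EX + "Account")
```

In this toy graph the CBD of Account holds the subclass triple and the restriction hanging off the blank node, but stops at the named property hasParty. The SCBD additionally picks up the Ledger triple that points at Account.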
- Hosted Artifacts
  - docs/terms/: generated static term files. The current checkout contains 216 terms in three serializations each:
    - Turtle: *.ttl
    - RDF/XML: *.rdf
    - JSON-LD: *.jsonld
  - docs/ontologies/: manually-added, WIDOCO-generated files for human-readable presentation.
  - docs/demo.html: demonstration page for returning results independently of redirects from authoritative IRIs. Not served as the site root, so the namespace IRI's HTML branch can route elsewhere.
- Redirection Rules
  - tools/semanticarts.htaccess: proposed w3id.org Apache rewrite rules for the gist namespace and ontology document IRIs.
- Artifact-Generation Code & Tests
  - scbd_no_orphans.py: core extraction function. Implements the SCBD variant described below (SCBD with orphan blank-node fragments filtered out).
  - relabel.py: deterministic blank-node relabeling. Replaces rdflib's parse-time bnode IDs with hashes of canonical anchor paths so re-runs don't churn IDs.
  - canonicalize.py: post-processing for the JSON-LD and RDF/XML serializer output. Sorts dicts, arrays, and XML elements so element ordering is stable across runs (preserving @list order, which is semantically significant).
  - build.py: CLI that loads one or more source ontology Turtle files and writes per-term fragments to an output directory.
  - tests/: pytest suites covering the extraction logic (test_scbd_no_orphans.py, 6 cases), the relabeling (test_relabel.py, 7 cases, including a round-trip isomorphism check for every serialization), and the canonicalization (test_canonicalize.py, 8 cases: semantics preservation, idempotence, byte stability, and @list order preservation).
The generated files in this repository currently target gist 14.1.0.
build.py loads the source ontology modules into a single merged graph, then for
each named term in the target namespace writes a per-term fragment to
docs/terms/{LocalName}.{ttl,rdf,jsonld}.
Each fragment is the Symmetric Concise Bounded Description (SCBD) of the term, as defined in the W3C Member Submission CBD - Concise Bounded Description, with one minor adjustment: orphan blank-node fragments are filtered out.
The extraction proceeds in two phases:
- Phase 1, outgoing CBD: all triples reachable from the term via blank-node chains (restrictions, list cells, class expressions, etc.) are included.
- Phase 2, back-references: for each triple (s, p, term) where s is a blank node, the algorithm walks backward through blank-node chains looking for a named (IRI) ancestor. If one is found, the full chain and its CBD expansion are included. If no named ancestor exists, the fragment is dropped.
That drop step is the only departure from spec-compliant SCBD, and it's a minor
one: the dropped fragments are blank-node subgraphs that have no path to any
named IRI in the source graph (for example, stray one-element rdf:List cells
that survive serialization but no longer belong to a containing class expression).
Including them in a per-term fragment is noise — the consumer cannot interpret a
list cell without the class expression that owns it.
This still captures every genuinely useful back-reference, such as owl:unionOf
or owl:intersectionOf expressions on other named classes that reference the
term.
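The backward walk in Phase 2 can be sketched as follows. This is an illustrative model, not the code in scbd_no_orphans.py: triples are plain string tuples, blank nodes are assumed to be written "_:label", the example.org IRIs and function names are hypothetical, and the sketch only decides which incoming triples are kept, without adding the chain and its CBD expansion.

```python
from typing import Iterable

Triple = tuple[str, str, str]
EX = "https://example.org/"  # hypothetical namespace, for illustration only


def is_bnode(node: str) -> bool:
    # Assumption for this sketch: blank nodes are written "_:label".
    return node.startswith("_:")


def has_named_ancestor(triples: set[Triple], bnode: str) -> bool:
    """Walk backward from `bnode` through blank-node subjects until an IRI is found."""
    pending, seen = [bnode], {bnode}
    while pending:
        current = pending.pop()
        for subject, _predicate, obj in triples:
            if obj != current:
                continue
            if not is_bnode(subject):
                return True  # reached a named (IRI) ancestor
            if subject not in seen:
                seen.add(subject)
                pending.append(subject)
    return False  # the chain never reaches an IRI: an orphan fragment


def anchored_back_references(graph: Iterable[Triple], term: str) -> set[Triple]:
    """Incoming triples of `term`, minus those hanging off orphan blank nodes."""
    triples = set(graph)
    kept: set[Triple] = set()
    for triple in triples:
        subject, _predicate, obj = triple
        if obj != term:
            continue
        if not is_bnode(subject) or has_named_ancestor(triples, subject):
            kept.add(triple)
    return kept


example_graph = [
    # Party is declared as the union (Person Organization).
    (EX + "Party", "owl:unionOf", "_:l1"),
    ("_:l1", "rdf:first", EX + "Person"),
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", EX + "Organization"),
    ("_:l2", "rdf:rest", "rdf:nil"),
    # A stray one-element list cell with no owner.
    ("_:stray", "rdf:first", EX + "Organization"),
    ("_:stray", "rdf:rest", "rdf:nil"),
]

kept = anchored_back_references(example_graph, EX + "Organization")
```

Here the list cell _:l2 is kept because walking backward reaches the named class Party, while the _:stray cell is dropped because nothing points at it.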
rdflib assigns fresh blank-node identifiers on every parse, and its RDF/XML
and JSON-LD serializers emit elements in set-iteration order. Both effects
would otherwise produce noisy diffs on every rebuild. build.py neutralizes
them in two passes after extraction:
- relabel.py replaces each blank node with one whose identifier is hash(canonical-path-from-named-ancestor). Structurally identical bnodes collapse to a single label, which is sound under RDF simple entailment.
- canonicalize.py post-processes the serializer output. JSON-LD dicts and arrays are sorted recursively, except for arrays under @list (which encode rdf:List and are semantically ordered). RDF/XML elements are sorted by (tag, attributes, text, subtree-signature); carriage-return (CR) entities are swapped through a Unicode sentinel across the parse/serialize cycle so CRLF line endings inside multi-line literals survive XML 1.0's text normalization.
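One way to picture the relabeling pass is the sketch below. It is an assumption-laden illustration rather than the algorithm in relabel.py: the path format, the choice of SHA-256, the 16-character truncation, and the tie-breaking rule (smallest path wins) are all invented here, and triples are modeled as plain string tuples with blank nodes written "_:label".

```python
import hashlib
from typing import Iterable, Optional

Triple = tuple[str, str, str]
EX = "https://example.org/"  # hypothetical namespace, for illustration only


def is_bnode(node: str) -> bool:
    # Assumption for this sketch: blank nodes are written "_:label".
    return node.startswith("_:")


def canonical_path(triples: set[Triple], bnode: str,
                   on_stack: frozenset = frozenset()) -> Optional[str]:
    """Smallest 'IRI predicate predicate ...' path reaching `bnode`, or None."""
    on_stack = on_stack | {bnode}  # guards against cycles among blank nodes
    candidates = []
    for subject, predicate, obj in triples:
        if obj != bnode or subject in on_stack:
            continue
        if is_bnode(subject):
            parent = canonical_path(triples, subject, on_stack)
            if parent is not None:
                candidates.append(f"{parent} {predicate}")
        else:
            candidates.append(f"{subject} {predicate}")
    return min(candidates, default=None)


def relabel(graph: Iterable[Triple]) -> set[Triple]:
    """Replace parse-time blank-node IDs with hashes of their canonical paths."""
    triples = set(graph)
    bnodes = {n for s, _p, o in triples for n in (s, o) if is_bnode(n)}
    mapping = {}
    for node in bnodes:
        path = canonical_path(triples, node)
        if path is not None:
            digest = hashlib.sha256(path.encode("utf-8")).hexdigest()[:16]
            mapping[node] = f"_:b{digest}"
    return {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in triples}


# The same restriction as it might look after two separate parses.
first_parse = [
    (EX + "Account", "rdfs:subClassOf", "_:n1"),
    ("_:n1", "owl:onProperty", EX + "hasParty"),
]
second_parse = [
    (EX + "Account", "rdfs:subClassOf", "_:genid42"),
    ("_:genid42", "owl:onProperty", EX + "hasParty"),
]

stable_first = relabel(first_parse)
stable_second = relabel(second_parse)
```

Both parses end up with identical triples because the label depends only on the path from the named ancestor, not on the identifier the parser happened to assign. Two blank nodes with the same path would share a label, which mirrors the collapsing behavior described above.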
Result: rerunning build.py on an unchanged source ontology produces
byte-identical .ttl, .rdf, and .jsonld files. The round-trip test in
test_relabel.py verifies that every serialization parses back to a graph
isomorphic to the original.
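The JSON-LD half of the canonicalization can be sketched with the standard library alone. This is a simplified illustration, not the code in canonicalize.py: the function names and the sort key (each item's own JSON text) are assumptions, and the sample documents use a hypothetical ex: prefix.

```python
import json


def canonicalize(value, under_list=False):
    """Recursively sort dict keys and arrays, leaving arrays under @list in order."""
    if isinstance(value, dict):
        return {
            key: canonicalize(value[key], under_list=(key == "@list"))
            for key in sorted(value)
        }
    if isinstance(value, list):
        items = [canonicalize(item) for item in value]
        if under_list:
            return items  # rdf:List order is semantically significant
        # Assumed sort key: the JSON text of each already-canonical item.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value


def dump(document) -> str:
    """Serialize a canonicalized document to stable text."""
    return json.dumps(canonicalize(document), indent=2, ensure_ascii=False)


# The same content as two serializer runs might emit it.
run_one = {
    "@id": "ex:Party",
    "@type": ["owl:Class", "ex:Thing"],
    "owl:unionOf": {"@list": [{"@id": "ex:Person"}, {"@id": "ex:Organization"}]},
}
run_two = {
    "owl:unionOf": {"@list": [{"@id": "ex:Person"}, {"@id": "ex:Organization"}]},
    "@type": ["ex:Thing", "owl:Class"],
    "@id": "ex:Party",
}

text_one = dump(run_one)
text_two = dump(run_two)
```

The two runs differ in key order and in the order of the @type array, yet serialize to the same text, while the @list members keep their original sequence.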
- Python 3.11 or newer
- rdflib

    python -m pip install rdflib

For running the test suite:

    python -m pip install pytest
    python -m pytest tests/

Place the gist web download bundle inside the repository root (it is gitignored):

    gist14.1.0_webDownload/
      ontologies/
        turtle/
          gistCore14.1.0.ttl
          gistRdfsAnnotations14.1.0.ttl
          gistSubClassAssertions14.1.0.ttl

Then run from the repository root:

    python build.py \
      gist14.1.0_webDownload/ontologies/turtle/gistCore14.1.0.ttl \
      gist14.1.0_webDownload/ontologies/turtle/gistRdfsAnnotations14.1.0.ttl \
      gist14.1.0_webDownload/ontologies/turtle/gistSubClassAssertions14.1.0.ttl \
      docs/terms \
      --namespace https://w3id.org/semanticarts/ns/ontology/gist/

The output directory will contain:

    docs/
      terms/
        Account.ttl
        Account.rdf
        Account.jsonld
        ...

The generated docs/ directory is suitable for any static web host. The rewrite rules in
tools/semanticarts.htaccess are for w3id.org; they redirect term and ontology-document IRIs to the corresponding files hosted from docs/, and redirect the namespace IRI itself to the Semantic Arts landing page (see the routing table below).
Before deploying those rules, update the base URL in
tools/semanticarts.htaccess if the published site is not:
https://semanticarts.github.io/gist-deref
The current rewrite rules cover:
- https://w3id.org/semanticarts/ns/ontology/gist/
- https://w3id.org/semanticarts/ns/ontology/gist/{Term}
- https://w3id.org/semanticarts/ontology/{OntologyDocument}
The namespace IRI itself redirects to the existing Semantic Arts landing page
at https://ontologies.semanticarts.com/ontology/Namespace.html, regardless
of Accept header.
For term IRIs, content negotiation redirects to the published /terms/ path
backed by docs/terms/ in this repository:
- text/turtle -> /terms/{Term}.ttl
- application/rdf+xml -> /terms/{Term}.rdf
- application/ld+json or application/json -> /terms/{Term}.jsonld
- text/html -> the term anchor in the WIDOCO HTML documentation
- default clients -> Turtle
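The routing list above can be modeled in a few lines of Python to make the intended behavior concrete. This is only a sketch of the mapping, not the Apache rules in tools/semanticarts.htaccess: it matches media types by substring in list order, ignores q-values, and assumes the HTML target is /ontologies/gist-widoco.html#{Term}, a path guessed from the local file location rather than taken from the rewrite rules. The function name route_term is hypothetical.

```python
# Media type -> file extension, checked in the order listed above.
RDF_ROUTES = [
    ("text/turtle", "ttl"),
    ("application/rdf+xml", "rdf"),
    ("application/ld+json", "jsonld"),
    ("application/json", "jsonld"),
]


def route_term(term: str, accept: str = "") -> str:
    """Return the redirect path for a term IRI given an Accept header."""
    accept = accept.lower()
    for media_type, extension in RDF_ROUTES:
        if media_type in accept:
            return f"/terms/{term}.{extension}"
    if "text/html" in accept:
        # Assumed location of the WIDOCO page; the real rule may differ.
        return f"/ontologies/gist-widoco.html#{term}"
    # Default clients receive Turtle.
    return f"/terms/{term}.ttl"


turtle_path = route_term("Account", "text/turtle")
jsonld_path = route_term("Account", "application/ld+json")
default_path = route_term("Account", "*/*")
browser_path = route_term(
    "Account", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
)
```

A client that sends no recognizable RDF or HTML media type, such as curl with its default Accept header, falls through to the Turtle file.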
The .htaccess file also contains routes for full-ontology documents and WIDOCO
HTML documentation. The current repository snapshot includes generated per-term
files under docs/terms; add or publish the matching ontology/ and html/
assets before relying on those routes in production. Those assets are not
produced by build.py — drop the source ontology Turtle/RDF-XML/JSON-LD files
into docs/ontologies/ and the WIDOCO output into docs/ontologies/ (or
docs/html/) by hand.
The local docs/ontologies/gist-widoco.html file has been patched so fragment
URLs with bare gist local names, such as #Address, scroll to the WIDOCO entity
whose HTML id is the full gist IRI. Preserve or reapply that hash-navigation
change if the WIDOCO page is regenerated.
After publication and w3id.org configuration, clients can request a specific
serialization of a term:

    curl -L -H "Accept: text/turtle" https://w3id.org/semanticarts/ns/ontology/gist/Account
    curl -L -H "Accept: application/ld+json" https://w3id.org/semanticarts/ns/ontology/gist/Account

The same generated files can also be inspected directly in docs/terms/.
- Generated files are committed so that they can be deployed as static assets from the git repository.
- Re-run build.py when updating to a new gist version or when the extraction logic changes.
- Output is byte-stable across rebuilds; see the Deterministic output section above for how blank-node IDs and serializer ordering are normalized.