Skip to content

Add NX docstring as attribute#415

Open
lukaspie wants to merge 5 commits into
masterfrom
403-improvement-add-nx-doc-string-as-attribute
Open

Add NX docstring as attribute#415
lukaspie wants to merge 5 commits into
masterfrom
403-improvement-add-nx-doc-string-as-attribute

Conversation

@lukaspie

@lukaspie lukaspie commented Aug 29, 2024

Copy link
Copy Markdown
Collaborator

@rettigl this is a way to add the NX docstrings to the HDF5 files as attributes. The idea is that you can pass the write-docs flag to the dataconverter and then docstrings are added. By default, this is turned off so as to not change any existing workflows. We can discuss if we want this to happen by default.

Here's an example using the xps reader:
output.nxs.zip

The implementation is relatively trivial. There are however two open questions for me

  • How to handle inherited docs
    Usually, our appdefs have very sparse documentation since all the docstrings are in the base classes or in appdefs that are further up the chain. My understanding is that the docs are also extended (not overwritten) when you inherit from a class. So technically, one should concatenate the docs of all inherited nodes. However, this gets quite unwieldy and messy as the docs in the files get rather large and sometimes they might even contradict each other. My suggestion (implemented here) was that as we travel up the inherited nodes, as soon as there is a docstring we only use that one (and don't add any docs coming from even earlier nodes). But this is probably not strictly correct. Maybe @sanbrock can comment.

  • Docs for NeXus attributes
    NeXus attributes are written as HDF5 attributes already. Since HDF5 attributes cannot have attributes themselves, the question is where to place the docs for these attributes? My solution here: write another attribute <attribute>__docs (e.g. entry/definition/version__docs) to the HDF5 file.

@lukaspie lukaspie linked an issue Aug 29, 2024 that may be closed by this pull request
Comment thread .gitmodules Outdated
[submodule "src/pynxtools/definitions"]
path = src/pynxtools/definitions
url = https://github.com/FAIRmat-NFDI/nexus_definitions.git
branch = nxmpes-unit-doc-test

@lukaspie lukaspie Aug 30, 2024

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminder to remove before merging

@lukaspie lukaspie marked this pull request as ready for review August 30, 2024 10:49
@sherjeelshabih

Copy link
Copy Markdown
Collaborator
  • Docs for NeXus attributes
    NeXus attributes are written as HDF5 attributes already. Since HDF5 attributes cannot have attributes themselves, the question is where to place the docs for these attributes? My solution here: write another attribute <attribute>__docs (e.g. entry/definition/version__docs) to the HDF5 file.

Any practical solution works in my opinion. The only issue will be how to report/add this in the NXDL structure. It's a bit meta over the already meta attributes we have in the NXDL. Will something like adding this renameable field, FIELDNAME__docs, in NXobject work out nicely in the NXDL framework?

@lukaspie

Copy link
Copy Markdown
Collaborator Author
  • Docs for NeXus attributes
    NeXus attributes are written as HDF5 attributes already. Since HDF5 attributes cannot have attributes themselves, the question is where to place the docs for these attributes? My solution here: write another attribute <attribute>__docs (e.g. entry/definition/version__docs) to the HDF5 file.

Any practical solution works in my opinion. The only issue will be how to report/add this in the NXDL structure. It's a bit meta over the already meta attributes we have in the NXDL. Will something like adding this renameable field, FIELDNAME__docs, in NXobject work out nicely in the NXDL framework?

I didn't go this far. For groups and fields, I basically added an attribute @docs, whereas for the attributes, I used an additional attribute <attribute>__docs. I guess the problem would be that all of these are undocumented..

@sherjeelshabih

Copy link
Copy Markdown
Collaborator

Ah alright. So it's just for the attributes that you add a suffix.

You're right. They will remain undocumented. Let's say to see how it goes in use for us we can leave it undocumented.

It will make it practically easier to understand Nexus files like this. It makes the Nexus files more self sufficient too. And it seems this is the best we can do without overcomplicating it.

@lukaspie

lukaspie commented Sep 3, 2024

Copy link
Copy Markdown
Collaborator Author

Notes from TF meeting:

  • Use single underscore for attributes, like for example data and data_units
  • Can we render the docs a bit nicer? HTML syntax?
  • Don't write the documentation to every single instance of the same concept, use internal / softlink
  • External links to documentation? docs and docs_url
  • @sanbrock: HDF5 contains data, not information. What happens when we want to reuse the data somewhere else? We could prevent this by writing a parallel structure with the docs and linking to that place.
    • option 1: add a warning if links are used and the docstrings are not actually correct?
    • option 2 (by @rettigl): if we write the docs, we are not using links, but rather we copy the data directly

@lukaspie lukaspie force-pushed the 403-improvement-add-nx-doc-string-as-attribute branch from 7369779 to 81edf90 Compare September 17, 2024 15:34
Comment thread src/pynxtools/dataconverter/helpers.py Outdated
lukaspie pushed a commit that referenced this pull request May 19, 2026
…F5 attrs

Rebased from PR #415 onto nexus-inheritance-concept-paths, rewriting the
doc extraction to use NexusNode instead of get_inherited_nodes(). Key changes:

- writer.py: new __nxdl_docs() using get_inheritance_concept_paths(); writes
  @docs on groups/fields, @<attr>_docs on attributes (single underscore per
  TF decision); _format_doc() helper for optional RST rendering via docutils
- convert.py: extract write_docs/docs_format from kwargs before Writer.write()
- cli.py: --write-docs flag and --docs-format choice option with guard
- NXtest.nxdl.xml: add <doc> to version attribute for test coverage
- pyproject.toml: add docutils as hard dependency
- test_writer.py: test_write_docs() covering appdef root, field, attr, group docs
@lukaspie lukaspie force-pushed the 403-improvement-add-nx-doc-string-as-attribute branch from 55ea46a to fc747e2 Compare May 19, 2026 22:32
lukaspie pushed a commit that referenced this pull request May 19, 2026
…F5 attrs

Rebased from PR #415 onto nexus-inheritance-concept-paths, rewriting the
doc extraction to use NexusNode instead of get_inherited_nodes(). Key changes:

- writer.py: new __nxdl_docs() using get_inheritance_concept_paths(); writes
  @docs on groups/fields, @<attr>_docs on attributes (single underscore per
  TF decision); _format_doc() helper for optional RST rendering via docutils
- convert.py: extract write_docs/docs_format from kwargs before Writer.write()
- cli.py: --write-docs flag and --docs-format choice option with guard
- NXtest.nxdl.xml: add <doc> to version attribute for test coverage
- pyproject.toml: add docutils as hard dependency
- test_writer.py: test_write_docs() covering appdef root, field, attr, group docs
@lukaspie lukaspie force-pushed the 403-improvement-add-nx-doc-string-as-attribute branch from fc747e2 to 2b1cfe6 Compare May 19, 2026 22:39
@lukaspie

lukaspie commented May 19, 2026

Copy link
Copy Markdown
Collaborator Author

@mkuehbach @rettigl I made some changes here, following the changes to the NexusNode object. In principle, I believe this would be ready. The bigger question is if we want to have such a feature. I think since it is optional and only turned on by passing a flag to pynx convert, it is relatively harmless and a nice feature to add. But happy to hear your opinions.

What changed (since rebase onto latest master)

  • writer.py: --write-docs embeds NeXus concept docstrings as HDF5 string attributes (@docs, @docs_url, @<attr>_docs, @<attr>_docs_url) using NexusNode.get_link() for the URL and NexusNode.get_inheritance_concept_paths() for inheritance-aware doc text.
  • cli.py / convert.py: --write-docs flag and --docs-format option (default = raw RST, plain = markup-stripped single-line, optimised for h5web).
  • annotator.py: pynx read silently skips @docs/@docs_url/@*_docs/@*_docs_url attributes to keep output uncluttered.
  • NXtest.nxdl.xml: added <doc> to the version attribute for test coverage.
  • Tests: test_write_docs parametrized over both formats; plain asserts no embedded \n.
  • Docs: new section in dataconverter-and-readers.md with usage, format table, and a !!! warning on link semantics (write-time and read-time). -> this addresses the tension raised by Sandor in Add NX docstring as attribute #415 (comment).

Not yet addressed / open design questions

  1. Repeated concepts — every instance of the same concept gets its own copy of the docs string. The suggestion was to write docs once and softlink; not implemented (architecturally non-trivial, requires a parallel /.nexus_docs/ structure or concept-level deduplication).

  2. Link/copy tension--write-docs intentionally skips HDF5 link nodes (soft link, virtual dataset, external link) to avoid attaching the wrong concept's docs. The alternative (@rettigl: copy data instead of linking when --write-docs is active) was discussed but not implemented. Currently only documented with a warning.

  3. HTML rendering — dropped: NXDL docs use Sphinx RST (:ref:, .. include::) that docutils cannot resolve standalone. Only default and plain are offered. plain is however good enough that in an h5web viewer one can read the docs sufficiently well:

image

@lukaspie lukaspie requested review from mkuehbach and rettigl and removed request for mkuehbach and rettigl May 19, 2026 23:42
lukaspie pushed a commit that referenced this pull request May 20, 2026
…F5 attrs

Rebased from PR #415 onto nexus-inheritance-concept-paths, rewriting the
doc extraction to use NexusNode instead of get_inherited_nodes(). Key changes:

- writer.py: new __nxdl_docs() using get_inheritance_concept_paths(); writes
  @docs on groups/fields, @<attr>_docs on attributes (single underscore per
  TF decision); _format_doc() helper for optional RST rendering via docutils
- convert.py: extract write_docs/docs_format from kwargs before Writer.write()
- cli.py: --write-docs flag and --docs-format choice option with guard
- NXtest.nxdl.xml: add <doc> to version attribute for test coverage
- pyproject.toml: add docutils as hard dependency
- test_writer.py: test_write_docs() covering appdef root, field, attr, group docs
@lukaspie lukaspie force-pushed the 403-improvement-add-nx-doc-string-as-attribute branch from bb7d425 to a3097dd Compare May 20, 2026 10:05
lukaspie pushed a commit that referenced this pull request May 21, 2026
…F5 attrs

Rebased from PR #415 onto nexus-inheritance-concept-paths, rewriting the
doc extraction to use NexusNode instead of get_inherited_nodes(). Key changes:

- writer.py: new __nxdl_docs() using get_inheritance_concept_paths(); writes
  @docs on groups/fields, @<attr>_docs on attributes (single underscore per
  TF decision); _format_doc() helper for optional RST rendering via docutils
- convert.py: extract write_docs/docs_format from kwargs before Writer.write()
- cli.py: --write-docs flag and --docs-format choice option with guard
- NXtest.nxdl.xml: add <doc> to version attribute for test coverage
- pyproject.toml: add docutils as hard dependency
- test_writer.py: test_write_docs() covering appdef root, field, attr, group docs
@lukaspie lukaspie force-pushed the 403-improvement-add-nx-doc-string-as-attribute branch from a3097dd to 61ff989 Compare May 21, 2026 15:49
lukaspie pushed a commit that referenced this pull request May 28, 2026
…F5 attrs

Rebased from PR #415 onto nexus-inheritance-concept-paths, rewriting the
doc extraction to use NexusNode instead of get_inherited_nodes(). Key changes:

- writer.py: new __nxdl_docs() using get_inheritance_concept_paths(); writes
  @docs on groups/fields, @<attr>_docs on attributes (single underscore per
  TF decision); _format_doc() helper for optional RST rendering via docutils
- convert.py: extract write_docs/docs_format from kwargs before Writer.write()
- cli.py: --write-docs flag and --docs-format choice option with guard
- NXtest.nxdl.xml: add <doc> to version attribute for test coverage
- pyproject.toml: add docutils as hard dependency
- test_writer.py: test_write_docs() covering appdef root, field, attr, group docs
@lukaspie lukaspie force-pushed the 403-improvement-add-nx-doc-string-as-attribute branch from 61ff989 to c86328f Compare May 28, 2026 18:19
lukaspie added 3 commits May 29, 2026 10:16
…F5 attrs

Rebased from PR #415 onto nexus-inheritance-concept-paths, rewriting the
doc extraction to use NexusNode instead of get_inherited_nodes(). Key changes:

- writer.py: new __nxdl_docs() using get_inheritance_concept_paths(); writes
  @docs on groups/fields, @<attr>_docs on attributes (single underscore per
  TF decision); _format_doc() helper for optional RST rendering via docutils
- convert.py: extract write_docs/docs_format from kwargs before Writer.write()
- cli.py: --write-docs flag and --docs-format choice option with guard
- NXtest.nxdl.xml: add <doc> to version attribute for test coverage
- pyproject.toml: add docutils as hard dependency
- test_writer.py: test_write_docs() covering appdef root, field, attr, group docs
find_node_at_path was removed from NexusNode; use the resolve_path()
function from nexus.schema_resolver and tree.get_inheritance_concept_paths()
for the appdef root doc special case instead of get_inheritance_docs().
h5web renders newlines as literal \n in attribute values. For 'plain'
format, reflow paragraphs to a single string (intra-paragraph line breaks
become spaces, multi-paragraph docs concatenated with space). Use ': '
between concept label and doc text, ' | ' between inheritance levels.
lukaspie added 2 commits May 29, 2026 10:16
…write-docs

- test_writer.py: parametrize test_write_docs over default/plain; plain
  variant additionally asserts no embedded newlines in any docs attribute
- dataconverter-and-readers.md: add section explaining --write-docs /
  --docs-format, attribute naming convention, and format comparison table
…ocument link behaviour

- writer.py: __nxdl_docs() returns (doc_text, docs_url) tuple; URL from
  node.get_link() pointing to the online NeXus manual entry; writes @docs_url
  on groups/fields and @<attr>_docs_url on attribute datasets
- annotator.py: _annotate_attribute() skips @docs, @docs_url, @*_docs and
  @*_docs_url so pynx read output stays uncluttered
- dataconverter-and-readers.md: expand docs_url description; expand warning
  admonition to cover both write-time (links skip docs) and read-time
  (links to doc'd files expose source-concept docs at destination)
@lukaspie lukaspie force-pushed the 403-improvement-add-nx-doc-string-as-attribute branch from c86328f to 802d214 Compare May 29, 2026 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improvement] Add NX doc string as attribute

3 participants